In a move that signals the death of Hadoop and suggests that the open source software is no longer a key part of big data vendors’ strategy, rival companies Cloudera and Hortonworks jointly announced this week a definitive agreement under which they will combine in an all-stock merger of equals. An official statement also laid out their roadmap to make Hadoop native to the cloud and to create a next-generation data platform leader. The combined company aims to deliver what it calls the industry’s first enterprise data cloud, with the ease of use and elasticity of the public cloud.
What’s most surprising about the all-stock merger is that both companies, which offer open source, enterprise-ready Hadoop distributions and operate in a similar space, read the writing on the wall in the nick of time. The rise of managed data science services from AWS, Azure and Google made Hadoop less useful and was one of the key reasons behind the merger of the two big data pioneers.
From Heavy Data Infrastructure To Cloud
Both big data giants, which provided software around Hadoop, flourished at a time when most projects were built on heavy data infrastructure. That was a decade ago, when analysts ran their analyses on extremely large data sets. Over the last few years, the advent of cloud object storage has changed the big data market dramatically, and users have moved away from the Hadoop stalwarts. According to Wikibon Lead Analyst James Kobielus, HDFS-based data lakes typify data-at-rest architectures, which are no longer a part of enterprise data strategies.
So this is what happened: the world moved to the cloud, with data analysis services designed for the cloud era. Moreover, with the rise of public cloud object storage such as Google Cloud Storage, Amazon S3, IBM Cloud Object Storage, and AWS Elastic MapReduce File System, dependency on HDFS has reduced drastically. The rise of object storage-as-a-service, which provides a robust, scalable store for unstructured data, also helped bring down the curtain on Hadoop.
Over the years, object storage has become the core platform for big data solutions, offering advantages such as access to large amounts of data and a programmable storage interface. Analysts believe object storage will eventually be replaced by stream computing, which will become the foundation of tomorrow’s data architectures.
Hadoop Symbolised Big Data — Cloud And AI Brought Its End
While the Hadoop ecosystem symbolised big data in the early days (a decade back), the last few years have seen a massive shift in data architecture with organisations heavily investing in serverless computing to tackle shifting workloads.
- To support new database architecture, there has been a rise in other open source projects like Kafka, Elastic and Flink among others.
- Kubernetes, the open source container-orchestration system developed by Google, is also soaring in popularity and is used to manage Google-scale workloads.
- However, a major rise in cloud computing, spanning storage, managed services and open source activity upended the Hadoop market.
Both companies operated in the same market, though Cloudera also targeted data warehousing while Hortonworks provided solutions in edge computing and IoT. They can now work together to create “a superior unified platform and clear industry standard from the Edge to AI, substantially benefiting customers, partners and the community”.
Tom Reilly, CEO of Cloudera, said in a statement that the two businesses were complementary and strategic. “By bringing together Hortonworks’ investments in end-to-end data management with Cloudera’s investments in data warehousing and machine learning, we will deliver the industry’s first enterprise data cloud from the Edge to AI. This vision will enable our companies to advance our shared commitment to customer success in their pursuit of digital transformation,” he said.
What Does This Signify For MapR?
The Hadoop obituary was already written in 2017, when enterprises shied away from Hadoop distribution vendors, primarily big data pioneers Cloudera and Hortonworks, which grew out of the open-source Apache project that began in the mid-2000s. However, questions are also swirling around MapR, another company that is an offshoot of the Hadoop era. Pegged as one of the most innovative products, MapR is well known for its open source business model and for pushing the boundaries on databases, containers and file systems.
According to a recent statement from the company, MapR was recognized for providing a data platform for AI and analytics that enables enterprises to inject analytics into their business processes, thereby increasing revenue, reducing costs and mitigating risks. The platform helps address the data complexities of high-scale, mission-critical distributed processing, from the cloud to the edge, including IoT analytics and container persistence.