Today, the big data universe revolves around Hadoop, an open-source distributed platform to store and process data with high volume, velocity and variety. What started as an open source innovation has now changed the face of database technology & businesses in a short span of 10 years!
The explosion of big data in 2014 has led companies across industries such as advertising, retail, healthcare, social media, manufacturing, telecommunications and government to invest and adopt Hadoop.
Inspired by research papers published by Google, Hadoop was first seeded in October 2002 as ‘Nutch’ by Doug Cutting and Mike Cafarella. Named after a toy elephant owned by Cutting’s son, Hadoop has evolved from its stages of infantry as ‘Nutch’, to the phenomenal data analysis software that it is today.
We celebrate Hadoop’s 10th birthday today, since Doug Cutting had officially started ‘Hadoop’ as a sub-project under ‘Nutch’ on 28th January 2006 after joining ‘Yahoo!’ in January 2006.
A year later, the Apache Software Foundation took it from Yahoo! and it developed into a vast framework. Cloudera was founded in August 2008 to commercialize Hadoop and in 2009, Dough Cutting joined Cloudera as chief architect.
In all these years, The Apache Hadoop framework developed Hadoop Distributed File System (HDFS), MapReduce, Hadoop YARN and Hadoop Common as it’s core modules. Besides these, Hadoop has a vast ecosystem of technologies, such as Apache Hbase, Apache Pig, Apache Spark, etc, and the numbers and scope of such projects continue to expand.
Companies use Hadoop to analyze customer behaviour, process call center activity, process data from social media and then make decisions in real time to understand customer needs, mitigate problems and gain a competitive advantage. It helps control big data processing costs by distributing computing across commodity servers instead of using expensive, specialized servers. Typically, there are several master nodes with hundreds and thousands of worker nodes, mitigating even a single failure possibility.
Moreover, the commercial ecosystem that has grown around Hadoop, has seen an even more astounding growth. In its early days, the open source community pushed it towards commercialization, as a result, it gained so much momentum that the open source community has to pull it back.
With the growth in the self-service data and internet of things, new Hadoop business models are emerging as competition continues to catalyze the transformation of Hadoop. The self-service data exploration gives the companies the ability to explore the width and depths of any data that comes in from machine logs, data centers, social media and can be harnessed to look for new insights and business processes. The ease of continuous access and processing of data in real time has made Hadoop an indispensable part of businesses today.