Big Data buzz is on the decline: Is 2017 the year of demise for Big Data?

Author and data scientist Cathy O’Neil warns of the overreaching power of algorithms in her book Weapons of Math Destruction

The hype is over, and so is big data: that was the verdict of Gartner’s 2015 Hype Cycle. While there was plenty of marketing buzz around the big data phenomenon, demand for Hadoop specifically is on the decline. That was 2015, but the perils are far from over. In a recent report, Gartner forecast that in 2017, 60% of big data projects will not get beyond the pilot stage.

One of our own recent studies, Analytics & Data Science Leaders Outlook in India, found that Big Data has slipped as a growth area in the outlook of Indian analytics leaders since last year. In 2016, 1 in 2 leaders saw this space as a growth area. Today, just 1 in 3 analytics leaders in India expect ‘Big Data’ to be a growth area over the next 12 months.

Typically, big data is characterized by three Vs: volume, velocity and variety, though the list of Vs has grown to include veracity and even value. Estimates suggest that by 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. Another key projection is that by 2020, our accumulated digital knowledge will amount to around 44 zettabytes, or 44 trillion gigabytes, up from just 4.4 zettabytes today. Figures like these make a strong case for big data not just to flourish but to deliver tremendous value.

Hadoop’s dream of unifying data comes to an end: it is no longer the best data management architecture

Has Hadoop failed to live up to enterprises’ expectations?

Over the years, organizations have embraced all kinds of data: existing data, historical data, log data, and social and transactional data, among other types. According to Bain research, the drive to collect and mine new data sets gained ground with the rise of social media and mobile devices. Even so, many enterprises are grappling with a data deluge and data silos, unable to make use of existing data that cannot easily be accessed, organized, linked or interrogated.

Hadoop is the most widely adopted open source distributed computing platform for big data management. Yet it has not quite lived up to enterprises’ expectations of scale, or of being the go-to authority on everything big data. So is Hadoop irrelevant, and if so, where will all the unstructured data end up? At the recent Strata + Hadoop World 2017 conference, the prevailing sentiment among experts was that Hadoop has outlived its role as a data hub.

Scott Gnau, CTO of Hortonworks

Even though Hadoop had been marketed as the best data warehouse, Hortonworks CTO Scott Gnau reportedly believes it failed to deliver business value because its SQL repository and engine were inferior to those of traditional enterprise data warehouse (EDW) vendors such as Teradata.

Pitfalls of the Hadoop technology base:

  • Hadoop is good at extract, transform and load (ETL), but its SQL-handling features aren’t great.
  • Its storage-centric technology is not well suited to machine learning and other advanced analytics tasks.
  • Streaming analytics, which extracts information from data quickly as it arrives, has entered the picture. Big data management companies such as Cloudera, MapR and Hortonworks have already added streaming data pipelines to their core platforms.
  • Stream processing systems can now fulfil the enterprise’s analytics tasks.

Is Kafka the answer to Hadoop?

Though big data technology is by and large synonymous with Hadoop, there is a slew of other open source software out there, such as Apache Kafka, Apache Spark and MongoDB. According to reports, adoption of Apache Kafka, first developed at LinkedIn, is rising significantly. Apache Kafka provides a central streaming platform in which data streams are stored, processed and sent on to any subscribers. Kafka works in tandem with Apache HBase, Apache Storm and Apache Spark for real-time analysis and rendering of streaming data. In fact, according to a Cloudera post, Kafka’s unique attributes make it best suited for integration. Technologists are increasingly promoting Apache Kafka for big data applications, since Hadoop is a complicated technology stack to build on. With its scalability, low latency and data partitioning, Apache Kafka can handle a large number of consumers.
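To make the idea of a central, partitioned streaming log a little more concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not Kafka’s client API: the class `PartitionedLog` and its `publish` and `poll` methods are hypothetical names, and `zlib.crc32` merely stands in for the hash that Kafka’s own partitioner uses. It illustrates two properties mentioned above: records with the same key land in the same partition, and each subscriber (consumer group) keeps its own read offset, so many consumers can read the same stored stream independently.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy, in-memory model of a topic split into append-only partitions.

    Hypothetical illustration only; this is not Kafka's API.
    """

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an append-only list of (key, value) records.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked separately per (consumer group, partition).
        self.offsets = defaultdict(int)

    def publish(self, key: str, value: str) -> tuple[int, int]:
        # Hashing the key keeps all records for one key in one partition,
        # preserving their order. crc32 is a stand-in for Kafka's partitioner hash.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group: str, partition: int, max_records: int = 10) -> list:
        # Reading does not remove records; it only advances this group's offset,
        # so independent subscribers can each consume the full stream.
        start = self.offsets[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)
        return batch


# Usage: one producer, two independent consumer groups.
log = PartitionedLog(num_partitions=3)
for i in range(6):
    partition, _ = log.publish("user-42", f"click-{i}")
print(log.poll("analytics", partition, max_records=2))
# [('user-42', 'click-0'), ('user-42', 'click-1')]
print(log.poll("billing", partition, max_records=1))
# [('user-42', 'click-0')]
```

Because records are not deleted when read, adding another subscriber costs only one more offset entry. Real Kafka additionally replicates partitions across brokers and expires old data according to a retention policy, both of which this sketch leaves out.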

Big Data, the most hyped technology of 2016

Famed statistician Nate Silver says big data is now at its peak in the Hype Cycle

Big data is one of the most bandied-about terms, and it is often used interchangeably with analytics. We pointed out earlier how big data and analytics are mentioned in the same breath and have recently become almost synonymous.

A recent survey of CIOs found that it was the most hyped jargon in India in 2016. While overselling is seen as a necessary evil in marketing, it leads to overexposure. Big data is one of the most popular buzzwords in the media, and the recent surge of interest can lead to unmanaged expectations. According to celebrated statistician Nate Silver, every new technology is heavily hyped to push it into the mainstream, “and expectations of that tech skyrocket”.

Eventually the hype crashes into the “trough of disillusionment”, he reportedly said, adding that big data’s hype cycle is now at its peak. Silver, who is known for his astute election predictions, also noted that “big tech behemoths like Google and Facebook tend to have an iterative and unstubborn approach in terms of investing in technologies and ideas.”

In the process, small data has been lost in the melee, even though experts believe that not every task requires large amounts of data, which may not add value anyway. Smaller samples can also reveal meaningful insights. For example, small-scale studies such as product testing and car crash tests need not be skewed by large amounts of data.

Peter Wang is Chief Technology Officer & Co-Founder of Continuum Analytics

In fact, experts point out that although Hadoop was not a better Enterprise Data Warehouse (EDW), Cloudera initially marketed it as one. Peter Wang, CTO and co-founder of Continuum Analytics, speaking at a recent conference, suggested that Hadoop was used as a vehicle for data analytics aspirations, but that so much innovation has happened around it, in TensorFlow, Spark and Kafka, that Hadoop has been left behind.

Lack of accountability in Big Data

According to a World Economic Forum post, big data poses challenges of accountability as well. One of the major challenges is that big data can be gamed: results can be skewed by “Google bombing” and “spamdexing”, among other methods. More data has never been a substitute for quality data. In fact, the recent US elections exposed glaring gaps in data quality, and political opinions harvested from social media outlets and polls proved to be no reliable indication of the result.

Silver reportedly said that working with big data sets helps one get the major things right, which improves accuracy. However, in a big data environment, one cannot test the data on real-world customers.

Dark side of big data

According to data scientist Cathy O’Neil, author of Weapons of Math Destruction, mathematical models have invaded every aspect of our lives, from health insurance to loans and even evaluations. It is high time mathematical modellers started taking responsibility.

In algorithms the world believes: with algorithms ruling the universe, people have had no way to challenge the specific results dished out by machine learning algorithms. In a recent development, the European Union has put in place measures under which people who believe they have been affected by algorithms have a “right to an explanation”.

Richa Bhatia

Richa Bhatia is a seasoned journalist with six years’ experience in reportage and news coverage and has had stints at Times of India and The Indian Express. She is an avid reader, mum to a feisty two-year-old and loves writing about the next-gen technology that is shaping our world.