According to The Global Language Monitor, ‘Big Data’ is the most confusing tech buzzword of the decade thus far. In their words:
“Big Data is the biggest buzzword. It has been called the key to new waves of productivity growth, essential to the US place in global economics, and more. Now if only we could agree on exactly what this means and how we get there. (By the way, consider yottabytes: a quadrillion gigabytes. Hint: Just think a lotta bytes.)”
The meaning of Big Data has become more diffuse as it has grown in popularity. According to Raj Bhatt of Knowledge Foundry, “The most important trend is the increasing hype/confusion around Big Data analytics. Many companies and people have their own definition of Big Data – leading to a lot of confusion about what qualifies as a Big Data solution.”
According to a study by Mzinga, 42 percent of respondents are unfamiliar with Big Data technologies. There remains a great deal of confusion about what the term Big Data really means. In this article, we try to address some of the myths and confusion around Big Data.
A Lot of Data is not Big Data
Most people get carried away by the word “big” and define Big Data as simply a lot of data. But it is more than that. “We define a problem as a Big Data problem only if the size of the data, the short timeframe for a solution, and the diversity of the data necessitate a distributed NoSQL-based architecture,” says Raj.
Ask an educated audience and a plethora of definitions arises, ranging from large data sets and data warehouses to big code for analytics and BI. Some even see Big Data as hardware and large applications. To keep the definition simple, there is a growing consensus in the industry to define Big Data by three Vs:
- Volume – the amount of data has to be large, in petabytes not just gigabytes
- Velocity – the data has to arrive frequently, daily or even in real time
- Variety – the data is typically (but not always) unstructured (like videos, tweets, chats)
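The three Vs above can be read as a simple checklist. The following is only a minimal sketch of that reading: the `Workload` fields, the threshold values, and the function names are illustrative assumptions, not an industry standard, and treating all three Vs as required follows Raj's stricter definition rather than any agreed rule.

```python
from dataclasses import dataclass

PETABYTE = 10**15  # bytes, decimal definition


@dataclass
class Workload:
    """Hypothetical description of a data workload (illustration only)."""
    size_bytes: int               # total data held
    arrivals_per_day: float       # how often new data lands
    unstructured_fraction: float  # share of videos, tweets, chats, etc. (0.0 to 1.0)


# Assumed thresholds, chosen only to mirror the wording of the list above.
MIN_VOLUME_BYTES = PETABYTE        # "in petabytes, not just gigabytes"
MIN_ARRIVALS_PER_DAY = 1.0         # "daily or even real-time"
MIN_UNSTRUCTURED_FRACTION = 0.5    # "typically unstructured"


def three_vs(w: Workload) -> dict:
    """Report which of the three Vs the workload satisfies."""
    return {
        "volume": w.size_bytes >= MIN_VOLUME_BYTES,
        "velocity": w.arrivals_per_day >= MIN_ARRIVALS_PER_DAY,
        "variety": w.unstructured_fraction >= MIN_UNSTRUCTURED_FRACTION,
    }


def is_big_data(w: Workload) -> bool:
    """Assumption: a workload qualifies only if all three Vs hold."""
    return all(three_vs(w).values())


# A large-looking but static, structured archive versus a fast, mixed stream.
archive = Workload(size_bytes=500 * 10**9, arrivals_per_day=0.03,
                   unstructured_fraction=0.0)
stream = Workload(size_bytes=3 * PETABYTE, arrivals_per_day=24 * 60,
                  unstructured_fraction=0.8)

print(three_vs(archive), is_big_data(archive))
print(three_vs(stream), is_big_data(stream))
```

Under these assumed thresholds, the 500 GB archive is "a lot of data" in everyday terms yet fails every check, which is the distinction this section draws.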
Yet the confusion around Big Data continues as the list of Vs expands to include veracity, viscosity, and virality, with some versions going as far as 16 Vs.
Hadoop is not Big Data
Over time, “Hadoop” has become synonymous with the term “Big Data”. Many people associate Big Data with Hadoop, when it is just one element and one capability required to address the Big Data problem. There are also various other applications that can easily substitute for Hadoop.
Part of the reason for this confusion is that the conversation around Big Data is driven largely by the information technology community and centers primarily on technology, rather than being driven by the line-of-business community.
It’s Y2K Again
There is an almost Y2K-like fear that Big Data will grow into a monster and eventually become uncontrollable with our existing technologies. An apocalyptic sense of urgency has set in across the technology community to create quick solutions around Big Data.
Amid all this chaos, the main beneficiaries, i.e. organizations, are quiet. Today the conversation around Big Data centers more on ‘how we solve it’ than on defining the problem to be solved. Organizations will continue to grapple with the perplexity around Big Data until the disarray settles into an established business solution rather than a technology in itself.