Last updated October 25, 2017
Addressing the People Problem in your Big Data Architecture

Published on October 25, 2017

by Joydeep Sen Sarma

A quick look at the evolution of the data warehousing environment reveals that a lot has changed since the 1990s. Back then, storage was expensive, so data was collected selectively, based on its usefulness and cost, and much of it was discarded to maximize the return on the storage investment. In the ’90s, ETL (extract, transform, load) workflows funneled all data into relational databases, which became the single source of truth for downstream operations. Engineers were responsible for getting the data into the databases and, eventually, to the analysts. Data was neatly manicured to fit the architecture.

However, that paradigm has flipped. As storage, compute, and scale-out technologies became more affordable, data that didn’t seem immediately important no longer needed to be discarded. Today, as organizations gather more data from varied sources, any data of potential value is stored to be mined later. Big data technologies built on “scale-out” have also emerged, allowing data to be stored on commodity hardware and processed efficiently in parallel. Rather than fitting data to the architecture, the architecture now fits itself to the data.

Nonetheless, despite this shift and the advances in how data is collected and consumed, only a few organizations have succeeded in scaling their big data efforts.

Challenges with Big Data

Gartner predicts that 60 percent of big data projects over the next year will fail to go beyond the pilot stage and will be abandoned. This highlights two key weaknesses that typically plague big data initiatives, partly owing to how ‘big’ the data is:

difficulty in identifying what data to collect, and
inability to analyze the data that has been collected.

As we dive deeper into the problem, we can cite a number of reasons why big data is difficult. Much of the struggle stems from the new systems and technologies that have emerged to handle it. Since this innovation shows no sign of slowing down, it has become exceedingly difficult for businesses (even those that embrace data) to develop the vision and expertise needed to build and operate these platforms.

In addition, building a robust big data architecture requires piecing together a wide range of technologies, many of them open source, to create coherent processes for serving data optimally to analysts, data scientists and data engineers. This is a large infrastructure investment, and a common obstacle to realizing big data initiatives.

Between the lack of expertise, large investments in infrastructure, and a constantly shifting technology landscape, many businesses get caught up in the confusion and begin to see projects flounder and fail.

Addressing the People Problem

As established above, the sheer speed at which the technology landscape is evolving contributes to a shortage of qualified personnel. Because newer technologies in the big data space are still maturing, expertise in them is even harder to find.

Addressing that problem requires an organization-wide change: a transformation to a data-driven culture. A data-driven organization, in my opinion, should possess three things:

A company-wide culture of using data to make business decisions
An organizational structure that supports a data-driven culture
Technology that supports a data-driven culture, and makes data “self-service”

Of the above, I feel that creating a self-service culture is the most important, and arguably the most difficult, aspect of transitioning to a data-driven organization. This shift entails identifying and building a cultural framework that enables everyone involved in a data initiative (from the producers of the data, to those who build the models, to those who analyze it, to the employees who use it in their jobs) to collaborate on making data the heart of organizational decision-making.

Here are a few “real-world” tips on building a data-driven culture:

Hire data visionaries – you need people who are open-minded about what the data will tell them regarding the way forward, and who understand all the ways that employees can use data to improve the business.
Organize your data into a single data store accessible to everyone – always allow employees to see the data that affect their work. This means eliminating data silos and effectively democratizing data access, while still preserving data security and compliance.
Empower all employees – build a culture that allows all employees to share opinions, as long as they are backed up by data, even if those opinions contradict senior executives’ assumptions. This is key to keeping businesses competitive in even the fastest-moving markets.
Invest in the right self-service data tools – your data, even if readily accessible, won’t help your business much if most of your employees can’t understand it or don’t apply it to business problems. This can be solved by a) investing in the right data tools and, very importantly, b) training your employees on how to use those tools.
Hold employees accountable – technology will only take you so far, so you also need to put incentives in place to encourage employees to use the technology and tools. You must also find ways to measure and grade progress towards a self-service data culture. This means holding employees accountable for their actions and progress in effectively using data to drive business decisions.

Creating a data-driven culture is not always easy, but the benefits it provides are real and significant. Big data is truly transforming the ways that organizations conduct business, so it should come as little surprise that it has a big role to play in changing your culture as well.


Joydeep Sen Sarma

Before co-founding Qubole, Joydeep worked at Facebook where he bootstrapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led Facebook’s Data Infrastructure team. Joydeep was also a key contributor on the Facebook Messages architecture team and brought the power of Apache HBase to Facebook and to the transactional and reporting backends for Facebook Credits.
