Addressing the People Problem in your Big Data Architecture

A quick look at the evolution of data warehousing reveals that a lot has changed since the 1990s. Back then, storage was expensive, so data was collected selectively, based on its usefulness and cost, and much of it was discarded to maximize the return on storage investment. ETL (extract, transform, load) workflows in the ’90s funneled all data into relational databases, which in turn became the single source of truth for future operations. Engineers were responsible for getting the data into the databases and, eventually, to the analysts. Data was neatly manicured to fit the architecture.

However, that paradigm has flipped today. As storage, compute, and scale-out technologies became more affordable, data that didn’t seem to be of immediate importance no longer needed to be discarded. Today, as organizations gather more data from varied sources, any data of potential value is stored to be mined later. Big data technologies have also emerged on the basis of “scale-out”, allowing data to be stored on commodity hardware and processed efficiently in parallel. Rather than fitting data to the architecture, the architecture now fits itself to the data.

Nonetheless, despite the shift and the advancements in the way data is collected and consumed, only a few organizations have been successful in scaling their big data efforts.

Challenges with Big Data

Gartner predicts that 60 percent of big data projects over the next year will fail to go beyond the pilot stage and will be abandoned. This highlights two key weaknesses that most big data initiatives are typically plagued with, partly owing to how ‘big’ the data is:

  1. difficulty in identifying what data to collect, and
  2. inability to analyze the data that has been collected.

As we dive deeper into the problem, we can cite a number of reasons why big data is difficult. A lot of the struggle in handling big data has to do with the new systems and technologies that have emerged to address the need for big data. Since this innovation doesn’t seem to be slowing down, it has become exceedingly difficult for businesses (even those that embrace data) to have the vision and expertise to build and operate these platforms.

In addition, building a robust big data architecture requires piecing together a wide range of technologies, many of them open source, to create coherent processes for serving data optimally to analysts, data scientists and data engineers. This is a large infrastructure investment and a common hindrance to the realization of big data initiatives.

Between the lack of expertise, large investments in infrastructure, and a constantly shifting technology landscape, many businesses get caught up in the confusion and begin to see projects flounder and fail. 

Addressing the People Problem

We have already established that the sheer speed at which the associated technology landscape is evolving leads to a shortage of qualified personnel. Because newer technologies in the big data space are still maturing, expertise in them is even harder to find.

Addressing that problem requires an organization-wide change: a transformation to a data-driven culture. A data-driven organization, in my opinion, should possess three things:

  • A company-wide culture of using data to make business decisions
  • An organizational structure that supports a data-driven culture
  • Technology that supports a data-driven culture, and makes data “self-service”

Of the above, I feel that creating a self-service culture is the most important, and arguably the most difficult, aspect of transitioning to a data-driven organization. This shift entails identifying and building a cultural framework that enables all the people involved in a data initiative – from the producers of the data, to those who build the models, to those who analyze it, to the employees who use it in their jobs – to collaborate on making data the heart of organizational decision-making.

Here are a few “real-world” tips on building a data-driven culture:

  • Hire data visionaries – you need people who are open-minded about what the data will tell them regarding the way forward, and who understand all the ways that employees can use data to improve the business.
  • Organize your data into a single data store accessible to everyone – always allow employees to see the data that affects their work. This means eliminating data silos and effectively democratizing data access, while still preserving data security and meeting compliance requirements.
  • Empower all employees – build a culture that allows all employees to share opinions, as long as they are backed up by data, even if those opinions contradict senior executives’ assumptions. This is key to keeping businesses competitive in even the fastest-moving markets.
  • Invest in the right self-service data tools – your data, even if readily accessible, won’t help your business much if most of your employees can’t understand it, or don’t apply it to business problems. This can be solved by a) investing in the right data tools, and very importantly, b) training your employees on how to use those tools.
  • Hold employees accountable – technology will only take you so far, so you also need to put incentives in place that encourage employees to use the technology and tools. You must also find ways to measure and grade progress towards a self-service data culture. This means holding employees accountable for their actions and for their progress in using data effectively to drive business decisions.

Creating a data-driven culture is not always easy, but the benefits it provides are real and significant. Big data is truly transforming the ways that organizations conduct business, so it should come as little surprise that it has a big role to play in changing your culture as well.

Joydeep Sen Sarma

Before co-founding Qubole, Joydeep worked at Facebook where he boot-strapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led Facebook’s Data Infrastructure team. Joydeep was also a key contributor on the Facebook Messages architecture team and brought the power of Apache Hbase to Facebook and to the transactional and reporting backends for Facebook Credits.