One would be forgiven for thinking that Data Engineering is a relatively new ‘buzzword’, coined at around the same time as the Data Scientist title. While Data Engineering is definitely a byproduct of the data science discipline, it is nowhere near as new as the Data Scientist role: data engineering emerged decades ago, when data solutions and advanced analytics were first applied to tackle the industry’s most challenging use cases.
Data Engineering Is Not A New Phenomenon
According to Derek Comingore, today’s data engineers previously operated under different titles: Database Administrator, Database Developer, Data Architect and BI Developer, all working in OLAP/enterprise data warehouse environments. From configuration to database management system operations, these data professionals were responsible for developing and maintaining enterprise data warehouses, ETL workflows and reporting. In companies where no data infrastructure existed, the data engineering role also required setting up platforms such as Spark, Hadoop and Hive.
The first wave of data engineers worked on Apache Hadoop and carried out data wrangling jobs at leading tech companies such as Yahoo, Google and Facebook. Comingore notes that by 2010 big companies were rapidly adopting Hadoop, pushing data engineering from niche to mainstream. This led to the rise of the modern-day data engineer in today’s enterprises and also split the work into two roles: (a) one who works on the data processing system, cleaning and organizing datasets; and (b) one who mines those datasets for patterns and insights.
For most enterprises, it wasn’t enough just to collect, collate, clean and display data about what had happened in the past. That need brought about the rise of predictive and advanced analytics, and led to the birth of the “sexiest job of the 21st century”: the data scientist. Today, the two roles in the data science field complement each other.
Companies Turn Data Fabulous, Give Rise To Data Engineering
So, what pushed Data Engineering into prominence today? Besides the mainstream adoption of Hadoop, the resurgence of Data Engineering can also be attributed to the rise of new-age tech companies such as LinkedIn, Airbnb, Netflix, Spotify and Uber and, in India, Flipkart, Ola, InMobi, Paytm and BigBasket, which are at the forefront of developing cutting-edge data-driven products.
Today these enterprises are plowing more money into their data processing systems, aiming to uncover insights from petabytes of data that will give them an edge over competitors. According to Airbnb engineer Maxime Beauchemin, “The need for more complex, code-based ETL and changing data modeling drove the demand for data engineering.” The role expanded from handling large-scale data processing and preparing data for analysis to adopting new technology that handles both big and streaming data. Just as a data scientist is crucial to driving business strategy, a data engineer is needed to prepare data and make it ready for analysis. The two roles are interdependent, which explains why a 2016 survey identified cleaning data as the most time-consuming task and argued that companies should free up their data science teams so they could spend 79 per cent more time on analysis. Data engineers, in short, are a crucial asset for getting more value out of data.
How Has A Data Engineer’s Job Role Evolved Over The Years
As the need for data infrastructure teams grows, the data engineering role has grown in scope. The focus has shifted from maintaining reports and dashboards to DevOps, data warehousing and infrastructure tooling. According to a new report on the State of Data Engineering, these are the trends driving demand for data engineering talent:
- Data engineering today bridges the gap between software engineering and data science by developing production code that allows data science to scale effectively.
- Another trend driving the demand for data engineering is the push towards machine learning. The survey reveals that access to proprietary data has become a major advantage for most companies, and that making this data available has become a strategic function.
- Internet tech companies popularized the Data Engineer title. Data engineers now work with enterprise-scale data, and their skill set has expanded to include tasks such as data discovery and writing application code rather than just SQL queries. Interestingly, this skill set differs from the one needed in a schema-on-write environment.
Why Software Engineers Can Transfer to This Role Effectively
Even though many aspects of the role have changed, some functions remain exactly the same. SQL has been the language of choice for data analysis for the last four decades, even as ETL tools have changed. The nature of data has shifted significantly, from tables at rest to streaming data, and modern data engineers have adopted newer big data technologies such as Spark, Kafka and Airflow. Today, the role essentially entails designing and building data pipelines, both streaming and batch-oriented, and making the information available via APIs or dashboards.
That is why software engineers are a natural fit for data engineering roles. According to Asim Jalis, Principal Data Engineer at Galvanize, companies facing a steep shortage of data engineers will look to software developers to step into the role. Jalis believes that software developers who are proficient coders, who can integrate machine learning algorithms into their applications and who can handle big data architecture can transition effectively into the new role.
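To make the pipeline-building work described in this section a little more concrete, here is a minimal sketch of a code-based batch ETL job in Python. It is an illustration under stated assumptions, not any company’s actual pipeline: the sample data and the function names `extract`, `transform`, `load` and `run_pipeline` are hypothetical, an in-memory SQLite database stands in for a data warehouse, and it covers only the batch case (no Spark, Kafka or Airflow).

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from files, APIs or a queue.
RAW_CSV = """order_id,city,amount
1, Bengaluru ,250.0
2,Mumbai,
3,bengaluru,100.5
1, Bengaluru ,250.0
"""


def extract(text):
    """Parse raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean rows: drop incomplete and duplicate orders, normalize city names."""
    seen = set()
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        order_id = row["order_id"]
        if order_id in seen:
            continue  # skip duplicate orders
        seen.add(order_id)
        clean.append((int(order_id), row["city"].strip().title(), float(row["amount"])))
    return clean


def load(records, conn):
    """Write cleaned records into a SQL table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text):
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Once the data is loaded, analysis still happens in plain SQL: `run_pipeline(RAW_CSV).execute("SELECT city, SUM(amount) FROM orders GROUP BY city").fetchall()` returns `[('Bengaluru', 350.5)]`, because the duplicate order and the row with no amount were dropped during the transform step. Keeping cleaning in code and querying in SQL mirrors the split between data engineering and analysis described above.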
Future of Data Engineering
According to a recent survey by Stitch Data, 50 per cent of the world’s data engineers are based in the US. India ranked second, home to 11.96 per cent of data engineering talent. With the rise of new-age tech companies, data engineering has grown in size and visibility, as enterprises recognize that the real value of data can only be realized with a robust data infrastructure and architecture. Even though data engineering has kept businesses running for the last 40 years, the current data explosion has demanded a shift in skills, calling for a new breed of talent that can connect siloed data and manage architecture and algorithms with ease. No wonder enterprises today are scrambling to find the best talent in this hot and buzzing field.