Until a few years ago, only a handful of us had heard of data science, which is now all the rage. According to GreenBookBlog, Danish computer scientist Peter Naur is credited with coining the term data science. Over the years, data science has become more pervasive, and we are now seeing novice business analysts and students faced with the prospect of building their own analysis software and methodologies. Another key trend is how the phenomenal growth of data has pushed traditional statistical techniques to the fringes, and how deep learning has revolutionised computer science.
But many seem to have forgotten the basic definition of a data scientist. American mathematician and computer scientist DJ Patil defined it simply: “A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
Yet the frenzy and excitement around data science have become so intense that working professionals are rushing to learn machine learning, computer vision and text mining. At the same time, they are being criticised for missing key statistical concepts, such as probability distributions and confidence intervals, that form the basis of data science.
How The Definition And Role Of The Data Scientist Evolved
Jennifer Priestley, Associate Dean of The Graduate College and Professor of Statistics and Data Science at Kennesaw State University, categorised organisations as digital natives and non-natives. For digital natives, such as big tech firms like Amazon, Facebook, Airbnb and Google, data is the foundation. Most computational and analytical innovations come out of these big tech companies. They have a deep bench of data science talent, and this structure works well because they are able to attract and recruit entire data science teams. Currently, most organisations have a central data science team, but analysts believe that over time each business unit will have a dedicated data science member. Netflix and Airbnb are examples of companies that have modelled themselves as data science organisations.
Statistics And Data Science Are Intertwined: As the field matures, the role of the data scientist will evolve. One definition being bandied around is that data scientists are experts in statistics. But this may not hold for the current crop, many of whom have come from engineering. We have often heard that data science is nothing more than statistics. Sean Owen, director of Data Science at Cloudera, noted that statistics and numerical computing have been connected for decades and that, as in all areas of computing, we always crave ways to analyse a little more data. According to John Tukey's paper "The Future of Data Analysis", statistics must become concerned with the handling and processing of data, its size, and visualisation. However, today many people from diverse backgrounds, even economics, claim to be data scientists.
Challenges Involved In Working With Noisy Datasets: Organisations are currently focused on using big data to build analytics solutions that meet customer objectives. But the actual substance of what data scientists do remains ambiguous. For example, one study notes that scientists are increasingly faced with the challenge of working with large, heterogeneous and noisy datasets. Most new entrants have little or no experience with cutting-edge data science techniques and technologies. These individuals need to find opportunities to move beyond their current skill sets and disciplinary approaches.
Companies Transition From Data-Poor To Data-Rich: As companies transition from data-poor to data-rich organisations, they will need people with wide experience and a thorough background in both data science and the pure sciences. With institutes rushing to align their curricula with current industry demand, the talent supply gap should gradually narrow. However, as people in their late 20s, 30s and even 40s look to pivot towards a career in data science, they should invest significantly in critical, applied learning and get real hands-on experience. One cannot become a data analyst with just one analytics track or online certification; a strong grounding in applied statistics is also needed. Hands-on experience can go a long way towards clarifying the most difficult concepts of data science.
The Last Word
According to Gartner research, more than 40 percent of data science tasks will be automated by 2020. Data preparation, for example, will be largely automated. The research also laid out specific instances where some data science tasks could become completely automated, such as model selection and tuning. The tasks expected to form the core skill set in the future are:
- Feature engineering and model validation
- Understanding of the domain
- Machine learning
Eventually, there will be more focus on parallel and distributed code, as professionals who relied on spreadsheet analysis shift to Python and R.