As the software industry moved from the waterfall development methodology to agile, the role of research scientists (known today as data scientists) has also evolved over the last fifteen years. Under the waterfall model, their work was limited to the design phase of the analytical solution development cycle. It involved understanding the patterns in the data and using them to design algorithms for solving business problems. At that time, data sources were limited in scope and size. Exploratory data analysis was a relatively simple task and, in most cases, was done in isolation without the direct involvement of business users.
With the explosion in data, the complexity of understanding the patterns in the data has increased significantly. Data sanity is not a trivial problem to solve when studying stable relationships in the data. Today, data scientists spend a considerable amount of time on ETL (Extract, Transform and Load) processes to convert the data into usable formats. They work with business users across multiple stages to understand the various fields in the available data and their business implications. As a result, they appreciate the real-world limitations that could prevent the generation of clean data. At this stage, data scientists work towards defining aggregable approximations that maintain the sanity of the data.
The next stage is exploratory data analysis. It requires sharing interim information with the business users and formalizing relationships in the data. This leads to a direct discussion among the data scientist, the product developer and, in some cases, the final business user. After the exploratory data analysis, the subsequent step is to develop algorithms for predictive modeling.
In the past, predictive data modeling was done before the actual design phase and amounted to building a prototype with dummy data. In today’s business environment, data scientists review predictive modeling results on actual data sets from multiple business environments, with tests conducted by analytical quality assurance engineers in a test environment. The process is iterative and also involves analytical software developers, who work with the data scientists to code the actual algorithm, while the data scientists review the final results. Thus, data scientists are involved not only in building prototypes but in the entire solution development and quality assurance cycle, using actual data and accounting for multiple business conditions.
The stage after software development is pilot deployment. In this phase, data scientists are responsible for the pilot deployments of the analytical solutions for the first few clients. Earlier, data scientists were not directly involved in this phase unless a major data condition arose that the algorithm had not accounted for. At present, data scientists proactively look at the business results in the pilot deployment phase, working alongside the product development teams on potential enhancements based on actual results. In extreme cases, data scientists also explain to clients the final results and the logic behind the algorithm, which are otherwise difficult to understand.
After the pilot phase, the actual deployment of the analytical solutions begins for the wider client base. In the past, data scientists would move to the next project once the pilot was over and the actual deployment began. During the deployment stage, however, there is a high likelihood that a new data condition will emerge that was not accounted for and needs mitigation. Data scientists need to work with the quality assurance team, which conducts the impact analysis by deploying the new algorithm across multiple clients in the test environment, to understand what the results mean and make corrections. They are, in turn, also required to work with the product development and technology teams to fine-tune the algorithm before it goes for final deployment. On an ongoing basis, the actual algorithm may need either an enhancement or further explanation for better understanding. In such cases, data scientists work with analytical support teams to help clients use the analytical solutions by building FAQs that answer common questions from business users. Data scientists can play an important role in enabling the analytical support teams to understand the solution and to develop internal tools and techniques that help clients use it. Last but not least, in some cases data scientists are involved in the sales cycle as well, supporting concept selling to the early adopters of the solutions.
Evidently, the role of the data scientist is evolving and will be pivotal in building analytical solutions, from sales through to the ongoing support of these solutions.