From structuring Data Science teams for success to building models that can be productionized and adopting best practices for data pipelines, this recent Qubole Meetup, organised out of the Bangalore office, went beyond buzzwords to give a clear lowdown on what goes on in the fast-moving Data Science field.
Of late, the discussion around AI has intensified, but for business leaders, delivering on the promise of emerging AI applications is still a distant reality. There is also a slew of other concerns in the data science domain. Data science projects that never get deployed and models that cannot be productionized remain a key worry for business leaders and managers. Another concern is rethinking goals and business metrics and aligning them with organisational objectives. Teams also have to manage the steep technical requirements that must be met before data science solutions can be implemented at scale.
These topics and more were at the centre of discussion at a meetup organised by Qubole, the cloud data platform leader, on March 1. The discussion focused on Applying Data Science In The Industry and drew over 45 attendees, senior managers and practitioners who engaged in a meaningful conversation with industry heavyweights: Swapnasarit Sahu, SVP, Data Science & Analytics at Zeotap; Sathish KS, VP Engineering at Zeotap; Swaminathan Padmanabhan, Director, Data Science at Freshworks; Sharath Babu, Sr Product Analyst at Razorpay; Manish Khandelwal, Engineering Manager, Data Science at MIQ Digital; and Rajat Gupta, Sr Engineering Manager at Qubole. The session was moderated by Rangasayee Chandrasekaran, Senior Product Manager at Qubole.
Here are six key takeaways from the one-hour panel discussion, which played out over beer and snacks.
1) Data Science that cannot be productionized is no data science: Getting a model right through experimentation is only one part of the job. Data Science teams run many experiments to solve a problem, but they also need to understand the ramifications of their solution and how it can affect other business functions. From an engineering perspective, one of the core success metrics senior technical leaders go by is how many data-centric projects a team can push to production. What is often overlooked in the process is what ends up being broken or hampered along the way. In other words, a solution built to solve one problem can have an adverse effect on another, and not every data science project will translate into revenue or operational efficiency.
2) Define the success metric clearly in the data science process: Data Science works best when there is a clearly defined success metric, and this metric needs to come from senior management, who give the team the right authorization to begin with. Metrics help the team align the project with business objectives, since data science team members can often get carried away with research.
3) Why one should get into production early: At a time when teams are expected to move fast and iterate fast, panelists agreed that one should get to production sooner, even if it means compromising on model accuracy. Doing so helps demonstrate incremental ROI to business leaders. Citing an example, Padmanabhan shared how, at Freshworks, the data science team builds a prototype solution and runs A/B tests at various levels to ascertain whether the prototype demonstrates value, after which the engineering team carries out a quick launch rather than a full launch. If performance is satisfactory and the machine learning or data science objectives are achieved, the team then goes about engineering the solution at scale.
4) Central leadership is crucial to success: According to Sathish KS, VP Engineering at Zeotap, central leadership is very important in the data science field. A leader can balance the requirements of the data science team, lay out a clear roadmap, define what needs to be built and what might be required from other engineering teams, and navigate unforeseen problems.
5) Building a stable & scalable data pipeline: How important is a data pipeline for an organisation building an analytics unit? Very important. In India, startups and organisations building up a new practice often lack a data pipeline, and their data is mostly in silos or dumps, Sahu from Zeotap shared with the attendees. That is why a data pipeline is a must before one starts a data practice. "If you are building a new practice it is very important to have a foundation of data pipeline, so that data scientists have the flexibility to pick up the data from various places and for most of the problems, one also requires historical data," he said.
6) Fundamental shift required in building data science teams: All panelists agreed that many companies are still experimenting with how they structure their data science teams. Most leading organisations embed data scientists within product teams to ensure that data engineers and data scientists work in tandem. For example, at Freshworks, data scientists and data engineers work together as part of one function and also interact with different product teams.