Swapnasarit Sahu, head of Data Science and Analytics at Zeotap, has interesting insights to share on what it takes to be a good data scientist. As an experienced professional with a demonstrated history of working in various domains, he has built many cutting-edge AI products. At Zeotap, he leads a team of data scientists and data engineers that builds products from scratch for data monetisation in the telecom industry. He has been instrumental in building user behaviour models, recommendation systems, data quality monitoring and more.
Analytics India Magazine: What are the key skill sets that you look for while hiring for data science roles? What are the languages and technical skills they should know?
Swapnasarit Sahu: Data science is about making data the centrepiece of decision making and using it to extract insights and learn about patterns. In my view, you have to define your overall business goal, derive the problems you want to address and solve, and then match the skill sets. Some of the key roles are:
- Kind 1: People skilled in advanced statistics and economic theory. In terms of coding languages: Python, R or SAS.
- Kind 2: People skilled in machine learning (or deep learning). In terms of languages: comfortable with Python, Scala, Java or C. If you are dealing with big data, then familiarity with a big data framework (such as Hadoop, Spark or Apache Flink) is needed. For this set of people, algorithmic thinking is also very important, so it’s important to be really good at data structures and algorithms.
- Kind 3: If you are dealing with supply chains, people familiar with AI planning, queuing theory and optimisation. In terms of coding languages: mostly Python or C.
If you are an organisation dealing with a lot of text data, then NLP techniques are very much necessary. Python is great for this and makes text easy to manipulate. You could live entirely in the Pythonic world.
For deep learning, TensorFlow is also becoming quite popular, but it is not mandatory, since many things can be achieved using PyTorch and Keras.
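As a small illustration of the kind of text manipulation mentioned above, here is a minimal sketch using only Python's standard library. The sample sentence and the `top_words` helper are hypothetical and not from the interview; real NLP work would typically need proper tokenisation and a dedicated library.

```python
import re
from collections import Counter

def top_words(text, n=3):
    """Return the n most common lowercase word tokens in text (hypothetical helper)."""
    # Crude tokenisation: runs of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Made-up sample text, for illustration only.
sample = "Data drives decisions. Good data beats clever models; bad data breaks them."
print(top_words(sample, 1))
```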
AIM: What are the non-technical skills and traits that a good data scientist should have? What is the importance of effective communication and business mindset in DS?
SS: Having a curious mind is one of the most important traits of a data scientist. He or she should be aware of the premise (the assumptions) behind a hypothesis and at what level that hypothesis can fail.
Businesses today are mushrooming on data sets, and it is important to know the business, since all the questions asked about the data are related to the business. Once you are aware of the business and the outcome you want to achieve, effective communication also becomes critical.
AIM: Do you believe that a good data scientist should be obsessed with solving problems and not new tools?
SS: Well, as for any engineer, tools make life easier. Tools help you visualise the data, extract patterns and build better models. In fact, many big data problems can’t be solved without the right toolset. For instance, how can one process terabytes of data without big data frameworks like Spark or Flink? So it’s important to know the tools, but don’t get carried away by a particular set of tools.
AIM: Is it educational qualifications or experience that matters more to be a data scientist in companies?
SS: It is often observed that companies hit a roadblock because they were not able to adapt to different business models or to change. The fundamental problem was that the company could not predict or understand how the business landscape could change overnight with the advent of new technologies.
If you are a data scientist, you must be someone who can understand beyond monolithic systems. Experience matters far more than educational qualifications. Many good data scientists are self-taught. During education, we play in a fairly ideal world where the standard algorithms work on the datasets given. In the real world, it is quite different. It’s really hard to make sense of data given the amount of noise or network effects that exist in it. The more problems you solve, the more expertise you gain in how to process the data for a given business case and in what can be achieved with the data in hand.
AIM: Who would be a preferred candidate for a data science role: one with a certification from a full-time course or one from an executive course?
SS: In my view, I haven’t seen any good results yet from the executive courses offered by institutes in the country. The current executive courses help to develop a mindset, but they are quite basic in nature. They hardly cover things in depth or talk about the state of the art. Many people never work during the coursework on a real dataset of the kind we see every day in the business world.
So, I would suggest that self-learning is critical in this field. One needs to read research papers to stay updated on the state of the art. Applying techniques to the open datasets available also helps a lot.
AIM: What is the best learning curve for a data scientist and the best resources to learn?
SS: Now, there are a lot of wonderful online courses available to learn the basic concepts, and code is available on GitHub to practise with. These are great resources to learn from. There are also data science competitions, such as those on Kaggle, which offer real-world datasets to practise on. Candidates can choose any of these options, learn from books or online resources, and play enough with data to get the concepts right. It is also important to ask questions such as why certain algorithms behave the way they do.
AIM: What are the subjects a budding data scientist can master during the early days of his/her education and career to be a good data scientist?
SS: Some of the key areas are:
- Linear algebra, probability and statistics for the first kind
- Linear algebra, probability, elementary calculus, machine learning fundamentals, and some important CS courses such as data structures and algorithms for the second kind
- Linear algebra, probability, machine learning fundamentals and NLP for those who want to choose NLP as a career path
- Linear algebra, probability, AI planning such as stochastic algorithms (genetic algorithms, ACO, PSO, etc.) and discrete optimisation for supply-chain planning
AIM: What is the importance of industry mentors for a budding data scientist?
SS: A mentor should guide the team to ask the right questions about the data and the business. Mentors should help the team set realistic goals and expectations for an experiment and translate industry problems into data science problems whenever needed. Industry mentors have a clear understanding of the business problems faced by industry, and they can point in the right direction for tackling new problems which were not there before. Many jobs have become obsolete in this era, but there is a need for accountable and transparent AI systems. Guiding data scientists towards these will help their careers immensely.
AIM: In a nutshell, what are the three must-have skills to be a data scientist?
SS: The three must-have skills, according to me, would be:
- Ask the right questions about data
- Learn to measure the failure or success of an experiment with the right metrics
- Be able to implement things fast and learn to fail fast rather than chase the perfect result
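
To make the point about measuring an experiment with the right metric concrete, here is a minimal sketch in plain Python. The labels are made-up illustrative data (5 positives in 100 samples), and the helper functions are hypothetical; it shows how a model that never predicts the positive class can score high accuracy while recall exposes the failure.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    """Fraction of all samples that were labelled correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual positives that were found (0.0 if there are none)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Made-up imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial "model" that always predicts the negative class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
```

On imbalanced data like this, accuracy alone would make the experiment look like a success, which is why the choice of metric should follow from the business question being asked.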