Transitioning from engineering to data science is one of the trickiest moves in one of the most sought-after fields. Taking the plunge from a software engineering role to a data scientist/analyst role is fraught with challenges, especially after a decade in the industry. The two roles differ in that data analysis demands a statistical bent of mind and reasoning.
AIM caught up with Harish Subramanian, Program Director, PGP- Big Data Analytics, Great Lakes Institute of Management on the roles a software engineer can explore in data science. “Data scientists (in the pure sense of the word) are usually only a small part of data science and analytics teams. Correspondingly, building new models is a relatively small portion of the overall analytics function. Everything from identifying, sourcing and managing data, to working the technology stack to make the intricate models work effectively – these are all roles that software engineers with an adaptable and curious mind are perfectly placed to succeed in,” he shares.
To make a successful switch, one has to have a basic grounding in statistics, covering concepts such as median, standard deviation, correlation and linear regression, before deep diving into modelling. Statisticians and mathematicians with training in data analysis techniques such as graphing, plotting, exploratory analysis and hypothesis testing often find it easy to transition to a data scientist role. William Chen, data scientist at Quora and an avid writer about data science, has emphasized that “data science is purely about statistics. Statistics will guide you to be able to understand uncertainty in data and pull valid insights from data.”
Traditionally, the most common career paths for software engineers and programmers working on software architectures have been product management or systems engineering.
Here’s how you can beef up on basic data analyst skills:
- Learn hypothesis generation and analysis through plots, graphs and reasoning
- Get up to speed with statistical reasoning concepts such as causality and probability theory
- Data science is also about products, learn how to leverage data to figure out product features, enhancements
- Data munging is the art of cleaning data. It is a time-consuming job and entails dealing with missing data and changing schemas. Before diving into large datasets, practice cleaning data
- While Kaggle is hailed as a stepping stone for honing machine learning and data analysis skills, its competitions feature curated datasets that are anonymized and cleaned; real-world data is rarely that tidy
- Understand the business domain you wish to work in. Data analysis problems are solved according to business needs, so domain understanding is a must
- Data science is a long learning process. Switching to data engineering first and learning statistics on your own can be one path towards a deeper learning experience
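To make the data-munging point above concrete, here is a minimal sketch in plain Python that normalises an inconsistent schema and imputes missing values. The field names and the median-fill strategy are illustrative assumptions, not a prescription:

```python
# A minimal data-munging sketch: handling a missing value and a renamed
# column across records with inconsistent schemas.
from statistics import median

raw_records = [
    {"customer_id": 1, "balance": 5000.0, "age": 34},
    {"customer_id": 2, "balance": None, "age": 29},         # missing balance
    {"customer_id": 3, "acct_balance": 7200.0, "age": 41},  # legacy schema
]

def clean(records):
    # Normalise the schema: map the legacy 'acct_balance' key to 'balance'.
    for r in records:
        if "acct_balance" in r:
            r["balance"] = r.pop("acct_balance")
    # Impute missing balances with the median of the observed values.
    observed = [r["balance"] for r in records if r["balance"] is not None]
    fill = median(observed)
    for r in records:
        if r["balance"] is None:
            r["balance"] = fill
    return records

cleaned = clean(raw_records)
print(cleaned)
```

In practice this kind of work is usually done with pandas, but the steps stay the same: unify the schema first, then decide how to treat the gaps.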
Analytics India Magazine brings in industry experts to weigh in on the raging topic and lay down steps to effectively transition from software engineering to data science:
A switch from one industry to another is always a challenging move, especially after many years of experience, say 10 years, in the original field. The aspiring data analyst/scientist should focus on these aspects:
Sharpening business domain understanding
According to Rohit Sharma, UpGrad’s Program Director, data professionals are ultimately business problem solvers. They need to understand what problem they are trying to solve and what business levers they can move to make sure that they achieve the expected output. For example, say an analyst is asked to reduce the churn of customers from a retail bank. In that case, the analyst would need to know the types of products that exist within a bank and how customers engage with those products.
Harish Subramanian, Program Director, PGP- Big Data Analytics, Great Lakes Institute of Management emphasizes that it’s always important to understand what problem you’re trying to solve. Analytics techniques are a means to an end, and understanding the most important challenges for a business is vital to selecting appropriate techniques. For example, a missed prediction of fraud is significantly more expensive to a financial firm than a corresponding wrong prediction of email open rates is to a marketing department. Understanding this will help you make suitable trade-offs between complexity, effectiveness and cost of implementing various analytical techniques. It’s not always a question of picking the most cutting-edge solution. Good decision making calls for picking the right tool/technique for the job.
Shoring up Mathematical/Statistical knowledge
When it comes to statistical knowledge, Subramanian believes statistics, probability and mathematics in general are often the most daunting prospect for people entering data science and analytics. The amount of statistical knowledge needed for effective analysis doesn’t take an advanced degree to master. Linear algebra, matrices, statistical tests, distributions, likelihood estimators, regression, Bayes’ theorem and conditional probability are all you need to get started. There isn’t a whole lot of merit in learning very advanced statistics until you start working with lots of data and hit a ceiling in the efficacy of your models.
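As a small worked example of Bayes’ theorem and conditional probability from that list, the snippet below computes the probability that an alerted transaction is actually fraudulent. All the probabilities are invented for illustration:

```python
# Bayes' theorem on a toy fraud-alert example: combine the prior fraud
# rate with the alert's hit and false-positive rates.
p_fraud = 0.01               # prior: 1% of transactions are fraudulent
p_alert_given_fraud = 0.95   # sensitivity: fraud usually triggers an alert
p_alert_given_ok = 0.05      # false-positive rate on legitimate transactions

# Total probability of seeing an alert at all.
p_alert = (p_alert_given_fraud * p_fraud
           + p_alert_given_ok * (1 - p_fraud))

# Bayes' theorem: P(fraud | alert).
p_fraud_given_alert = p_alert_given_fraud * p_fraud / p_alert
print(round(p_fraud_given_alert, 3))
```

Even with a 95%-sensitive alert, the low prior means most alerts are still false positives, which is exactly the kind of intuition conditional probability builds.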
Sharma from UpGrad elaborates how once the problem has been identified, statistics becomes important to make sure that the analysis and the decision making is objective. Mathematics/Statistics power all the algorithms which are used to quantify the impact of variables under analysis; they identify hidden patterns and also make predictions or recommendations. In the customer churn case, the analyst may use logistic regression to predict which customers are likely to churn by statistically quantifying the impact of various factors (such as account balance, number of credit cards etc.) that have led to a churn of customers in the past.
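To sketch how logistic regression quantifies the impact of such factors, the snippet below scores a hypothetical customer. The feature names and coefficients are invented for illustration; in practice they would be estimated from historical churn data:

```python
import math

# Illustrative logistic-regression scoring for the churn example.
# Positive coefficients push churn probability up, negative ones down.
INTERCEPT = -2.0
COEFS = {"low_balance": 1.5, "num_credit_cards": -0.4, "complaints": 0.9}

def churn_probability(customer):
    # Linear combination of the factors on the log-odds scale...
    log_odds = INTERCEPT + sum(COEFS[k] * customer[k] for k in COEFS)
    # ...mapped to a probability via the sigmoid function.
    return 1 / (1 + math.exp(-log_odds))

customer = {"low_balance": 1, "num_credit_cards": 2, "complaints": 1}
print(round(churn_probability(customer), 3))
```

The fitted coefficients are what let the analyst say *which* factors drive churn and by how much, not just who is likely to leave.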
Building up technical know-how
Since organizations are dealing with millions of data points, they need technological solutions which can help them apply the algorithms at scale. Here, tools like R, Python etc., become extremely important. The analyst in this case would use tools like R to apply logistic regression to all the customers in the bank’s database so as to identify potential churners, Sharma from UpGrad shared.
Subramanian believes though there are hundreds of tools and packages that help you master various facets of analytics, you can get started with relatively few tools. A statistical programming language like R or Python and a database querying language like SQL are good enough to start with. One would be surprised at how much analytics you can get done with Excel alone, but the limitation is usually in the formats and characteristics of data you can work with in Excel.
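As a small sketch of pairing SQL with Python, the snippet below uses the standard library’s sqlite3 module to aggregate a toy customer table; the table and columns are invented for illustration:

```python
import sqlite3

# Build a throwaway in-memory table of customers with a churn flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, balance REAL, churned INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 5000.0, 0), (2, 120.0, 1), (3, 80.0, 1), (4, 9500.0, 0)],
)

# Aggregate with SQL, then continue the analysis in Python.
rows = conn.execute(
    "SELECT churned, COUNT(*), AVG(balance) FROM customers GROUP BY churned"
).fetchall()
for churned, count, avg_balance in rows:
    print(churned, count, avg_balance)
conn.close()
```

A querying language to pull and summarise the data, plus a statistical language to analyse it, covers most day-to-day analyst work.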
Gaining expertise in analytics
According to Kamal Das, VP, Program Management, Jigsaw Academy, a switch from one industry to another is a challenging move, one that requires patience and building up technical expertise. He shared his views with AIM on how to make a switch up the ladder with these steps.
- Train from the leaders in analytics and good brand names with quality training
- As in other areas, there will be cheaper clones offering lower-quality products; don’t save pennies (on the course) to lose dollars (on the career)
- Check to see that the peer group is similar to yours (the tone of teaching for a class with average experience of 3 years is different from one with average experience of 10 years)
Showcasing your newfound competencies in analytics
Das believes competitions like Kaggle offer a perfect platform for getting started with working on huge datasets. In addition, capstone case studies help in tackling real-world problems.
- Focus on how to make the pivot to analytics without losing on the wealth of experience you have
- Show your depth by writing blogs/articles on your own site, as well as on forums like LinkedIn etc
- Work on projects – The course should have a capstone project with industry. You may also reference sites like Kaggle
According to Subramanian, one must show they are competent in analytics, not just tell. “When competing with thousands of people who all claim to know the same things you do, it is hard to distinguish yourself if all you have to say is “I know x, y and z”. You need to be able to show that you have learned to work with data, written code, cleaned and processed datasets and tuned models to improve their effectiveness,” he shares.
He adds that a slew of data science competitions can help demonstrate analytical chops. Kaggle challenges are a great place to build this portfolio of work, as are HackerEarth, Analytics Vidhya and DataCamp challenges. Even if you’re not working on these formal challenges, you’d be well served to upload your datasets, code snippets and model outputs (as well as a brief description of what you did and why) to GitHub. More often than not, smart data science teams ask for your GitHub account to evaluate your proficiency.
Using your network to find a career in analytics
Finally, it may take patience, networking and showcasing the cases and datasets one has worked on to convince people of your depth and strength in analytics. Das advises on how to maximize networking opportunities:
- The course should have industry experts and networking opportunities
- Change your CV and cover letter to show your analytics profile
- Leverage and grow network in analytics by showcasing your knowledge
Career transitions aren’t easy. Remember, you don’t have to unlearn everything. Subramanian advises, “To make any successful transition, you’re likely to succeed if you build on your existing knowledge. So, if you’re proficient as a programmer, then transitioning into data engineering roles allows you to use your proficiency rather than start from scratch. Similarly, if you’re proficient in databases and data warehouses, then you’re well positioned to move into data architecture roles”.
We give you a recap of the technical and non-technical skills to beef up on for a head start in this data-intensive field: dealing with unstructured data, familiarity with the Hadoop platform, mastering modelling languages such as R and Python and querying languages such as Pig, Hive and SQL, and lastly statistics. Communication skills and an innate curiosity will go a long way in optimizing products and services.