I always get this question – how do I become a full stack Data Scientist (DS)? Though we may not have as clear a definition as we do today for a “full stack developer”, we have a sufficiently good understanding of what goes into building a “Full Stack DS”. A full stack DS, in my mind, is someone who can influence the product roadmap, move the metrics in the right direction and make the overall team more data oriented. They should be willing to do what it takes and use any tools at their disposal to unblock the team and themselves. It’s the attitude and bias for action towards the product that defines a full stack DS, rather than the knowledge of a few tools and techniques.
I am going to try a different approach here – instead of flooding the space below with a never-ending list of software packages and programming languages, I will talk about the skills I think will help you succeed at different stages of product development. I hope to cover all aspects of a full stack DS, but I must also apologize for any bias I may introduce, because my inferences have been drawn mostly from personal experience. In the spirit of bias for action, let’s get started and talk about the different life stages of a product.
This is the very first stage in the life of the product, Conception – an idea! The idea can come from anywhere – an engineer who found something interesting in the code, a researcher who discovered a customer pain point or a DS who found an opportunity while digging through data. Once an idea is there, the product team is expected to move fast and give shape to this idea. A DS can add immediate value by doing exploratory analysis and sizing the opportunity.
Depending on the maturity of the data org, a DS may be able to leverage existing tools or may have to create something entirely new. Based on my experience (sample size n = 1), ~80% of the data in the exploratory phase happens to be unstructured, and as such knowledge of Natural Language Processing (NLP) and deep learning methodologies (such as transfer learning) can be key to delivering quick, tangible impact. Let me pause briefly here and mention that all is not lost if you are not a “Machine Learning Scientist” – with good product sense and an ability to prioritize data-sets (that may be borderline-structured), you may be able to craft an exploratory analysis by matching keywords and identifying patterns.
My favorite tools for analysis in the Conception phase are Python, R, Java, SQL and (wait for it!) Microsoft Excel. Depending on the data maturity of your org, you may also have to use big data tools such as Hadoop, Hive, MapReduce and Spark to create scalable methods for extracting meaningful data. As for visualization, I prefer to generate my charts in Python (Matplotlib and Plotly are my favorites) and present my final deliverable in Microsoft PowerPoint/Word (that can be read and understood by everyone).
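To make the keyword-matching idea concrete, here is a minimal sketch in Python using only the standard library. The theme names, keywords and sample feedback are all made up for illustration; in practice they would come from your own product knowledge and data.

```python
import re
from collections import Counter

# Hypothetical themes and keywords, chosen purely for illustration.
KEYWORD_GROUPS = {
    "billing": ["refund", "charge", "invoice"],
    "login": ["password", "sign in", "locked out"],
}


def tag_text(text, keyword_groups=KEYWORD_GROUPS):
    """Return the themes whose keywords appear in a piece of free text."""
    lowered = text.lower()
    themes = []
    for theme, keywords in keyword_groups.items():
        for keyword in keywords:
            # A leading word boundary lets "charge" match "charged"
            # while skipping words such as "recharge".
            if re.search(r"\b" + re.escape(keyword), lowered):
                themes.append(theme)
                break  # count each theme at most once per text
    return themes


def count_themes(texts, keyword_groups=KEYWORD_GROUPS):
    """Count how many texts mention each theme."""
    counts = Counter()
    for text in texts:
        counts.update(tag_text(text, keyword_groups))
    return counts


# Made-up feedback snippets standing in for unstructured data.
sample_feedback = [
    "I was charged twice and need a refund",
    "Locked out after resetting my password",
    "Love the new dashboard!",
]
theme_counts = count_themes(sample_feedback)
print(theme_counts.most_common())
```

A count like this is crude, but it is often enough to rank pain points and give a first estimate of the size of an opportunity before investing in heavier NLP.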
Okay, we have an idea and know there is market potential – what next? Welcome to Formulation, the first step in building your Minimum Viable Product (MVP). A DS should come up with the key metrics that will be used to measure and finally determine the success or failure of the product. They should also work with engineering and design to lay out a logging framework and tactical roadmap (based on a measurement strategy that should be agreed upon in consultation with the product manager).
Finally, they should work with product research (quantitative and qualitative) to gather the voice of the customer and make sure that we are building the right product and prioritizing the most important features. An example could be working with a researcher to craft a survey and gather new data to bridge the gaps in understanding. Most of the tools I use in the Formulation phase are the same as in Conception. However, over the years I have found that my ability to read and understand code has helped me lay out a close-to-perfect logging plan (for example, knowing how to structure logger config files is always a huge help).
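As one illustration of what a structured logger config can look like, here is a small sketch using Python's standard `logging.config.dictConfig`. The logger name `product.events`, the `log_event` helper and the event fields are hypothetical; your engineering team's logging framework may look quite different.

```python
import json
import logging
import logging.config

# Hypothetical config: one console handler and one dedicated event logger.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "event": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "event",
            "level": "INFO",
        },
    },
    "loggers": {
        "product.events": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
event_logger = logging.getLogger("product.events")


def log_event(name, **fields):
    """Log one measurement event as JSON so it is easy to parse downstream."""
    payload = json.dumps({"event": name, **fields}, sort_keys=True)
    event_logger.info(payload)
    return payload


# Hypothetical event emitted when a user in test variant B clicks checkout.
logged = log_event("checkout_click", user_id="u123", variant="B")
```

Keeping every event in one consistent, machine-readable shape is what makes the later measurement work straightforward.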
This is my favorite phase, Execution – time to create and ship. A DS should be making sure that tests are set up correctly and measurement data is flowing in. Again, depending on data maturity, you may have to write data pipelines to ingest data and produce a more aggregated/readable output. My favorite tools for this phase are Python, Java, MySQL and Hive. But just knowing the tools won’t get you anywhere unless you know your math.
Success in the execution phase depends largely on the ability of the DS to interpret the test data correctly and make the call for a wider rollout (key aspects include making sure that A/B tests were set up correctly, that changes in metrics are not due to other factors, that metric changes are statistically significant, that there is no selection bias, and many more).
But the work of a DS doesn’t end with a successful rollout – they also need to make sure that data from the new product is being stored efficiently and insights from the data are being displayed for larger consumption. There is always a non-zero probability that this new data will someday help spawn another product idea – so keep the information flowing.
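As an example of the math involved, one common way to check whether a difference in conversion rates is statistically significant is a two-proportion z-test. The sketch below uses only the standard library and assumes large, independent samples; the conversion counts are made up, and this is just one of several valid approaches.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: number of converting users in control and treatment.
    n_a, n_b: total number of users in control and treatment.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 200 of 2000 control users convert vs 260 of 2000 in treatment.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only addresses significance. It says nothing about whether the test was set up correctly or whether selection bias crept in, so those checks still have to be done separately.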
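For the pipeline work mentioned in this phase, here is a rough sketch of turning raw event records into an aggregated, readable output in plain Python. The field names and sample records are hypothetical; at scale the same logic would more likely live in SQL, Hive or Spark.

```python
from collections import defaultdict

# Made-up raw events, one record per logged action.
raw_events = [
    {"date": "2024-01-01", "variant": "A", "user_id": "u1", "event": "view"},
    {"date": "2024-01-01", "variant": "A", "user_id": "u1", "event": "view"},
    {"date": "2024-01-01", "variant": "A", "user_id": "u2", "event": "view"},
    {"date": "2024-01-01", "variant": "B", "user_id": "u3", "event": "view"},
    {"date": "2024-01-02", "variant": "B", "user_id": "u3", "event": "purchase"},
]


def aggregate_daily(events):
    """Summarize events and distinct users per (date, variant, event)."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for record in events:
        key = (record["date"], record["variant"], record["event"])
        counts[key] += 1
        users[key].add(record["user_id"])
    # Sorted keys give a stable, readable row order.
    return [
        {
            "date": date,
            "variant": variant,
            "event": event,
            "events": counts[(date, variant, event)],
            "users": len(users[(date, variant, event)]),
        }
        for (date, variant, event) in sorted(counts)
    ]


daily_summary = aggregate_daily(raw_events)
for row in daily_summary:
    print(row)
```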
Now comes the million-dollar question – I have done all the points above, am I a Full Stack DS? To be precise, you are 50% there. The other 50% is determined by your ability to build a strong data culture. A strong data culture means that a DS is not in a service mode – they are not simply supporting the product team by pulling data for presentations or reacting passively to asks from other product functions, but rather helping build products that are grounded in data-driven decisions. This happens when the DS works proactively to influence other:
- Data Scientists by mentoring and teaching them. If you come to know about a new tool or create a best practice, spread the knowledge. Learn something new every day!
- Product Functions by helping them be more data oriented. Work with your product managers and engineers and help them realize how data can help make decisions faster and better. (No one likes writing logger code, but remember, a few lines of logger code can help prevent rewriting thousands of lines of core code.)
To close out my note on becoming a full stack DS, here are the three essentials:
- Have a great product sense
- Know the math
- Engage your team with data