How does one tell experienced machine learning engineers apart from beginners? Broadly, machine learning engineers work primarily on building data pipelines and on A/B testing and benchmarking infrastructure. The job of an ML engineer is to get the inputs required to feed the models. Only once this process is done do data scientists focus on building the actual algorithms or models. Here, they use industry-standard tools such as logistic regression, random forests and sometimes other linear models. According to one Redditor, the data pipeline is similar to a logistic model and runs in batches once a week.
ML engineers should be comfortable doing design and code reviews and dealing with large volumes of structured and unstructured data. Besides a core knowledge of CS concepts such as data structures and algorithms, professional ML engineers are expected to be skilled in various architectures and to have strong programming chops (for example, in Java, Scala or C).
Experienced ML engineers have core programming skills, can write code and have a good grasp of the development ecosystem as well. Core traits include being able to implement their work in a production-level language rather than only in MATLAB, and maintaining code as part of a team. Machine learning engineers come from varied backgrounds, but they need to be proficient in writing modular, reusable code. This is where experienced ML engineers score over beginners.
Scale of data: Another key area is the scale of data ML engineers deal with. Most of the time, beginners work on small datasets, ranging from 100k to 10m rows, and aim for high accuracy on their test dataset. In reality, however, machine learning engineers deal with terabytes of data and multi-day training times across many different machines. Experienced engineers also deal with scaling and fault tolerance, possibly training on only a subset of the data, and figuring out how to pick that subset.
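One common way to pick a training subset from data too large to hold in memory is reservoir sampling, which draws a uniform random sample in a single pass. The sketch below is only an illustration of that idea, not a method described in the sources quoted here; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick k rows uniformly at random from a stream of unknown length.

    Hypothetical helper for illustration: it reads the stream once and
    keeps only k rows in memory at any time.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by swapping it
            # into a random slot; otherwise discard it.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Usage: sample 1,000 rows from a (simulated) stream of 100,000 rows.
subset = reservoir_sample(range(100_000), k=1_000, seed=42)
```

In a real pipeline the stream would be rows read from files or a database rather than `range`. Uniform sampling is also only one choice; a stratified or time-based subset may suit the problem better.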
Complex Features: ML engineers are required to build recommendation engines on clients’ vast datasets. This also means dealing with varied, complex data such as text, images and other unstructured data. This is an area where beginners have to learn to cope with unstructured data and build different types of algorithms. This is also known as the art of dealing with imperfect data. At the beginner level, one practices on clean datasets that are easy to train and build models on. But in practice, data wrangling has been cited as one of the key tasks of data scientists and ML engineers, who are said to spend 80 percent of their time cleaning up datasets. For example, sometimes a dataset contains only a small number of examples of one category, and one has to make sure that the algorithm doesn’t get too biased by this.
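One simple way to keep a rare category from being drowned out is to weight each class inversely to its frequency during training. The sketch below assumes plain label lists; the helper name `balanced_class_weights` is hypothetical. It uses the heuristic `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return a weight per class that is inversely proportional to its frequency.

    Hypothetical helper for illustration; rare classes get larger weights so
    that errors on them count for more in the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Usage: a made-up dataset where only 5 of 100 examples are "fraud".
labels = ["ok"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
# weights["fraud"] is 10.0, while weights["ok"] is roughly 0.53
```

The resulting dictionary can be passed to any training routine that accepts per-class weights. Resampling the data is an alternative when the learner does not support weights.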
Debugging: A post on ML debugging describes how, while building machine learning models, one may run into a situation where a model is not working as well as one would like. Sometimes the error rate is too high, or the model works fine on the training data but fails when applied to real-world data. Machine learning is well known for its black-box nature, which makes such failures hard to trace. Often, the underlying problem is that the model suffers from high variance or high bias. A high-variance model is not good at generalising, so when it is applied to other or unseen data, it may fail. So, the model may work well on test data but not in practice. This is another stumbling block for beginners who are yet to gain wide experience in building real-world applications. Most beginners will not even know if their model is suffering from high variance, a situation that requires a lot of training examples to find a good, general solution.
For example, an image classifier is built to detect whether or not there is a tank in a picture. It works with great accuracy on the test data, but doesn’t do well in practice.
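A first check for high bias versus high variance is to compare the error on the training data with the error on held-out validation data. The sketch below is a rough rule of thumb, not a method taken from the post mentioned above. The function name `diagnose_fit` and both thresholds (`target_error`, `gap_tolerance`) are assumptions chosen for illustration. It also only helps if the validation data resembles real-world data, which is exactly what goes wrong in cases like the tank classifier.

```python
def diagnose_fit(
    train_error: float,
    val_error: float,
    target_error: float = 0.05,   # assumed acceptable error rate
    gap_tolerance: float = 0.05,  # assumed acceptable train/validation gap
) -> str:
    """Classify a model's fit from its training and validation error rates.

    Hypothetical helper for illustration: high training error suggests
    high bias (underfitting); a large gap between validation and training
    error suggests high variance (overfitting).
    """
    high_bias = train_error > target_error
    high_variance = (val_error - train_error) > gap_tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "ok"


# Usage: 1% error on training data but 20% on validation data.
verdict = diagnose_fit(train_error=0.01, val_error=0.20)
# verdict is "high variance"
```

A "high variance" verdict points towards more training examples or a simpler model, while "high bias" points towards richer features or a more flexible model.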
In a post, Muktabh Mayank, a data science practitioner, explained how most beginners may land an ML job at an enterprise or startup but end up overestimating their CVs, which are primarily about data science projects. He explained that beginners, compared to professional ML engineers, do not know how the real world works; as a result, they have very high expectations that the real world is unable to meet. But this doesn’t mean it is impossible for a fresher to take up an ML job. Instead, they should gear themselves more towards real-life projects and solving real business cases.