Building a machine learning algorithm from scratch is not a job that data scientists usually perform. In fact, according to a post by a data science enthusiast, it is a job that needs “three genres of engineers”: data scientists, platform engineers and machine learning engineers. ML engineering falls in the realm of platform engineers, who build the environment and tools specific to the ML task that data scientists use for day-to-day operations.
When it comes to the question of hand-coding or using a library, ML practitioners lean towards libraries, which tend to feature higher-quality code. Practitioners use libraries because they know that the competitive advantage lies not in coding, but in the value one can add by building the training environment and datasets. It is best to focus one’s attention on feature selection, since hand-coding ML tends to involve a lot of GPU programming.
In Practice, Here’s Why One Should Go For The Library
Another upside of using a library is that if an algorithm is available in an existing one such as scikit-learn or GPy, chances are that the code within these modules has been rigorously tested for bugs and thus works reliably. Libraries are also useful for swift prototyping. Scikit-learn has been billed as one of the most popular libraries for machine learning. In fact, almost all advances made in machine learning build on existing algorithms. Also, given the wide availability of open source tools and techniques, hand-coding an algorithm is not the primary objective of ML professionals. Given the building-block approach of the open source environment, engineers end up stacking their work on top of existing tools and technologies.
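To give a sense of how little code a library-based prototype needs, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of logistic regression and all parameter values are assumptions made purely for illustration; they do not come from any particular project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out a quarter of the rows to check how the model generalises.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The library supplies the optimiser, regularisation and input validation.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in a different estimator usually means changing only the line that constructs the model, which is what makes this kind of prototyping quick.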
When One Needs To Reinvent The Wheel
But what happens if you are required to implement an ML algorithm from scratch? Should one go ahead and reinvent the wheel? There are times when business problems do not fit into any template and there is no off-the-shelf ML package that can address the problem. The first step in this situation should be converting the business problem into a machine-solvable problem. In fact, at most big tech firms, it is the engineers in R&D departments who develop new algorithms. One of the real advantages of hand-coding an algorithm is that you come to understand it better. For example, it is good to implement backprop once before switching over to automatic differentiation. Hand-coding an algorithm is often treated as a learning exercise, but in a fast-paced work environment, data scientists end up manipulating data rather than understanding the underlying algorithm.
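As a sketch of what implementing backprop once can look like, the snippet below hand-derives the gradients for a deliberately tiny network (one input, one tanh hidden unit, one linear output, squared-error loss) and checks them against finite differences. The network shape, parameter values and function names are all hypothetical choices for illustration.

```python
import math


def forward(params, x, y):
    """Return (loss, hidden activation, prediction) for the tiny network."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)   # hidden unit
    y_hat = w2 * h + b2          # linear output
    loss = 0.5 * (y_hat - y) ** 2
    return loss, h, y_hat


def backward(params, x, y):
    """Hand-derived gradients of the loss w.r.t. [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    _, h, y_hat = forward(params, x, y)
    d_yhat = y_hat - y            # dL/dy_hat
    d_w2 = d_yhat * h             # output layer weight
    d_b2 = d_yhat                 # output layer bias
    d_h = d_yhat * w2             # propagate back into the hidden unit
    d_z = d_h * (1.0 - h * h)     # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]


def numeric_grad(params, x, y, eps=1e-6):
    """Central finite differences, used only to check the manual gradients."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append(
            (forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps)
        )
    return grads


params = [0.5, -0.3, 0.8, 0.1]   # arbitrary starting values
x, y = 1.5, 0.7                  # a single made-up training example

analytic = backward(params, x, y)
numeric = numeric_grad(params, x, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("largest gap between manual and numeric gradients:", max_diff)
```

The chain-rule bookkeeping in the backward function is exactly the work that automatic differentiation takes over once the network grows beyond a handful of parameters.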
People who actually end up coding their own algorithms are hobbyists, learners, Kaggle challengers and research scientists at companies such as Google and Microsoft. This approach is highly useful for ML beginners, who can sharpen their understanding of how machine learning algorithms are implemented and shed light on the black box problem. Meanwhile, using existing libraries can speed things up significantly.
How To Sharpen The Learning Phase
Pick an algorithm you intend to work on, and code it yourself. This will help you understand how the algorithm works and how its output is derived. You’ll learn more about its limitations as well. The next stage should be to pick an appropriate library that implements the algorithm and use it. It is worth taking the time to explore the library and modify it. There are also times when enterprises decide to implement their own algorithms, and companies can come up with more efficient implementations. In other scenarios, open source implementations can have bugs, and some packages lack functional test coverage. According to an ML researcher, TensorFlow only deals with a specific set of problems and focuses primarily on deep learning.
Also, manually implementing a standard ML algorithm helps in understanding it, so it is no longer a black box. While working on a business problem, practitioners don’t just implement the algorithm in its standard form but also tweak it to fit their requirements. This can only happen once you have full control of your code.
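A small sketch of that idea: the function below hand-codes least-squares fitting of a straight line and then adds one tweak, an optional L2 penalty on the slope. The function name, the penalty parameter and the data are hypothetical, chosen only to show how owning the code makes such a change a one-line edit.

```python
def fit_line(xs, ys, l2=0.0):
    """Fit y = slope * x + intercept by least squares.

    l2 is an optional penalty on the slope (the intercept is left
    unpenalised); l2=0.0 gives the standard least-squares fit.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + l2)      # the tweak: l2 shrinks the slope towards 0
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]              # made-up points lying exactly on y = 2x + 1

print(fit_line(xs, ys))           # (2.0, 1.0)
print(fit_line(xs, ys, l2=10.0))  # (1.0, 3.0)
```

With the penalty switched on, the slope is pulled from 2.0 down to 1.0 and the intercept adjusts so the line still passes through the mean of the data.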
If you are keen on understanding the underlying algorithm, it is best to follow the basics of the ML roadmap: catch up on the statistics and maths concepts so that you can understand the nuances of the algorithm. For mid-level professionals who wish to transition to ML engineering, it is best to fill in the learning gaps by starting with the basics and then moving on to more difficult levels. MOOCs, especially those by Andrew Ng, cover the key concepts and are also oriented towards real-world applications. One can also attempt mini projects to build up skills on the side. The book Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems has been voted one of the best at unravelling key ML concepts.