A machine learning model can have many dependencies. To ensure that all of its components and features are available both offline and online for deployment, all of this information is stored in a central repository.
The main objective of having a proper pipeline for any ML model is to exercise control over it. A well-organised pipeline makes the implementation more flexible. It is like having an exploded view of a car engine: you can pick out the faulty pieces and replace them, which in our case means replacing a chunk of code.
A pipeline consists of a sequence of components, each of which encapsulates a set of computations. Data is sent through these components and transformed by those computations.
Pipelines, unlike the name suggests, are not one-way flows. They are cyclic in nature: they enable iteration to improve the scores of the machine learning algorithms and make the model scalable.
A typical machine learning pipeline would consist of the following processes:
- Data collection
- Data cleaning
- Feature extraction (labelling and dimensionality reduction)
- Model validation
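The stages above can be sketched as a sequence of components that data flows through. This is a minimal, illustrative sketch; the records, field names, and stages are made-up stand-ins, not a real pipeline framework:

```python
# A minimal sketch of a pipeline as a sequence of components, each a function
# that transforms the data before passing it on. All names and data here are
# illustrative.
def collect():
    # stand-in for pulling raw records from a data source
    return [{"age": "34", "income": "52000"}, {"age": "", "income": "48000"}]

def clean(rows):
    # drop records with missing fields, cast strings to numbers
    return [
        {"age": int(r["age"]), "income": int(r["income"])}
        for r in rows
        if all(r.values())
    ]

def extract_features(rows):
    # reduce each record to the single feature a (hypothetical) model uses
    return [[r["income"] / 1000.0] for r in rows]

def run_pipeline(stages):
    # send data through the components in order
    data = None
    for stage in stages:
        data = stage(data) if data is not None else stage()
    return data

features = run_pipeline([collect, clean, extract_features])
print(features)  # → [[52.0]] (the incomplete record was dropped)
```

Because each stage is a self-contained component, a faulty one can be swapped out without touching the rest, which is exactly the "exploded view" benefit described above.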
Data collection and cleaning are the primary tasks of any machine learning engineer who wants to make meaning out of data. But getting data, and especially getting the right data, is an uphill task in itself.
Data quality and its accessibility are two main challenges one will come across in the initial stages of building a pipeline.
The captured data should be pulled and put together, and the benefits of collection should outweigh the costs of collection and analysis.
But there can be problems associated with the data that is fed into a deployed model, such as:
- an incorrect model gets pushed
- incoming data is corrupted
- incoming data changes and no longer resembles datasets used during training
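The last two failure modes can be guarded against with simple checks at serving time. The sketch below is a hedged illustration using only the standard library; the field name, training statistics, and drift threshold are all assumptions for the example:

```python
# An illustrative guard against two of the failure modes above: corrupted
# incoming records, and incoming data drifting away from the training
# distribution. The stats and field names are made-up.
from statistics import mean

TRAINING_STATS = {"income": {"mean": 50.0, "stdev": 5.0}}  # captured at train time

def is_corrupted(record):
    # corrupted input: missing field or wrong type
    return not isinstance(record.get("income"), (int, float))

def drift_score(values, feature):
    # how many training standard deviations the live mean has shifted
    stats = TRAINING_STATS[feature]
    return abs(mean(values) - stats["mean"]) / stats["stdev"]

batch = [{"income": 49.0}, {"income": 51.0}, {"income": "n/a"}, {"income": 90.0}]
valid = [r for r in batch if not is_corrupted(r)]
score = drift_score([r["income"] for r in valid], "income")
print(len(valid), round(score, 2))  # → 3 2.67
```

A drift score well above 1 or 2 would suggest the live data no longer resembles the training set, signalling that the model may need retraining.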
Researchers from MIT and elsewhere have developed an interactive tool that, for the first time, lets users see and control how increasingly popular automated machine-learning (AutoML) systems work.
The tool, ATMSeer, generates a user-friendly interface that shows in-depth information about a chosen model's performance, as well as the selection of algorithms and parameters, all of which can be adjusted.
At The Heart Of ATMSeer
This new tool, ATMSeer, is built around 'Auto-Tuned Models (ATM).' What ATM does differently from other automated machine learning systems is that it catalogues all the results as it tries to fit models to the data.
ATM randomly selects an algorithmic approach, be it a neural network or a decision tree, along with the model's hyperparameters, such as the size of the tree or the number of layers in the network.
The system repeats this act of choosing a model and tuning its hyperparameters while assessing performance at each step. The performance results determine which model to try next, ideally a better one. Finally, it displays all the results, along with the models best suited to a particular task.
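The select–tune–evaluate loop described above can be sketched as follows. This is a deliberately simplified illustration, not ATM's actual implementation: the search space, the toy scoring function, and the trial count are all assumptions, and real ATM fits real learners rather than the stand-in below:

```python
# A simplified, illustrative sketch of an ATM-style search loop: repeatedly
# pick an algorithm and hyperparameters, evaluate, and catalogue every trial.
# The "evaluate" function is a toy stand-in for cross-validated performance.
import random

random.seed(0)

SEARCH_SPACE = {
    "decision_tree": {"max_depth": range(1, 11)},
    "neural_network": {"n_layers": range(1, 6)},
}

def evaluate(algorithm, params):
    # stand-in for fitting the model and measuring its score
    if algorithm == "decision_tree":
        return 0.6 + 0.03 * params["max_depth"] + random.uniform(-0.05, 0.05)
    return 0.7 + 0.02 * params["n_layers"] + random.uniform(-0.05, 0.05)

leaderboard = []  # ATM catalogues every trial, not just the best one
for _ in range(20):
    algorithm = random.choice(list(SEARCH_SPACE))
    params = {k: random.choice(list(v)) for k, v in SEARCH_SPACE[algorithm].items()}
    leaderboard.append((evaluate(algorithm, params), algorithm, params))

leaderboard.sort(key=lambda trial: trial[0], reverse=True)
print(leaderboard[0][1:])  # best algorithm and hyperparameters found
```

Keeping the full catalogue of trials, rather than only the winner, is what lets an interface like ATMSeer show users the whole search history.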
The ATMSeer interface consists of a control panel that allows users to upload datasets and an AutoML system, and to start or pause the search process. There is also a "leaderboard" of top-performing models in descending order. A non-expert can decipher the performance of various models from these intuitive visualisations.
ATMSeer includes an "AutoML Profiler," with panels containing in-depth information about the algorithms and hyperparameters, all of which can be adjusted. One panel represents each algorithm class as a histogram, a bar chart that shows the distribution of the algorithm's performance scores, on a scale of 0 to 10, depending on their hyperparameters.
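The kind of per-algorithm histogram that panel shows can be rendered even as plain text. The scores below are made-up example values for a single hypothetical algorithm class, purely to illustrate the bucketed 0-to-10 view:

```python
# An illustrative text rendering of a performance-score histogram for one
# algorithm class, bucketed on a 0-10 scale. The scores are made-up.
from collections import Counter

scores = [6.2, 7.1, 7.4, 8.0, 8.1, 8.3, 8.3, 9.0, 5.5, 7.9]

buckets = Counter(int(s) for s in scores)  # bucket by whole-number score
for score in range(11):
    print(f"{score:2d} | {'#' * buckets.get(score, 0)}")
```

Each row is one score bucket; a cluster of bars around the high end would tell a non-expert at a glance that this algorithm class performs consistently well across its hyperparameter settings.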
“We let users pick and see how the AutoML system works,” says Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems (LIDS), who leads the Data to AI group.
Whether it is a market crash or a wrong diagnosis, the after-effects can be irreversible. Tracking the development of a machine learning algorithm throughout its life cycle therefore becomes crucial.
Know more about Auto-Tuned Models here.