Even as demand for analytics roles grows across the globe, entrepreneurs and large organisations alike are always looking for ways to improve the data science workflow. One key reason for this constant search for improvement is the repetitive nature of much of the work that lands on the plates of data scientists and their peers.
This is where Automated Machine Learning (AutoML) comes into the picture.
Repetitive tasks can try anyone’s patience, especially when a large quantity of data has to be pored over. If some of the tasks that data scientists routinely come across were automated, they could save a lot of time and effort.
Here are a few examples:
– Exploratory Data Analysis: Visualising data before designing a model
– Feature Transformations: Encoding categorical variables, imputing missing values, and encoding sequences and text, among others
– Algorithm Selection and Hyper-Parameter Tuning: Choosing apt algorithms and settings from a plethora of options
– Model Diagnostics: Plotting ROC curves, partial dependence plots and learning curves, among others
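To make the feature transformation item concrete, here is a minimal sketch of two of the steps named above, imputing missing values and encoding a categorical variable, using only the Python standard library. The function names (`impute_mean`, `one_hot`) and the toy columns are made up for illustration and do not come from any particular AutoML tool.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator rows, one slot per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy, made-up columns (hypothetical data, for illustration only).
ages = [30, None, 20, 40]
cities = ["Pune", "Delhi", "Pune", "Mumbai"]

filled_ages = impute_mean(ages)
categories, encoded_cities = one_hot(cities)

print(filled_ages)
print(categories)
print(encoded_cities)
```

An automated tool would typically apply steps like these to every column without being asked, which is exactly the kind of repetitive work the list describes.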
So What Exactly Is Automated Machine Learning?
We came across one of the most convoluted, yet mind-blowing, definitions of Automated Machine Learning on a blog. The writer says:
“…If computer programming is about automation, and machine learning is all about automating automation, then Automated Machine Learning can be defined as, ‘The automation of automating automation’.”
While many may find this definition confusing, one can draw a parallel with human evolution.
The writer then dissects the definition:
– Programming relieves people by managing routine tasks
– Machine Learning allows computers to learn how to best perform such tasks
– Automated Machine Learning helps computers learn how to optimise the outcome of learning how to perform those routine tasks
Popular Automated Machine Learning Tools:
How Can Automated Machine Learning Be Used:
Machine Learning is used in almost all analytics-based organisations. But many complex tasks such as pre-processing data, selecting appropriate features and model family, optimising model hyperparameters, post-processing machine learning models and critically analysing the results, among others, often fall beyond the purview of analysts who don’t specialise in ML.
For example, Google has successfully applied deep learning models to many applications, from image recognition to speech recognition to machine translation. Typically, its machine learning models are designed by a team of engineers and scientists. Designing these models manually is difficult because the search space of all possible models can be combinatorially large.
But last month, Google’s AutoML spawned a “child” using its reinforcement learning technique. Named NASNet, the child is a neural network designed for a specific task by AutoML’s controller network (the “parent”).
“NASNet may be resized to produce a family of models that achieve good accuracies while having very low computational costs. For example, a small version of NASNet achieves 74 percent accuracy, which is 3.1 percent better than equivalently-sized, state-of-the-art models for mobile platforms. The large NASNet achieves state-of-the-art accuracy while halving the computational cost,” said an official Google statement.
Google CEO Sundar Pichai has big hopes for AutoML. He was reported as saying, “Today these are handcrafted by machine learning scientists and literally only a few thousands of scientists around the world can do this… We want to enable hundreds of thousands of developers to be able to do it.”
Automated Machine Learning will lessen scientists’ and programmers’ dependence on intuition by trying out an algorithm, scoring it, testing it and then refining it or moving on to other models. In effect, it will automate the machine learning part of the data science workflow in an organisation.
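The try, score and refine loop described above can be sketched in a few lines. This is a deliberately tiny illustration using only the Python standard library, not how any real AutoML system is built: the candidates (a majority-class baseline and k-nearest-neighbours with a few values of k), the helper names (`cross_val_accuracy`, `auto_select`) and the toy data are all assumptions made for the example.

```python
from collections import Counter


def majority_baseline(train, x):
    """Predict the most common training label, ignoring the input x."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def make_knn(k):
    """Build a k-nearest-neighbours predictor for 1-D inputs."""
    def predict(train, x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(predict, data, folds=4):
    """Accuracy over `folds` interleaved train/test splits of the data."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        correct += sum(predict(train, x) == y for x, y in test)
    return correct / len(data)


def auto_select(candidates, data):
    """Score every candidate and return (best_name, all_scores)."""
    scores = {name: cross_val_accuracy(fn, data) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy, made-up data: (feature, label) pairs.
data = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (2.5, "low"),
        (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.5, "high")]

# Candidate "algorithms" and "hyperparameters" to try automatically.
candidates = {"baseline": majority_baseline}
candidates.update({f"knn_k={k}": make_knn(k) for k in (1, 3, 5)})

best, scores = auto_select(candidates, data)
print(best, scores)
```

On this toy data the nearest-neighbour candidates score higher than the baseline, so the loop picks one of them without anyone relying on intuition. Real tools search far larger spaces of pipelines and settings, but the principle is the same.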
Randy Olson, Senior Data Scientist at the University of Pennsylvania Institute for Biomedical Informatics and lead developer of TPOT, has gone on record to say:
“In the near future, I see automated machine learning (AutoML) taking over the machine learning model-building process: once a data set is in a (relatively) clean format, the AutoML system will be able to design and optimise a machine learning pipeline faster than 99% of the humans out there. Perhaps AutoML systems will be able to expand out to cover a larger portion of the data cleaning process, but many tasks — such as being able to pose a problem as a machine learning problem in the first place — will remain solely a human endeavour in the near future.”