The goal of hyperparameter exploration is to search across various hyperparameter configurations and find a configuration that results in the best performance. Typically, the hyperparameter exploration process is painstakingly manual, given that the search space is vast and evaluation of each configuration can be expensive.
Hyperparameters help answer questions like:
- How deep should the decision tree be?
- How many trees does the random forest need?
- How many layers should the neural network have?
- What learning rate should gradient descent use?
Hyperparameters are adjustable parameters, chosen before training, that govern the training process itself. For example, to train a deep neural network, you decide the number of hidden layers and the number of nodes in each layer before training the model. These values usually stay constant throughout training.
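Here is a minimal pure-Python sketch that makes the distinction concrete. The toy data and the function name `train_linear_model` are hypothetical. The learning rate and the number of epochs are fixed before training starts. The weight `w` is learned from the data:

```python
def train_linear_model(xs, ys, learning_rate=0.05, n_epochs=200):
    """Fit y ~ w * x by gradient descent on mean squared error.

    learning_rate and n_epochs are hyperparameters: chosen before training
    and held constant. w is a model parameter: learned during training.
    """
    w = 0.0
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of mean((w*x - y)^2) with respect to w is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with w = 2

for lr in (0.001, 0.05, 0.2):
    print(f"learning_rate={lr}: learned w = {train_linear_model(xs, ys, learning_rate=lr):.4g}")
```

With these toy numbers, the three learning rates behave very differently:

- A learning rate that is too small leaves `w` short of 2 within the fixed epoch budget.
- A learning rate that is too large makes the updates diverge.

This is exactly why such values have to be tuned.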
To narrow down these values, there are a few methods for searching the hyperparameter space for the values that best serve the objective of the model being trained.
When defining the architecture of a machine learning model, the optimal configuration is rarely obvious. There is no one-size-fits-all recipe for tuning hyperparameters to reduce the loss, so the process is largely one of trial-and-error experimentation.
Techniques At Disposal
For particular architectures such as Long Short-Term Memory (LSTM) networks, the learning rate and the size of the network are the most important hyperparameters.
In reinforcement learning algorithms, measuring how sensitive performance is to the choice of hyperparameters requires a larger number of data points. Because of high variance, performance is not adequately captured with fewer points.
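The following minimal sketch shows why variance matters. It assumes a hypothetical `noisy_return` function that stands in for one full RL training run; its shape and noise level are made up.

The approach is to repeat a configuration over several seeds and report a confidence interval. This shows how uncertain a single estimate is. The interval typically narrows roughly in proportion to 1/sqrt(n) as more runs are added:

```python
import math
import random
import statistics


def noisy_return(learning_rate: float, seed: int) -> float:
    """Hypothetical stand-in for one RL training run: the final return peaks
    near learning_rate=0.01 and carries large seed-dependent noise."""
    rng = random.Random(seed)
    signal = 100.0 - 1e5 * (learning_rate - 0.01) ** 2
    return signal + rng.gauss(0.0, 20.0)  # high run-to-run variance


def mean_and_ci(scores: list[float], z: float = 1.96) -> tuple[float, float]:
    """Mean score and normal-approximation 95% confidence half-width."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


for n_seeds in (3, 30):
    scores = [noisy_return(0.01, seed) for seed in range(n_seeds)]
    mean, half_width = mean_and_ci(scores)
    print(f"{n_seeds:>2} seeds: mean return {mean:.1f} +/- {half_width:.1f}")
```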
There are three main methods for this kind of high-dimensional, non-convex optimisation:
- Grid search is a very common and often advocated approach. You lay a grid over the space of possible hyperparameters and evaluate the objective at each point on the grid. The grid point with the best objective value is then used in production.
- Random search evaluates n uniformly random points in the hyperparameter space and selects the one that performs best. However, it has its own disadvantages, such as high variance, which makes Bayesian optimisation a better, more intelligent alternative.
- Bayesian optimisation builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample.
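The sketch below runs all three methods on the same evaluation budget. It makes several assumptions:

- The objective is a hypothetical one-dimensional function: a made-up validation loss over log10 of the learning rate.
- The Bayesian optimisation uses a hand-rolled Gaussian process with a fixed RBF kernel.
- The acquisition function is expected improvement, which is one common choice among several.

It is an illustration, not a production optimiser:

```python
import math
import random

import numpy as np

LOW, HIGH, BUDGET = -5.0, 0.0, 10  # search log10(learning rate) in [1e-5, 1]


def validation_loss(log_lr: float) -> float:
    """Hypothetical objective: pretend validation loss is minimised at log10(lr) = -2.5."""
    return (log_lr + 2.5) ** 2


def grid_search() -> float:
    # Evaluate evenly spaced points and keep the best one.
    grid = np.linspace(LOW, HIGH, BUDGET)
    return float(min(grid, key=validation_loss))


def random_search(seed: int = 0) -> float:
    # Evaluate BUDGET uniformly random points and keep the best one.
    rng = random.Random(seed)
    samples = [rng.uniform(LOW, HIGH) for _ in range(BUDGET)]
    return min(samples, key=validation_loss)


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    # Zero-mean Gaussian process regression with a unit-variance RBF kernel.
    k = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)
    mean = k_star.T @ np.linalg.solve(k, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(k, k_star), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    # Expected improvement for minimisation: how much each candidate may beat `best`.
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayesian_optimisation(seed: int = 0, n_initial: int = 3) -> float:
    rng = random.Random(seed)
    x_obs = [rng.uniform(LOW, HIGH) for _ in range(n_initial)]
    y_obs = [validation_loss(x) for x in x_obs]
    candidates = np.linspace(LOW, HIGH, 501)
    for _ in range(BUDGET - n_initial):
        y = np.array(y_obs)
        y_norm = (y - y.mean()) / (y.std() + 1e-12)  # standardise targets for the GP
        mean, std = gp_posterior(np.array(x_obs), y_norm, candidates)
        ei = expected_improvement(mean, std, y_norm.min())
        x_next = float(candidates[np.argmax(ei)])  # sample where EI is highest
        x_obs.append(x_next)
        y_obs.append(validation_loss(x_next))
    return x_obs[int(np.argmin(y_obs))]


for name, search in [("grid", grid_search), ("random", random_search), ("bayesian", bayesian_optimisation)]:
    best = search()
    print(f"{name:>8}: log10(lr) = {best:.3f}, loss = {validation_loss(best):.4f}")
```

Grid search and random search never look at earlier results. The Bayesian loop, by contrast, refits its surrogate after every evaluation. This lets it spend the remaining budget near promising regions.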
Apart from these conventional methods, hyperparameter tuning can also be built into graph-based systems themselves.
To optimise and automate the choice of such hyperparameters, Google introduced Watch Your Step. This approach formulates a model for the performance of graph embedding methods. In short, the model learns how strongly to concentrate on a node's nearby, significant neighbours. The automation comes from learning these graph hyperparameters by backpropagation instead of setting them by hand.
Tools At Disposal
In this age of information abundance, especially in AI, a new tool appears and a new paper gets published every other day. It is therefore highly impractical for a practising machine learning engineer to keep track of which libraries work and which hyperparameters are best.
It is always valuable to have a toolbox that automatically saves and learns from experiment results. This leads to long-term, persistent optimization that remembers every test. A recently released toolbox named Hyperparameter Hunter does exactly that. Its creators describe it as a personal machine learning toolbox/assistant.
Hyperparameter Hunter lets users run all of their benchmark and one-off experiments through it. Unlike other libraries, it doesn't start optimization from scratch. Instead, it takes into account all the experiments and optimization rounds previously run through it. The creators claim that Hyperparameter Hunter gives better results the more it is used.
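The core idea of persisting results, so that optimization never restarts from scratch, can be sketched in a few lines of standard-library Python. The `ExperimentLog` class below is hypothetical. It is not Hyperparameter Hunter's API or storage format. It only illustrates two things:

- Each experiment is keyed by its hyperparameters, so repeated configurations are served from disk.
- Past results remain available to later sessions.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


class ExperimentLog:
    """Hypothetical minimal experiment store (not Hyperparameter Hunter's API):
    results are persisted to a JSON file keyed by a hash of the hyperparameters."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.records = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def key(params: dict) -> str:
        # Sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} map to the same experiment.
        return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

    def run(self, params: dict, train_fn: Callable[[dict], float]) -> float:
        k = self.key(params)
        if k not in self.records:  # only train configurations never seen before
            self.records[k] = {"params": params, "score": train_fn(params)}
            self.path.write_text(json.dumps(self.records, indent=2))
        return self.records[k]["score"]

    def best(self) -> dict:
        # Previously logged experiments can seed the next optimization round.
        return max(self.records.values(), key=lambda r: r["score"])


calls = []


def fake_train(params: dict) -> float:
    """Hypothetical training function with a made-up score that peaks at max_depth=4."""
    calls.append(params)  # count how often real training actually happens
    return 1.0 - abs(params["max_depth"] - 4) * 0.05


with tempfile.TemporaryDirectory() as tmp:
    log_path = str(Path(tmp) / "experiments.json")
    log = ExperimentLog(log_path)
    for depth in (3, 4, 3):  # the repeated depth=3 request is served from the log
        log.run({"max_depth": depth, "learning_rate": 0.1}, fake_train)
    reloaded = ExperimentLog(log_path)  # a new session sees every past result
    print(f"training runs: {len(calls)}, best: {reloaded.best()['params']}")
```

In this usage example, `fake_train` runs only twice for three requests. A freshly loaded log still knows the best configuration.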
Key Features Include
- Stop worrying about keeping track of hyperparameters, scores, or re-running the same Experiments
- Automatically reads the Experiment files to find the ones that fit, and it learns from them
- Eliminates boilerplate code for cross-validation loops, predicting, and scoring
- Have predictions ready to go when it’s time for ensembling, meta-learning, and finalizing the models.
Dependencies: Dill, NumPy, Pandas, SciPy, Scikit-Learn, Scikit-Optimize, SimpleJSON
Here’s a quick guide to get started with hyperparameter_hunter:
pip install hyperparameter_hunter
Setting Up Environment
from hyperparameter_hunter import Environment, CVExperiment
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier
from hyperparameter_hunter import BayesianOptPro, Real, Integer, Categorical
Sample code using hyperparameter_hunter, defining the OptPro (Optimization Protocol):
optimizer = BayesianOptPro(verbose=1)
Choosing which hyperparameters we want to optimize.
objective='reg:linear', # setting this as a constant guideline – Not one to optimize