A key balancing act in machine learning is choosing an appropriate level of model complexity: if the model is too complex, it will fit the data used to construct the model very well but generalise poorly to unseen data (overfitting); if the complexity is too low the model won’t capture all the information in the data (underfitting).
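As a rough illustration of this trade-off, the sketch below uses k-nearest-neighbour regression, where a smaller k means a more flexible (more complex) model. The technique, the synthetic sine data, and the names `knn_predict`, `mse` and `make_data` are assumptions chosen for the example.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN regressor on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    """Noisy samples of sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

rng = random.Random(0)
train, test = make_data(40, rng), make_data(200, rng)

errors = {}
for k in (1, 5, len(train)):
    # Each entry holds (training error, test error) for one complexity level.
    errors[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k:2d}  train MSE={errors[k][0]:.3f}  test MSE={errors[k][1]:.3f}")
```

With k=1 the model reproduces the training set exactly, so its training error is zero while its test error is not (overfitting). With k equal to the training-set size it predicts the same average everywhere and misses the shape of the data (underfitting).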

In deep learning or machine learning scenarios, model performance depends heavily on the hyperparameter values selected. The goal of hyperparameter exploration is to search across various hyperparameter configurations to find a configuration that results in the best performance. Typically, the hyperparameter exploration process is painstakingly manual, given that the search space is vast and evaluation of each configuration can be expensive.

### The Process Of Learning Parameters

Learning parameters is a four-step process:

- Take the input data
- Obtain a generalised function
- Predict values
- Learn the parameters

The model parameters describe how to transform input data into the desired output, whereas the hyperparameters determine the structure of the model in use.
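A minimal sketch of this split, assuming a one-variable linear model trained by batch gradient descent: the weights `w` and `b` are parameters learned from the data, while `learning_rate` and `epochs` are hyperparameters fixed beforehand. The function name `train_linear` and the data are made up for illustration.

```python
def train_linear(data, learning_rate, epochs):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: learned from the data
    n = len(data)
    for _ in range(epochs):
        # Predict with the current parameters and measure the errors.
        errors = [(w * x + b) - y for x, y in data]
        # Move the parameters along the negative gradient of the MSE.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, data)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Input data, generated here from y = 2x + 1.
data = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]

# learning_rate and epochs are hyperparameters: chosen before training starts.
w, b = train_linear(data, learning_rate=0.05, epochs=2000)
print(round(w, 3), round(b, 3))
```

With these settings the learned values should end up very close to the true slope 2 and intercept 1. A much larger learning rate would make the updates diverge, which is one reason the choice matters.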

Most common learning algorithms feature a set of hyperparameters that must be determined before training commences.

Hyperparameters differ between training algorithms, and a few algorithms, such as ordinary least squares, do not need any.

A hyperparameter can substantially change both the outcome of a model and the time taken to train it, so the choice of hyperparameters plays a crucial role. Having hyperparameters is only half of the solution; the other half is knowing which hyperparameter values suit the need.

### How Can Hyperparameters Address Issues With Model Design

Hyperparameters help answer questions like:

- How deep should a decision tree be?
- How many trees are required in a random forest?
- How many layers should a neural network have?
- What learning rate should gradient descent use?

There is no one-stop answer for how hyperparameters should be tuned to reduce the loss; it is more or less a matter of trial-and-error experimentation.

Hyperparameters are adjustable parameters, chosen before training, that govern the training process itself. For example, to train a deep neural network, you decide the number of hidden layers in the network and the number of nodes in each layer prior to training the model. These values usually stay constant during the training process.

To narrow down the values, there are a few methods for skimming through the parameter space to find the values that align with the objective of the model being trained.

While defining the architecture of a machine learning model, the optimal architecture is usually not obvious.

Instead, the machine can be tasked with exploring the options and selecting the optimal model architecture.

The parameters that define this architecture are referred to as hyperparameters, and the process of searching for the right architecture is known as hyperparameter tuning.

### Setting The Right Tune

In simple words, tuning can be thought of as “searching”: what is being searched for are the best hyperparameter values in the hyperparameter space.

[Figure: a hyperparameter space plotted in three dimensions. Source: SigOpt]

The above figure illustrates a plot of hyperparameter space in three dimensions. The X and Y directions give the hyperparameters, and the Z direction gives the score of the model under consideration.

There are three main methods for performing this kind of high-dimensional, non-convex optimisation. They are as follows:

**Grid search** is a very common and often advocated approach in which you lay down a grid over the space of possible hyperparameters and evaluate the objective at each point on the grid; the hyperparameters with the best objective value are then used in production.

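Using Python and the Azure Machine Learning SDK to define a grid search:

    from azureml.train.hyperdrive import GridParameterSampling, choice

    # Grid sampling evaluates every combination of the listed discrete values.
    param_sampling = GridParameterSampling({
        "num_hidden_layers": choice(1, 2, 3),
        "batch_size": choice(16, 32),
    })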
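The same idea can be sketched in plain Python, independent of any cloud service. The helper `grid_search` and the scoring function `toy_score` are hypothetical; in practice the objective would train a model and return a validation score.

```python
from itertools import product

def grid_search(objective, grid):
    """Evaluate `objective` at every combination in `grid`; return the best."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = objective(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_score(num_hidden_layers, batch_size):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -(num_hidden_layers - 2) ** 2 - abs(batch_size - 32) / 16

grid = {"num_hidden_layers": [1, 2, 3], "batch_size": [16, 32]}
best_config, best_score = grid_search(toy_score, grid)
print(best_config, best_score)
```

The number of evaluations is the product of the list lengths (here 3 × 2 = 6), which is why grid search becomes expensive as more hyperparameters are added.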

**Random search** evaluates n uniformly random points in the hyperparameter space and selects the one producing the best performance. However, this method has its own disadvantages, such as high variance, so a more intelligent alternative is Bayesian optimisation.

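    from azureml.train.hyperdrive import RandomParameterSampling
    from azureml.train.hyperdrive import choice, normal, uniform

    # Random sampling draws each hyperparameter from its stated distribution.
    param_sampling = RandomParameterSampling({
        "learning_rate": normal(10, 3),
        "keep_probability": uniform(0.05, 0.1),
        "batch_size": choice(16, 32, 64, 128),
    })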
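A minimal plain-Python sketch of random search follows. The sampling ranges (including the log-uniform learning rate), the helper names, and `toy_score` are assumptions made for the example and do not mirror the Azure configuration above.

```python
import math
import random

def random_search(objective, sample_config, n_trials, seed=0):
    """Evaluate n_trials randomly sampled configurations; return the best."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def sample_config(rng):
    """Draw one configuration; the ranges here are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def toy_score(learning_rate, batch_size):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -(math.log10(learning_rate) + 2) ** 2 - abs(batch_size - 64) / 64

few_config, few_score = random_search(toy_score, sample_config, n_trials=5)
many_config, many_score = random_search(toy_score, sample_config, n_trials=200)
print(few_score, many_score)
```

Because both runs share a seed, the 200-trial run sees the same first five samples and so cannot do worse. With different seeds, a small number of trials can give quite different results from run to run, which is the variance problem mentioned above.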

**Bayesian optimisation** builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample.

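    from azureml.train.hyperdrive import BayesianParameterSampling
    from azureml.train.hyperdrive import choice, uniform

    # Bayesian sampling picks new points based on how earlier ones performed.
    param_sampling = BayesianParameterSampling({
        "learning_rate": uniform(0.05, 0.1),
        "batch_size": choice(16, 32, 64, 128),
    })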
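To make the surrogate-plus-acquisition loop concrete, here is a small standard-library sketch over a single hyperparameter. It assumes an RBF kernel with a fixed length scale and uses the upper confidence bound as the acquisition function, which is only one of several common choices. It is not a description of what Azure does internally, and `toy_score`, `kappa` and the candidate grid are hypothetical.

```python
import math

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of the GP surrogate at x_new."""
    y_mean = sum(ys) / len(ys)  # centre the targets around their mean
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = 1.0 - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(objective, candidates, n_iter, kappa=2.0):
    """Maximise `objective` over `candidates` using a GP surrogate and UCB."""
    xs = [candidates[0], candidates[-1]]  # two initial evaluations
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std  # favour high mean or high uncertainty
        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def toy_score(log_lr):
    """Hypothetical validation score, peaking at log10(lr) = -2.2."""
    return -(log_lr + 2.2) ** 2

candidates = [-4 + 0.1 * i for i in range(31)]  # log10(lr) from -4 to -1
best_x, best_y = bayes_opt(toy_score, candidates, n_iter=10)
print(best_x, best_y)
```

Each iteration spends one expensive evaluation on the candidate where the surrogate is either promising or most uncertain, instead of sampling blindly. The `kappa` value controls that balance between exploitation and exploration.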

Azure Machine Learning service supports random sampling, grid sampling, and Bayesian sampling.

For architectures like LSTM, the learning rate and the size of the network are the prime hyperparameters.

In reinforcement learning algorithms, measuring sensitivity to the choice of hyperparameters requires a larger number of data points, because high variance means performance is not adequately captured with a smaller number of points.
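One way to act on this is to repeat each configuration across several random seeds and report the spread, not just a single score. In the sketch below, `noisy_return` is a hypothetical stand-in for a full training run, and the noise level and the location of the peak are assumptions.

```python
import random
import statistics

def noisy_return(learning_rate, seed):
    """Hypothetical stand-in for the final return of one training run."""
    rng = random.Random(seed)
    signal = 100 - 1e6 * (learning_rate - 0.003) ** 2  # peak at lr = 0.003
    return signal + rng.gauss(0, 15)                   # seed-dependent noise

def evaluate(learning_rate, seeds):
    """Mean and sample standard deviation of the return across seeds."""
    returns = [noisy_return(learning_rate, s) for s in seeds]
    return statistics.mean(returns), statistics.stdev(returns)

for n_seeds in (3, 30):
    mean, std = evaluate(0.003, range(n_seeds))
    print(f"{n_seeds:2d} seeds: mean={mean:.1f}  std={std:.1f}")
```

With only a few seeds the mean can land far from the underlying value, so two hyperparameter settings may appear to differ when the gap is just noise.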

The efficiency of a machine learning model pivots on two factors: speed and scores. Some applications require real-time outputs, where speed matters most, while others prioritise accuracy. There is always a tradeoff between the two, and hyperparameters offer decent assistance in deciding how to proceed with a specific problem.
