Optimisation is a central component of machine learning algorithms. It reduces the error and improves the accuracy of a model's solution to a problem. Gradient Descent is one of the most widely used algorithms for this purpose. Here we take a closer look at what Gradient Descent is and how it helps in optimisation.
Understanding Gradient Descent
Gradient Descent is one of the most common optimisation strategies used in ML frameworks. It is an iterative algorithm that minimises a function by moving towards a local or global minimum. In simple words, Gradient Descent iterates over a function, adjusting its parameters until it finds a minimum. The gradient is the vector of partial derivatives of a function with respect to its inputs. In the ML setting, it measures how much the error changes in response to a small change in each weight.
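In symbols, a single step is commonly written as below. The notation is chosen here for illustration and does not come from the text above: θ stands for the parameters, J for the function being minimised, and η for the step size (learning rate).

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta J(\theta_t),
\qquad
\nabla_\theta J = \left( \frac{\partial J}{\partial \theta_1}, \dots, \frac{\partial J}{\partial \theta_n} \right)
```

The minus sign is what makes this a descent: each step moves the parameters against the gradient, that is, in the direction in which J decreases fastest.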
Let us visualise with the simplest example. Consider the following image of a curve:
For better understanding, visualise a two-dimensional cross-section of the curve. We will get something like this:
Now imagine a ball released from the topmost end of the curve. The objective is to reach the lowest point. The ball will roll down and then up the other side, repeatedly, until it comes to rest at the lowest point. This is how Gradient Descent works: the algorithm repeatedly adjusts its parameters or coefficients until it reaches the lowest point.
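The rolling ball can be sketched in a few lines of plain Python. This is a minimal illustration, not a production routine: the curve f(x) = (x - 3)^2, the starting point and the learning rates are all assumptions picked for the example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    history = [x]
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # move downhill by one step
        history.append(x)
    return x, history


# Hypothetical bowl-shaped curve with its lowest point at x = 3.
def f(x):
    return (x - 3.0) ** 2


def df(x):
    """Derivative (one-dimensional gradient) of f."""
    return 2.0 * (x - 3.0)


# Small steps: the "ball" slides down one side and settles at the bottom.
x_min, path = gradient_descent(df, start=10.0, learning_rate=0.1)

# Large steps: the "ball" overshoots and rolls up the other side before settling.
x_bouncy, bouncy_path = gradient_descent(df, start=10.0, learning_rate=0.9)
```

Both runs end very close to x = 3. With the larger learning rate, successive values in bouncy_path land on alternating sides of the minimum, which is the down-and-up motion of the ball.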
In the ML context, Gradient Descent is used to minimise the error by adjusting the weights after a pass through all the samples in the training set. If the weights are instead updated after each sample in the training set, it is called Stochastic Gradient Descent. Updating after a specified subset of the training samples is the mini-batch variant, which is often also referred to as Stochastic Gradient Descent. The larger the gradient, the steeper the slope and the faster a model can learn. But if the slope is zero, the model stops learning.
With this basic understanding, let us now take a look at how popular ML packages like TensorFlow and PyTorch implement Gradient Descent.
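The difference between the two update schedules can be sketched as follows. The toy data, the single-weight model y = w * x, and the choice to average the gradient over the batch are assumptions made for this illustration.

```python
import random

# Hypothetical toy data generated from y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def grad_sample(w, x, y):
    """Gradient of the squared error (w*x - y)^2 with respect to w."""
    return 2.0 * (w * x - y) * x


def batch_gd(w, lr=0.01, epochs=200):
    """One weight update per pass over the whole training set."""
    for _ in range(epochs):
        g = sum(grad_sample(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * g
    return w


def stochastic_gd(w, lr=0.01, epochs=200, seed=0):
    """One weight update per training sample, visited in shuffled order."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w -= lr * grad_sample(w, x, y)
    return w


w_batch = batch_gd(0.0)
w_sgd = stochastic_gd(0.0)
```

On this noise-free data both schedules arrive at a weight close to 2.0. The stochastic version makes four updates per pass instead of one, so it moves faster per pass, at the cost of each update being based on a single sample.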
Gradient Descent with TensorFlow
TensorFlow has a class called GradientDescentOptimizer (tf.train.GradientDescentOptimizer in TensorFlow 1.x) to handle Gradient Descent.
Consider a simple example that illustrates the usage of the GradientDescentOptimizer class.
The highlighted part is where the GradientDescentOptimizer is invoked. GradientDescentOptimizer is called with a step (learning rate) of 0.01, which is a commonly used value. The minimize() function minimises the value of the variable error, which is defined as the squared difference between the actual and predicted values.
The minimize() function is a combination of two methods:
- compute_gradients() : This method returns a list of (gradient, variable) pairs where “gradient” is the gradient for “variable”.
- apply_gradients() : This is the second part of minimize(). It returns an Operation that applies gradients.
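The sketch below shows minimize() next to its two-step equivalent. It is not the original example: it assumes TensorFlow 2.x is installed and reaches the 1.x-style API through tf.compat.v1, and the toy data, the single weight w and all variable names are made up for illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # the 1.x optimizer API builds a static graph

# Hypothetical toy data from y = 2x; the model has a single weight w.
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y_actual = tf.constant([2.0, 4.0, 6.0, 8.0])
w = tf1.Variable(0.0)

y_predicted = w * x
error = tf.reduce_mean(tf.square(y_actual - y_predicted))

optimizer = tf1.train.GradientDescentOptimizer(learning_rate=0.01)

# One call that computes the gradients and applies them.
train_op = optimizer.minimize(error)

# The same update, split into its two parts.
grads_and_vars = optimizer.compute_gradients(error)  # list of (gradient, variable)
train_op_two_step = optimizer.apply_gradients(grads_and_vars)

with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)           # steps via minimize()
    for _ in range(100):
        sess.run(train_op_two_step)  # steps via compute + apply
    final_w = sess.run(w)
    final_error = sess.run(error)
```

Either operation performs the same update, so the weight keeps moving towards 2.0 regardless of which one is run. The two-step form is mainly useful when the gradients need to be inspected or modified (for example clipped) before they are applied.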
Gradient Descent with PyTorch
PyTorch uses the class torch.optim.SGD to implement Stochastic Gradient Descent.
Consider the following illustration.
The lr parameter stands for the learning rate, or step size, of the Gradient Descent, and model.parameters() returns the model's learnable parameters, which the optimizer updates. The gradient buffers are set to zero by optimizer.zero_grad() once in every training iteration, to clear the gradients computed from the previous data batch.
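A minimal training loop along these lines might look as follows. It is a sketch rather than the original illustration: the toy data, the one-weight linear model and the number of iterations are assumptions.

```python
import torch

torch.manual_seed(0)  # make the random weight initialisation repeatable

# Hypothetical toy data from y = 2x.
inputs = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
targets = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

model = torch.nn.Linear(1, 1, bias=False)  # a single learnable weight
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(200):
    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()         # compute fresh gradients into each parameter's .grad
    optimizer.step()        # w <- w - lr * grad

learned_w = model.weight.item()
final_loss = loss.item()
```

Because backward() adds to the existing .grad values rather than overwriting them, leaving out zero_grad() would mix gradients from earlier iterations into each update. Note also that this loop feeds the whole dataset at every step, so despite the class name it performs full-batch Gradient Descent; passing one sample or a mini-batch per step would make it stochastic.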