
A Lowdown On Alternatives To Gradient Descent Optimization Algorithms


Gradient Descent is the most common optimisation strategy used in machine learning frameworks. It is an iterative algorithm used to minimise a function towards its local or global minimum. In simple words, Gradient Descent repeatedly evaluates a function and adjusts its parameters until it finds the minimum.

Gradient Descent minimises the error by adjusting the weights after passing through all the samples in the training set. If the weights are instead updated after a specified subset (mini-batch) of training samples, or after each individual sample, the method is called Stochastic Gradient Descent (SGD).
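To make the distinction concrete, here is a minimal NumPy sketch contrasting a full-batch update with mini-batch (stochastic) updates on a toy linear-regression loss; the dataset, learning rate and batch size below are illustrative assumptions, not values from the article.

```python
import numpy as np

# Toy linear-regression data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def gradient(w, X_batch, y_batch):
    """Gradient of the mean squared error with respect to the weights."""
    residual = X_batch @ w - y_batch
    return 2 * X_batch.T @ residual / len(y_batch)

alpha = 0.1  # learning rate (illustrative value)

# Batch gradient descent: one update after seeing ALL samples
w = np.zeros(3)
for epoch in range(50):
    w -= alpha * gradient(w, X, y)

# Stochastic / mini-batch gradient descent: update after each small subset
w = np.zeros(3)
batch_size = 10
for epoch in range(50):
    perm = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = perm[start:start + batch_size]
        w -= alpha * gradient(w, X[idx], y[idx])
```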

Stochastic Gradient Descent (SGD) and many of its variants are popular state-of-the-art methods for training deep learning models due to their efficiency. However, SGD suffers from many limitations that prevent its more widespread use: for example, the error signal diminishes as the gradient is back-propagated (i.e. the gradient vanishes), and SGD is sensitive to poor conditioning, which means a small change in the input can change the gradient dramatically.
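The vanishing-gradient effect can be seen in a small NumPy sketch: pushing a gradient backwards through a stack of sigmoid layers shrinks its norm layer by layer. The depth, width and weight scale used here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth, width = 20, 64  # toy network size (illustrative assumption)
weights = [rng.normal(scale=0.1, size=(width, width)) for _ in range(depth)]

# Forward pass through a stack of sigmoid layers, caching activations
h = rng.normal(size=width)
activations = []
for W in weights:
    h = sigmoid(W @ h)
    activations.append(h)

# Backward pass: push a unit gradient back through the layers
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * a * (1 - a))  # chain rule through the sigmoid layer
    print(f"gradient norm after this layer: {np.linalg.norm(grad):.2e}")
```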

The learning rate, a parameter denoted by α (alpha), is used to tune how quickly and how accurately a model converges on a result (classification, prediction, etc.). It can be thought of as a ball thrown down a staircase. A higher learning rate is equivalent to a higher speed of the descending ball: the ball leaps over adjacent steps and reaches the bottom quickly, but does not settle immediately because of the momentum it carries.

The learning rate is a scalar, a value that tells the machine how fast or how slowly it arrives at a conclusion. The speed at which a model learns is important, and it varies across applications. A model that learns too fast can miss data points or correlations that would give better insight into the data, which eventually leads to wrong classifications.

This momentum can be controlled with three common ways of implementing learning rate decay (sketched in code after the list):

  1. Step decay: Reduce the learning rate by some factor every few epochs. Typical choices are halving the learning rate every 5 epochs, or multiplying it by 0.1 every 20 epochs
  2. Exponential decay has the mathematical form α = α₀e^(−kt), where α₀ and k are hyperparameters and t is the iteration number
  3. 1/t decay has the mathematical form α = α₀/(1 + kt), where α₀ and k are hyperparameters and t is the iteration number
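A minimal Python sketch of these three schedules; the values of α₀ and k are illustrative assumptions, not values from the article.

```python
import math

alpha_0 = 0.1  # initial learning rate (illustrative value)
k = 0.05       # decay hyperparameter (illustrative value)

def step_decay(epoch, drop=0.5, epochs_per_drop=5):
    """Step decay: halve the learning rate every 5 epochs."""
    return alpha_0 * (drop ** (epoch // epochs_per_drop))

def exponential_decay(t):
    """Exponential decay: alpha = alpha_0 * exp(-k * t)."""
    return alpha_0 * math.exp(-k * t)

def one_over_t_decay(t):
    """1/t decay: alpha = alpha_0 / (1 + k * t)."""
    return alpha_0 / (1 + k * t)

for t in range(0, 30, 5):
    print(t, step_decay(t), exponential_decay(t), one_over_t_decay(t))
```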

There is no one-stop answer to how hyperparameters should be tuned to reduce the loss; it is more or less a matter of trial-and-error experimentation.

To narrow down the values, there are a few methods that skim through the parameter space to figure out values that align with the objective of the model being trained:

  • Adagrad is an adaptive learning rate method. Weights with a large accumulated gradient receive a low learning rate and vice versa
  • RMSprop adjusts the Adagrad method in a very simple way to reduce its aggressive, monotonically decreasing learning rate. This approach makes use of a moving average of squared gradients
  • Adam is similar to RMSProp but with momentum; the update rules for all three are sketched in the snippet after this list
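The per-step update rules behind these three methods can be sketched as follows; the hyperparameter defaults (learning rates, decay factors, ε) are common choices and are assumptions rather than values given in the article.

```python
import numpy as np

eps = 1e-8  # small constant to avoid division by zero

def adagrad_step(w, grad, cache, lr=0.01):
    """Adagrad: accumulate squared gradients; large accumulations shrink the step."""
    cache += grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9):
    """RMSprop: replace the running sum with a moving average of squared gradients."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999):
    """Adam: RMSprop-style scaling plus a moving average (momentum) of the gradient.

    t is the 1-based timestep, used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```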

The Alternating Direction Method of Multipliers (ADMM), on the other hand, has been used successfully in many conventional machine learning applications and is considered a useful alternative to Stochastic Gradient Descent (SGD) as a deep learning optimizer.

ADMM And Alternatives

Adam is the most popular method because it is computationally efficient and requires little tuning. Other well-known methods that incorporate adaptive learning rates include AdaGrad, RMSProp and AMSGrad.

The use of ADMM has been proposed as an alternative to SGD, and it has recently become popular with researchers due to its excellent scalability.
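To make the ADMM idea concrete, here is a minimal NumPy sketch of ADMM applied to a lasso (L1-regularised least-squares) problem, a standard textbook use case; the data, penalty parameter ρ and regularisation weight are illustrative assumptions, and this is not the dlADMM algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
b = rng.normal(size=50)
lam, rho = 0.1, 1.0  # L1 weight and ADMM penalty (illustrative values)

# Solve: minimize 0.5*||Ax - b||^2 + lam*||z||_1  subject to  x = z
x = np.zeros(20)
z = np.zeros(20)
u = np.zeros(20)  # scaled dual variable

# Pre-build the x-update system: (A^T A + rho*I) x = A^T b + rho*(z - u)
lhs = A.T @ A + rho * np.eye(20)
Atb = A.T @ b

def soft_threshold(v, kappa):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

for _ in range(100):
    x = np.linalg.solve(lhs, Atb + rho * (z - u))  # x-update: ridge-like solve
    z = soft_threshold(x + u, lam / rho)           # z-update: proximal step for L1
    u = u + x - z                                  # dual update
```

Each iteration alternates between a smooth solve, a proximal step and a dual update; this variable-splitting structure is what ADMM-based deep learning methods build on.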

However, as an emerging domain, several challenges remain, including:

  1. The lack of global convergence guarantees, 
  2. Slow convergence towards solutions, and 
  3. Cubic time complexity with regard to feature dimensions. 

To address these challenges simultaneously, researchers at George Mason University propose a novel optimization framework for deep learning via ADMM (dlADMM).

Here's how dlADMM tries to solve these challenges:

  • The parameters in each layer are updated backward and then forward so that the parameter information in each layer is exchanged efficiently. 
  • The time complexity is reduced from cubic to quadratic in the (latent) feature dimensions via a dedicated algorithm design for the subproblems, which solves them using iterative quadratic approximations and backtracking.
  • Experiments on benchmark datasets demonstrate that the proposed dlADMM algorithm outperforms most of the comparison methods.

Know more about dlADMM here.
