
You Can Help Your Neural Network Learn Better By Giving It A Good Start


Neural networks are now used by data scientists in almost every field. In this article, we will help you understand what initialisation methods are and why they are crucial when designing a neural network model.

When designing a neural network, it is crucial to initialise the weights and biases well before training begins, because the starting point matters for optimisation. The choice of initialisation influences the speed of convergence, the generalisation of the model and even the probability that training converges at all. The most common way to initialise the weights of a model is to set them at random.

Types Of Methods

Some of the most widely used initialisation methods are zero initialisation, random initialisation and Xavier initialisation. Besides these, several other initialisers are currently available in the deep learning library Keras (a short usage sketch follows the list):

  • Zeros (Initialiser that generates tensors initialised to 0)
  • Ones (Initialiser that generates tensors initialised to 1)
  • Constant (Initialiser that generates tensors initialised to a constant value)
  • RandomNormal (Initialiser that generates tensors with a normal distribution)
  • RandomUniform (Initialiser that generates tensors with a uniform distribution)
  • TruncatedNormal (Initialiser that generates a truncated normal distribution)
  • VarianceScaling (Initialiser capable of adapting its scale to the shape of weights)
  • Orthogonal (Initialiser that generates a random orthogonal matrix)
  • Identity (Initialiser that generates the identity matrix)
  • lecun_uniform (LeCun uniform initialiser)
  • glorot_normal (Glorot normal initialiser, also called Xavier normal initialiser)
  • glorot_uniform (Glorot uniform initialiser, also called Xavier uniform initialiser)
  • he_normal (He normal initialiser)
  • lecun_normal (LeCun normal initialiser)
  • he_uniform (He uniform variance scaling initialiser).
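
For reference, here is a minimal sketch, assuming the TensorFlow implementation of Keras, of how these initialisers are attached to layers, either by their string identifier or as configurable objects. The layer sizes are arbitrary.

```python
# Minimal sketch, assuming TensorFlow's Keras API; layer sizes are arbitrary.
from tensorflow import keras
from tensorflow.keras import layers, initializers

model = keras.Sequential([
    # Initialiser passed by its string identifier ...
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_initializer="he_normal", bias_initializer="zeros"),
    # ... or as an initialiser object with explicit parameters.
    layers.Dense(64, activation="relu",
                 kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.05)),
    layers.Dense(1, activation="sigmoid",
                 kernel_initializer=initializers.GlorotUniform()),
])
model.summary()
```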

Zero Initialisation

If all the weights in a neural network are initialised to zero, every neuron computes the same output, receives the same gradient and therefore ends up identical to every other neuron, so the network fails to learn anything useful. For this reason, zero initialisation is considered a bad approach: it never breaks the symmetry between neurons. One can also say that setting the weights to zero creates a model that is no better than a linear model.
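
A small NumPy sketch (the layer sizes here are arbitrary) illustrates the symmetry problem: with all-zero weights, every hidden unit produces the same output, so every unit also receives the same gradient and the units stay identical after each update.

```python
import numpy as np

x = np.random.randn(4)      # one input example with 4 features
W = np.zeros((3, 4))        # 3 hidden units, every weight set to zero
b = np.zeros(3)

h = np.tanh(W @ x + b)      # hidden activations
print(h)                    # [0. 0. 0.] -- all units behave identically
```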

Random Initialisation

Unlike zero initialisation, random initialisation generally produces better accuracy: the weights are initialised to random values close to zero (but not equal to zero) and smaller than one in magnitude, so each neuron performs a different computation. However, this approach is prone to vanishing and exploding gradient problems, which lead to slower convergence, numerical overflow, incorrect computations and so on, as the sketch below illustrates.
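
The following sketch (the layer width and the two scale values are illustrative assumptions) passes a random signal through a stack of randomly initialised linear layers: a scale that is a little too small drives the signal towards zero, while a scale that is a little too large makes it blow up.

```python
import numpy as np

def signal_std_after_layers(scale, n_layers=20, width=256):
    """Standard deviation of a random signal after n_layers linear layers."""
    x = np.random.randn(width)
    for _ in range(n_layers):
        W = np.random.randn(width, width) * scale   # naive random initialisation
        x = W @ x
    return x.std()

print(signal_std_after_layers(0.01))   # scale too small -> signal vanishes
print(signal_std_after_layers(0.10))   # scale too large -> signal explodes
```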

Xavier Initialisation

This initialisation method is quite similar to random initialisation, except that the scale of the random weights is chosen according to the size of the layers. It is designed to avoid two important failure cases, as mentioned below:

  • If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
  • If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.

To avoid such cases, Xavier initialisation makes sure that the weights are scaled in the correct proportion, which keeps the signal in a reasonable range of values through many layers, as the sketch below demonstrates.
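
A minimal sketch of this effect, assuming 256-unit layers with tanh activations: drawing the weights with standard deviation sqrt(2 / (fan_in + fan_out)), as Glorot/Xavier normal initialisation does, keeps the signal in a moderate range even after many layers.

```python
import numpy as np

width, n_layers = 256, 20
scale = np.sqrt(2.0 / (width + width))   # Glorot/Xavier normal scale

x = np.random.randn(width)
for _ in range(n_layers):
    W = np.random.randn(width, width) * scale
    x = np.tanh(W @ x)

print(x.std())   # stays in a moderate range instead of vanishing or exploding
```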

Why Is It Important

In an artificial neural network, the weights connect the nodes between layers, and initialising them well speeds up the learning process of the algorithm. Because initialisation is random, it is important to assess the ability of a neural network by repeating the training process several times and taking the average performance of the model into account, as sketched below. Finally, if an initialisation method still causes trouble, mitigation techniques such as using ReLU as the activation function or gradient clipping can help.
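
As a rough sketch of this evaluation-by-repetition idea (the toy data, model size and number of repeats are all illustrative assumptions, using a recent TensorFlow/Keras version), a model can be trained several times with different random seeds and the resulting scores averaged:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy binary-classification data (illustrative only).
X = np.random.randn(500, 20).astype("float32")
y = (X[:, 0] > 0).astype("float32")

scores = []
for seed in range(5):                            # five independent runs
    tf.keras.utils.set_random_seed(seed)         # re-seeds the weight initialisers
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, verbose=0)
    _, acc = model.evaluate(X, y, verbose=0)
    scores.append(acc)

print(np.mean(scores), np.std(scores))           # average performance and its spread
```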

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.