Data scientists now use neural networks in almost every field. In this article, we explain what initialisation methods are and why they are crucial when designing a neural network model.
Before a neural network can be trained, its weights and biases have to be initialised. The values chosen for these parameters matter for optimisation: they influence the speed of convergence, generalisation, and the probability of convergence. The most common way to initialise the weights in a model is to set them at random.
Types Of Methods
Some of the most commonly used initialisation methods are zero initialisation, random initialisation and Xavier initialisation. The initialisers available in the deep learning library Keras include these and several others:
- Zeros (Initialiser that generates tensors initialised to 0)
- Ones (Initialiser that generates tensors initialised to 1)
- Constant (Initialiser that generates tensors initialised to a constant value)
- RandomNormal (Initialiser that generates tensors with a normal distribution)
- RandomUniform (Initialiser that generates tensors with a uniform distribution)
- TruncatedNormal (Initialiser that generates a truncated normal distribution)
- VarianceScaling (Initialiser capable of adapting its scale to the shape of weights)
- Orthogonal (Initialiser that generates a random orthogonal matrix)
- Identity (Initialiser that generates the identity matrix)
- lecun_uniform (LeCun uniform initialiser)
- glorot_normal (Glorot normal initialiser, also called Xavier normal initialiser)
- glorot_uniform (Glorot uniform initialiser, also called Xavier uniform initialiser)
- he_normal (He normal initialiser)
- lecun_normal (LeCun normal initialiser)
- he_uniform (He uniform variance scaling initialiser)
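The LeCun, Glorot and He entries in the list above differ only in how they scale the spread of the weights to the size of the layer. The sketch below computes that nominal scale in plain Python. The names `SCHEMES` and `init_scale` are made up for this illustration and are not part of Keras, and the sketch ignores the fact that Keras draws its normal variants from a truncated normal distribution.

```python
import math

# Nominal scale of each fan-based initialiser, written as
# (distribution, variance numerator, which fan count divides it).
SCHEMES = {
    "lecun_uniform":  ("uniform", 1.0, "fan_in"),
    "lecun_normal":   ("normal",  1.0, "fan_in"),
    "glorot_uniform": ("uniform", 1.0, "fan_avg"),
    "glorot_normal":  ("normal",  1.0, "fan_avg"),
    "he_uniform":     ("uniform", 2.0, "fan_in"),
    "he_normal":      ("normal",  2.0, "fan_in"),
}

def init_scale(name, fan_in, fan_out):
    """Return ("limit", x) for uniform schemes or ("stddev", x) for normal ones."""
    kind, numerator, mode = SCHEMES[name]
    fan = fan_in if mode == "fan_in" else (fan_in + fan_out) / 2.0
    variance = numerator / fan
    if kind == "uniform":
        # U(-limit, limit) has variance limit**2 / 3, so limit = sqrt(3 * variance).
        return ("limit", math.sqrt(3.0 * variance))
    return ("stddev", math.sqrt(variance))

# Example: a layer with 256 inputs and 128 outputs.
for name in SCHEMES:
    label, value = init_scale(name, fan_in=256, fan_out=128)
    print(f"{name:15s} {label} = {value:.4f}")
```

In Keras itself these initialisers are normally selected by name, for example through the `kernel_initializer` argument of a layer, rather than computed by hand.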
In zero initialisation, all the weights in the network are set to zero. Every neuron in a layer then starts with the same weights, computes the same output and receives the same gradient, so the neurons stay identical to one another throughout training and the network fails to learn anything useful. For this reason, zero initialisation is considered a bad approach. One can also say that setting the weights to zero creates a model that behaves similarly to a linear model.
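The symmetry problem can be seen in a very small experiment. The following is a minimal sketch, not a realistic training setup: it assumes a network with 2 inputs, 3 sigmoid hidden units and 1 linear output, no biases, a squared-error loss and plain gradient descent. The function name `train` and the toy XOR-style data are chosen only for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(w1, w2, data, lr=0.1, steps=200):
    """Train a 2-input, 3-hidden-unit, 1-output network with gradient descent.

    w1[j] holds the two input weights of hidden unit j; w2[j] is its output weight.
    """
    for _ in range(steps):
        for x, y in data:
            h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1]) for j in range(3)]
            out = sum(w2[j] * h[j] for j in range(3))
            err = out - y  # derivative of 0.5 * (out - y)**2 with respect to out
            for j in range(3):
                grad_w2 = err * h[j]
                grad_pre = err * w2[j] * h[j] * (1.0 - h[j])  # gradient at the hidden pre-activation
                w2[j] -= lr * grad_w2
                w1[j][0] -= lr * grad_pre * x[0]
                w1[j][1] -= lr * grad_pre * x[1]
    return w1, w2

data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

# Zero initialisation: every weight starts at 0.
w1 = [[0.0, 0.0] for _ in range(3)]
w2 = [0.0, 0.0, 0.0]
w1, w2 = train(w1, w2, data)

print("hidden weights:", w1)
print("output weights:", w2)
hidden_units_identical = (w1[0] == w1[1] == w1[2]) and (w2[0] == w2[1] == w2[2])
print("all hidden units identical:", hidden_units_identical)
```

Because every hidden unit receives exactly the same update at every step, the three units end training with the same weights, and the network has effectively one hidden unit instead of three.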
Random initialisation generally gives better accuracy than zero initialisation. The weights are initialised randomly to values close to zero (less than one in magnitude, but not equal to zero), so different neurons perform different computations. However, this initialisation is prone to exploding and vanishing gradient problems, which lead to slower convergence, overflow, incorrect computations, etc.
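How sensitive a deep network is to the spread of its random weights can be illustrated with a forward pass. The sketch below makes several simplifying assumptions: 20 layers of width 64, no activation function, weights drawn from a normal distribution with a fixed standard deviation, and a hypothetical helper named `final_signal_rms`. It tracks the forward signal only. In a linear network like this one, the backward pass multiplies by the same matrices, so gradients shrink or grow in a similar way.

```python
import math
import random

def final_signal_rms(stddev, width=64, depth=20, seed=0):
    """RMS of the signal after `depth` linear layers with N(0, stddev^2) weights."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # A fresh width x width weight matrix for every layer.
        w = [[rng.gauss(0.0, stddev) for _ in range(width)] for _ in range(width)]
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return math.sqrt(sum(v * v for v in x) / width)

small = final_signal_rms(stddev=0.01)  # weights start too small
large = final_signal_rms(stddev=1.0)   # weights start too large
print(f"stddev=0.01 -> final RMS {small:.3e}")
print(f"stddev=1.0  -> final RMS {large:.3e}")
```

With the small standard deviation the signal collapses towards zero, and with the large one it grows by many orders of magnitude, even though both sets of weights are "random".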
Xavier initialisation is quite similar to random initialisation. One article on the method describes two important cases that can be avoided by using Xavier initialisation:
- If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
- If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.
To avoid such cases, Xavier initialisation scales the weights according to the number of inputs and outputs of each layer, which keeps the signal in a reasonable range of values through many layers.
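As a rough illustration, the uniform variant of Xavier (Glorot) initialisation draws each weight from the range [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The sketch below is written under simplifying assumptions (20 linear layers of width 64, no activation function), and `glorot_uniform` here is a hand-written helper for this example, not the Keras function of the same name.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in matrix from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

rng = random.Random(0)
width, depth = 64, 20
x = [rng.gauss(0.0, 1.0) for _ in range(width)]
print(f"input    RMS = {rms(x):.3f}")

for layer in range(1, depth + 1):
    w = glorot_uniform(width, width, rng)
    x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    if layer % 5 == 0:
        print(f"layer {layer:2d} RMS = {rms(x):.3f}")

final_rms = rms(x)
```

The signal stays within the same order of magnitude as the input across all 20 layers, instead of shrinking or growing from layer to layer.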
Why Is It Important
In an artificial neural network, the weights are the parameters that connect nodes in adjacent layers, and initialising them well speeds up the learning process of the algorithm. Because random initialisation gives a different starting point on every run, it is important to assess a neural network by repeating the training (search) process a number of times and taking the average performance of the model into account. Finally, if the initialisation method alone is not enough, mitigation techniques such as using ReLU as the activation function or gradient clipping can help.
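Gradient clipping, one of the mitigation techniques mentioned above, can be sketched in a few lines. This version clips by the combined L2 norm of all gradients, which is one common variant. The function name `clip_by_global_norm` and the example values are chosen for illustration, and deep learning libraries ship their own implementations.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm or total == 0.0:
        return grads, total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total

grads = [[3.0, 4.0], [12.0]]  # combined norm is sqrt(9 + 16 + 144) = 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print("norm before:", norm_before)
print("clipped gradients:", clipped)
```

Clipping keeps the direction of the update and only limits its length, so a single exploding gradient cannot push the weights arbitrarily far in one step.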