Industry-leading neural networks have almost unlimited compute and memory at their disposal, as they are configured to run on the most powerful hardware. However, developers creating deep learning applications for mobile devices don't have this luxury.
This has led to rising demand for smaller neural networks suited to on-the-go applications. With the growing use of AR, facial recognition and voice assistants, ML features are now expected on mobile devices.
Developers are therefore looking for newer and more effective ways of reducing the size and compute required to run neural networks. One of the most popular methods is called pruning.
What Is Neural Network Pruning?
Simply put, pruning is a way to reduce the size of a neural network through compression. After the network is trained, the importance of its connections is estimated, the least important ones are removed, and the network is then fine-tuned. This is done by ranking the neurons or weights in the network, an approach first described in Yann LeCun's 1990 paper 'Optimal Brain Damage'.
The basic principle of that method is to remove unimportant weights using second-derivative information. This results in better generalisation, faster processing and a reduced model size as well.
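The second-derivative idea above can be sketched in a few lines. In Optimal Brain Damage, the saliency of a weight is approximated as half its squared value times the corresponding diagonal entry of the Hessian; the snippet below is a minimal illustration of that formula on toy values, assuming the diagonal Hessian has already been computed elsewhere.

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    """Optimal Brain Damage saliency: s_i = 0.5 * H_ii * w_i^2.

    Estimates how much the loss would increase if each weight were
    zeroed, under a diagonal (second-derivative) approximation.
    """
    return 0.5 * hessian_diag * weights ** 2

# Toy example: weights of equal magnitude can still differ in
# saliency when their second derivatives (curvatures) differ.
w = np.array([0.5, 0.5, -0.5])
h = np.array([1.0, 4.0, 0.1])
s = obd_saliency(w, h)
least_important = int(np.argmin(s))  # the flattest direction is safest to prune
```

Note that weight magnitude alone would not distinguish these three weights; the curvature term is what makes the ranking informative.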
Pruning is usually done in an iterative fashion, to avoid removing neurons that are actually necessary. Because neural networks are largely black boxes, iterating also ensures that an important part of the network is not lost. The first step is to determine which neurons are important and which aren't.
After this, the least important neuron is removed, followed by fine-tuning of the network. At this point, a decision can be made to continue the pruning process or to stop.
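The prune-then-fine-tune loop described above can be sketched as follows. This is a minimal illustration that ranks weights by magnitude (a common, simple importance criterion; the article's methods use more sophisticated rankings) and stubs out the fine-tuning step.

```python
import numpy as np

def iterative_prune(weights, rounds=3, frac_per_round=0.2):
    """Iteratively zero out the lowest-magnitude weights.

    Sketch of the prune -> fine-tune loop: each round, rank the
    surviving weights, remove a small fraction of the least
    important ones, then (in a real system) fine-tune and repeat.
    """
    w = weights.copy()
    for _ in range(rounds):
        alive = np.flatnonzero(w)                    # rank only surviving weights
        k = max(1, int(len(alive) * frac_per_round)) # prune a small batch per round
        order = alive[np.argsort(np.abs(w[alive]))]
        w[order[:k]] = 0.0                           # remove the least important
        # ... fine-tune the remaining weights here before the next round
    return w

w = np.array([0.9, -0.05, 0.4, 0.01, -0.7, 0.2])
pruned = iterative_prune(w)
```

Pruning a small fraction per round, rather than everything at once, is what gives the fine-tuning step a chance to recover accuracy before the next cut.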
Pruning has not been a widely publicised method of reducing model size, largely due to the previous ineffectiveness of ranking algorithms. Even so, it is generally a better approach to start with a larger network and prune it after training than to train a smaller network from the get-go.
Types Of Pruning And Their Effectiveness
Pruning can come in various forms, with the choice of method depending on the kind of output the developer requires. In some cases speed is the priority; in others, the goal is to reduce storage.
One of the first methods is pruning entire convolutional filters. The filters are ranked using the L1 norm of their weights, after which the 'n' lowest-ranking filters are pruned globally across the network. The model is then retrained, and this process is repeated.
There are also structured pruning methods that take a more light-touch approach to regulating the output. One such method uses a set of particle filters, equal in number to the convolutional filters in the network.
The network's accuracy is then measured on a validation set, and pruning proceeds based on the result. However, this process is heavy on compute and should only be used for smaller data sets.
Nvidia has also released a method for pruning CNNs as a part of their research into neural networks.
At its most basic level, this method involves pruning each filter in turn and observing how the cost function changes. On its own, this is a brute-force search and is not practical without a large amount of compute. However, the ranking criterion it motivates is highly effective and intuitive: it uses both the activations and their gradients to rank filters, providing a clearer view of the model.
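A minimal sketch of that activation-times-gradient criterion is below. It follows the first-order Taylor idea behind Nvidia's approach, but the exact shapes and averaging here are illustrative assumptions: each filter's score is the absolute product of its activation and the loss gradient with respect to that activation, averaged over a batch.

```python
import numpy as np

def taylor_rank(activations, gradients):
    """First-order Taylor ranking: mean |activation * gradient| per filter.

    Approximates each filter's contribution to the loss; a small score
    means zeroing the filter is expected to change the loss very little.
    Shapes assumed: (batch, num_filters) for both inputs.
    """
    scores = np.abs(activations * gradients).mean(axis=0)
    return np.argsort(scores)  # ascending: prune from the front of this list

acts = np.array([[0.9, 0.1, 0.5], [0.8, 0.2, 0.4]])
grads = np.array([[0.5, 0.05, -0.3], [0.6, 0.01, -0.2]])
order = taylor_rank(acts, grads)
```

Because both quantities are already computed during a normal backward pass, this ranking avoids the brute-force cost of actually removing each filter and re-evaluating the network.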
The Relevance Of Pruning Today
With the rise of mobile inference and on-device machine learning capabilities, pruning is more relevant than ever before. Lightweight algorithms are the need of the hour, as more and more applications make use of neural networks.
The most recent example of this comes in the form of Apple's new products, which use neural networks to power a multitude of privacy and security features across devices. Given the disruptive nature of the technology, it is easy to see why various companies are adopting it.
Easy availability of neural networks is also required owing to the varied nature of their applications. Their move to mobile is complemented by dedicated on-device processing hardware in flagship devices, further creating a need for efficient programs that perform the most work while consuming the least resources.
This is why pruning is more relevant today than ever: applications need to get lighter and faster without sacrificing accuracy.