Artificial intelligence is one field that can draw on interdisciplinary resources and, at the same time, pay dividends across domains. So far, mathematicians, biologists and neuroscientists have made most of the contributions towards making machine learning algorithms more robust, at least at the software level. No matter how large the data is or how complicated the equations are, to the hardware and the circuitry these are just a stream of electrical pulses.
The shortcomings on the hardware end, however, have attracted material scientists to develop devices that occupy less space while carrying out the same number of operations, sometimes even faster, without losing accuracy.
Non-volatile memory (NVM) is one such innovation: it retains data even after the power is cut. As the name suggests, NVMs are classified as mechanically or electrically addressed based on how the data is written.
NVMs find applications in optical disks, hard drives, USB memory sticks, digital cameras and many more devices.
However, the technology comes with its own set of issues:
- Limited endurance
- Limited dynamic range
- Imperfect yield
- Resistance drift
- Non-linearity and cycle-to-cycle variability
- Inconsistency in reaching maximum conductance
- Asymmetry between positive and negative weight changes
A Mix Of Hardware And Software Comes In Handy
In this paper, the researchers introduce novel DNN accelerators based on crossbar arrays of non-volatile memories (NVMs) that can perform training faster while using less energy. This is an effort to combine hardware and software capabilities to speed up such systems.
In such systems, computation occurs in the analogue domain at the location of weight data, encoded into the conductances of the NVM devices.
By encoding each weight into four distinct physical devices, a “Most Significant Conductance” pair (MSP) and a “Least Significant Conductance” pair (LSP), DNNs can be trained to software-equivalent accuracy despite the imperfections of real analogue memory devices.
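As a rough illustration, the four-device scheme can be sketched in code. Everything here is an illustrative assumption rather than the paper's circuit: the gain factor `F`, the rounding rule, and the mapping of signed differences onto device pairs.

```python
# Illustrative sketch of the four-device weight encoding (not the paper's
# actual circuit). A signed weight is split into a coarse contribution
# carried by the most-significant pair (MSP) and a fine residue carried by
# the least-significant pair (LSP). F is a hypothetical gain factor.
F = 4.0

def encode_weight(w):
    """Split w into MSP and LSP signed differences, then into device pairs."""
    msp_diff = round(w / F)           # coarse step carried by the MSP
    lsp_diff = w - F * msp_diff       # fine residue carried by the LSP
    # Each signed difference maps onto a pair of non-negative conductances.
    G_plus, G_minus = max(msp_diff, 0), max(-msp_diff, 0)
    g_plus, g_minus = max(lsp_diff, 0), max(-lsp_diff, 0)
    return G_plus, G_minus, g_plus, g_minus

def decode_weight(G_plus, G_minus, g_plus, g_minus):
    """Overall weight = F * (MSP difference) + (LSP difference)."""
    return F * (G_plus - G_minus) + (g_plus - g_minus)
```

Because most training updates only nudge the fine LSP residue, the coarse MSP devices need to be touched far less often.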
Achieving such high “test” accuracies requires both large networks and large amounts of training data; training times can range from days to weeks on real-world problems, even with cutting-edge GPU-based hardware.
One way to accelerate this training process, particularly for fully-connected layers, is by using dense crossbar arrays of Non-Volatile Memory (NVM).
Signed weights can be encoded by either a pair of conductances or a programmable conductance and an intermediate reference current.
How Was This Made Possible
The computations in any deep neural network are essentially accumulated dot products across matrices. This characteristic of DNNs allows the computations to be implemented directly on crossbar arrays of NVMs, in parallel, at the location of the data.
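A toy numerical sketch of that idea: treat the array's conductances as a matrix, the input activations as row voltages, and read the dot products off the column currents via Ohm's law and Kirchhoff's current law. The sizes and values below are made up purely for illustration.

```python
import numpy as np

# Toy model of an analogue crossbar multiply: voltages applied on the rows,
# currents summed on the columns. Sizes and values are illustrative only.
rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # device conductances, one per crosspoint
V = rng.uniform(-0.2, 0.2, size=4)       # input activations encoded as row voltages

# Ohm's law gives each device's current G[i, j] * V[i]; Kirchhoff's current
# law sums these along every column wire, so the whole vector-matrix product
# appears in one parallel "read".
I = V @ G

# The same result computed explicitly, one crosspoint at a time:
I_explicit = np.array([sum(G[i, j] * V[i] for i in range(4)) for j in range(3)])
```

The point of the analogy is that the multiply-accumulate happens in the physics of the array, not in sequential digital logic.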
In this study, Giorgio Cristiano and his colleagues at IBM Research AI propose a new stratagem, based on multiple conductances per weight, that can go toe to toe with the accuracies simulated in TensorFlow.
A fully connected network with four neuron layers is mapped onto three different crossbar arrays, as illustrated above, where G indicates conductance and the sign indicates the difference in conductance between the two devices of a pair. The overall weight is the sum of these differences.
The weight ‘W’ is encoded in two pairs of devices; the four resistive devices in the figure above together encode one synaptic weight. The difference in the ‘G’ conductances forms the most significant pair (MSP), and that of the ‘g’ conductances the least significant pair (LSP).
First, the target weight ‘W’ is copied into the peripheral circuitry, and the updated weight is then checked against this preserved target for tuning purposes. During training, all weight updates are applied to the LSP while the MSP is left unchanged, so whenever a weight needs to change, a large number of positive update pulses and an almost equal number of negative update pulses are applied to the LSP.
This requires conductance changes that are linear and independent, so that the LSP's positive and negative conductance changes can cancel out across multiple weight-update requests.
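A minimal sketch of that update policy follows. The gain factor `F` and the saturation limit `LSP_LIMIT` are hypothetical parameters, not values from the paper: every update lands on the LSP, and when the LSP nears saturation its accumulated value is folded into the MSP.

```python
# Sketch of the update policy: training updates hit only the LSP; when the
# LSP approaches saturation, its value is folded into the MSP and the LSP
# is reset. F and LSP_LIMIT are hypothetical parameters.
F = 4.0          # assumed MSP gain factor
LSP_LIMIT = 2.0  # assumed LSP saturation threshold

class FourDeviceWeight:
    def __init__(self):
        self.msp = 0.0   # signed conductance difference of the MSP
        self.lsp = 0.0   # signed conductance difference of the LSP

    @property
    def value(self):
        return F * self.msp + self.lsp

    def update(self, dw):
        self.lsp += dw                    # all training updates go to the LSP
        if abs(self.lsp) > LSP_LIMIT:     # occasional transfer into the MSP
            carry = round(self.lsp / F)
            self.msp += carry
            self.lsp -= F * carry         # value is preserved across the transfer
```

The transfer step leaves the decoded weight unchanged while moving the bulk of its magnitude onto the coarse, rarely-programmed MSP devices.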
To achieve this, the researchers at IBM use a 3-Transistor-1-Capacitor (3T1C) structure for the LSP, which will eventually pave the way for implementation with NVM devices. The two programming transistors can be briefly configured as a transmission gate in order to rapidly charge all the capacitors to the desired target voltage.
High-speed processing at this level was achieved with Open-Loop Tuning (OLT), a technique that follows a “blind” write-without-verify procedure.
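The contrast between the two tuning modes can be sketched as follows; the ideal linear device response and the pulse step size are illustrative assumptions, not device data.

```python
# Contrast between closed-loop tuning (iterative write-and-verify) and
# open-loop tuning (one "blind" write, no read-back). The ideal linear
# device response and STEP size are illustrative assumptions.
STEP = 0.05   # assumed conductance change per programming pulse

def closed_loop_tune(g, target, tol=STEP, max_iters=100):
    """Pulse, read back, and correct until within tolerance: slow but accurate."""
    for _ in range(max_iters):
        if abs(target - g) <= tol:
            break
        g += STEP if target > g else -STEP
    return g

def open_loop_tune(g, target):
    """Fire the estimated pulse count in one shot, never verifying the result."""
    n_pulses = round((target - g) / STEP)
    # On a real device, variability would leave an uncorrected error here.
    return g + n_pulses * STEP
```

Skipping the verify step is what makes OLT fast, at the cost of leaving device-to-device variation uncorrected.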
This study acknowledges the challenges involved in mixing hardware and software at scale, and it tries to bridge the gap between algorithmic acceleration at the hardware level and the intricacies of the deep learning network itself. Among other things, it covers:
- NVM device modelling for robust, high-speed weight updates and preservation
- How individual conductance combinations are used to update weights
- The features and requirements of the LSP and MSP
- The significance of open- and closed-loop tuning and their design choices
- The impact of single-device programming
- The challenges with open-loop tuning
The next generation of AI technologies should be able to comprehend our commands by working through vast amounts of background information at high speed. To make these machines smart, we also need to enhance capabilities on the hardware side and make them more energy efficient. Using NVMs to train fully connected networks is one such attempt, and one that should rightly encourage more hardware enhancements in the future.