In the 1960s, AI pioneer Herbert Simon predicted that within two decades machines would match the cognitive abilities of humankind. Predictions like these motivated theorists, sceptics and thinkers from a cross-section of domains to find ways to use computers to perform routine tasks. From Heron’s automata in the first century to Google’s DeepMind in the 21st century, there has been an undying pursuit to create human-like intelligence external to humans.
AI can now make art that fetches hefty prices at auctions. It can help the e-commerce industry with recommendations (pixel-level domain transfer), and it is used for medical anomaly detection, music generation and, most popularly, generating faces of people who never existed.
One thing that underlies the successful commercialisation of AI in all the above cases is the use of Generative Adversarial Networks (GANs).
Since the publication of the original GAN paper in 2014, applications of GANs have witnessed tremendous growth. GANs have been successfully used for high-fidelity natural image synthesis, improving learned image compression and data augmentation tasks.
GANs have advanced to a point where they can pick up subtle expressions that denote significant human emotions. They have become the powerhouses of unsupervised machine learning.
The latest developments in AI, especially in the applications of Generative Adversarial Networks (GANs), can help researchers tackle the final frontier in replicating human intelligence. With a new paper being released every week, GANs are proving to be a front-runner for achieving the ultimate goal: artificial general intelligence (AGI).
This article is an attempt to familiarise the reader with the jargon surrounding GANs and to give a high-level view of how they function.
The Working Principle
GANs are generative models devised by Goodfellow et al. in 2014. They work on the principle of pitting generation against discrimination. The two networks, the Generator and the Discriminator, go toe to toe with each other like arch-nemeses, and this contest eventually benefits the overall model.
The Generator (G) is responsible for producing a rich, high-dimensional vector that attempts to replicate a given data generation process; the Discriminator (D) acts to separate the samples created by the Generator from those of the real, observed data generation process. They are trained jointly: G benefits from D's inability to tell true from generated data, whilst D's loss is minimized when it correctly classifies inputs coming from G as fake and inputs coming from the dataset as true.
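The joint training described above is usually written as the minimax objective from Goodfellow et al. (2014). The symbols p_data (the real data distribution) and p_z (the noise distribution fed to G) follow that paper's notation and are not defined elsewhere in this article.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D pushes V up by assigning high probability to real samples and low probability to generated ones, while G pushes V down by making D(G(z)) as large as it can.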
In the example architecture considered here, the Generator network has 4 convolutional layers, all followed by BatchNorm (except for the output layer) and Rectified Linear Unit (ReLU) activations.
The Rectified Linear Unit, or ReLU, is now one of the most widely used activation functions. The function computes max(0, x): any input less than zero is returned as 0, and for inputs greater than 0 the output is linear with a slope of 1.
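As a minimal sketch, ReLU and its slope can be written in a few lines of plain Python. The function names are illustrative only, and the slope at exactly zero is set to 0 here by convention.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


print([relu(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```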
The network takes as input a noise vector drawn from a normal distribution, which is fed through the consecutive layers. Each of these layers represents a convolution operation.
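To make the flow through the Generator more concrete, here is a rough sketch that samples a noise vector and traces how the spatial size grows from layer to layer. The article does not give the layer hyperparameters, so the 100-dimensional noise vector, the use of transposed convolutions, and the (kernel, stride, padding) values are all assumptions borrowed from common DCGAN-style setups. The helper names are hypothetical.

```python
import random


def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def sample_noise(dim=100, seed=None):
    """Draw a noise vector z from a standard normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Assumed (kernel, stride, padding) for the 4 layers; not taken from the article.
ASSUMED_LAYERS = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]


def trace_sizes(layers, start=1):
    """Return the spatial size after each layer, starting from a 1x1 input."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


z = sample_noise(seed=0)  # treated as a 1x1 "image" with 100 channels
print(len(z), trace_sizes(ASSUMED_LAYERS))
```

Under these assumed settings, the 1x1 input grows to 4, 8, 16 and finally 32 pixels per side. Different kernel or stride choices would give a different output resolution.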
The discriminator is also a 4-layer CNN with BatchNorm (except on its input layer) and leaky ReLU activations.
The drawback of the ReLU function is its fragility: when a large gradient flows through a ReLU neuron, the resulting weight update can push the neuron into a state where it outputs zero for every input, so it receives no gradient and cannot fire on any other datapoint for the rest of training. Leaky ReLU was introduced to address this problem. Unlike ReLU, which returns zero for anything less than zero, the leaky version has a small, non-zero slope for negative inputs, which keeps gradients flowing and is crucial for the functioning of a Discriminator network.
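The difference is easiest to see by comparing the slopes of the two activations for a negative input. This is a sketch only: the value 0.2 for the negative-region slope is an assumption (a common choice in GAN code, not a value given in this article), and the function names are illustrative.

```python
def relu_grad(x):
    """Slope of ReLU: zero for every non-positive input, so no learning signal."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.2):
    """Leaky ReLU: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.2):
    """Slope of leaky ReLU: never exactly zero, so gradients keep flowing."""
    return 1.0 if x > 0 else negative_slope


x = -5.0
print(relu_grad(x), leaky_relu_grad(x), leaky_relu(x))
```

For a negative input, the ReLU slope is zero while the leaky version still passes back a fraction of the gradient, which lets a neuron stuck in the negative region recover.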
Half of the time, the Discriminator network receives images from the training set; the other half of the time, it receives images from the generator.
The Discriminator has to output probabilities close to 1 for real images and near 0 for fake images. To do that, the discriminator minimizes the sum of two partial losses: one that rewards high probabilities for the real images and another that rewards low probabilities for the fake images.
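One common way to express these two partial losses is binary cross-entropy, sketched below in plain Python. The choice of cross-entropy, the function name and the small eps clamp (there only to avoid log(0)) are assumptions for illustration; the inputs are the probabilities the Discriminator assigned to a batch of real and a batch of fake images.

```python
import math


def discriminator_loss(real_probs, fake_probs, eps=1e-12):
    """Sum of two partial losses for the Discriminator.

    real_loss is small when D outputs values near 1 for real images;
    fake_loss is small when D outputs values near 0 for generated images.
    """
    real_loss = -sum(math.log(max(p, eps)) for p in real_probs) / len(real_probs)
    fake_loss = -sum(math.log(max(1.0 - p, eps)) for p in fake_probs) / len(fake_probs)
    return real_loss + fake_loss


confident = discriminator_loss(real_probs=[0.9, 0.95], fake_probs=[0.1, 0.05])
confused = discriminator_loss(real_probs=[0.5, 0.5], fake_probs=[0.5, 0.5])
print(confident < confused)  # True
```

A Discriminator that outputs 0.5 for everything has a loss of 2 log 2 (about 1.386), and the loss falls towards zero as its predictions become correct and confident.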
As training progresses, the generator starts to output images that look closer to the images from the training set. That happens because the generator is trained to learn the data distribution underlying the training set images.
At the same time, the discriminator gets really good at classifying samples as real or fake.
The whole concept of GANs can be summarised as, “the generator is trying to fool the discriminator while the discriminator is trying to not get fooled by the generator. As the models train through alternating optimization, both methods are improved until a point where the counterfeits are indistinguishable from the genuine ones.”
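The alternating optimization in this summary can be sketched with a deliberately tiny, one-dimensional toy rather than an image model. Everything here is an assumption made for illustration: the real data are numbers drawn from a normal distribution centred at 4, the generator is a single learnable shift b applied to noise, the discriminator is a logistic unit with parameters w and c, and the gradients are written out by hand.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=200, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Generator:      G(z) = z + b            (one learnable shift b)
    Discriminator:  D(x) = sigmoid(w*x + c)
    """
    rng = random.Random(seed)
    b, w, c = 0.0, 0.0, 0.0
    for _ in range(steps):
        # Discriminator step: minimise -[log D(real) + log(1 - D(fake))].
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (d - 1.0) * x
            grad_c += d - 1.0
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step: minimise log(1 - D(G(z))) with D held fixed.
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_b = sum(-sigmoid(w * x + c) * w for x in fake) / batch
        b -= lr * grad_b
    return b, w, c


b, w, c = train_toy_gan(steps=50)
print(b, w, c)
```

In this toy, the generator's shift b starts at 0 and is pushed towards the real data as long as the discriminator can still tell the two apart. Real GANs follow the same alternating pattern, but with deep networks and automatic differentiation, and their training is not guaranteed to settle as neatly.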