Since the publication of this paper in 2014, applications of GANs have grown tremendously.
GANs have advanced to the point where they can pick up subtle expressions that denote significant human emotions.
Celebrated computer scientist and Turing Award winner Yann LeCun observed, “GANs and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”
High Fidelity Image Generation With Fewer Labels
GANs are powerhouses of unsupervised machine learning. They are deployed to draw insights from data that is unstructured and has no specific target value.
As machines depend on training data to produce results, labeled data enables finer tuning of the results.
Since data is vast and usually unlabeled, and its features have uncertain correlations, machine learning researchers have been building models that can be taught how to learn and can then teach other models the rules and ramifications of converging on a result.
GANs and their variants have been successful in generating high-quality images, for example by learning from blurry handwriting images and creating high-quality handwritten digits or their lookalikes.
To make the network learn feature representations, a technique called unsupervised semantic feature learning is used.
Semantic features describe the visual content of an image by correlating low-level features, such as colour and gradient orientation, with the content of the image scene. For instance, an extracted colour such as green is correlated with grass, or blue with a swimming pool.
Unsupervised semantic feature learning trains a convolutional neural network to learn the features of an image by tasking it with predicting the angle by which the image has been rotated.
The intuition is that, for a model to perfect this prediction, it needs to be able to recognise the content of the images, such as the shapes of the objects in them.
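The rotation task can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes four rotation classes (0, 90, 180 and 270 degrees), square single-channel images, and NumPy; the helper name `make_rotation_batch` is hypothetical.

```python
import numpy as np

# Assumed rotation classes used as pseudo-labels: 0, 90, 180 and 270 degrees.
ANGLES = (0, 90, 180, 270)

def make_rotation_batch(images):
    """Return rotated copies of each image and the rotation class of each copy.

    images: array of shape (N, H, W) with H == W, so every rotation keeps
    the same shape and the copies can be stacked.
    """
    rotated, labels = [], []
    for image in images:
        for k in range(len(ANGLES)):
            # np.rot90 rotates counter-clockwise by k * 90 degrees.
            rotated.append(np.rot90(image, k))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

images = np.arange(2 * 4 * 4).reshape(2, 4, 4)  # two toy 4x4 "images"
x, y = make_rotation_batch(images)
print(x.shape, y)
```

No human annotation is needed here: the labels `y` come for free from the rotation that was applied, which is what makes the task self-supervised.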
A linear classifier is trained on the discriminator network’s feature representation to predict the rotations of rotated real images and rotated fake images. The corresponding errors in the predicted angles are added to the discriminator and generator losses.
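One way to picture how a rotation term enters the two losses is the sketch below. It rests on several assumptions that are not stated in the article: the rotation error is measured with softmax cross-entropy, the real-image term goes to the discriminator and the fake-image term to the generator, and `alpha` and `beta` are hypothetical weighting factors. Random arrays stand in for the discriminator's features, and the adversarial losses are placeholder numbers.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def rotation_loss(features, labels, weights, bias):
    """Cross-entropy of a linear rotation classifier applied to features."""
    return cross_entropy(features @ weights + bias, labels)

rng = np.random.default_rng(0)
num_rotations, feature_dim = 4, 16

# Hypothetical stand-ins for discriminator features of rotated images.
real_features = rng.normal(size=(8, feature_dim))
fake_features = rng.normal(size=(8, feature_dim))
rotation_labels = np.tile(np.arange(num_rotations), 2)

# Untrained linear classifier: all-zero weights give uniform predictions.
weights = np.zeros((feature_dim, num_rotations))
bias = np.zeros(num_rotations)

gan_loss_d, gan_loss_g = 0.7, 1.2  # placeholder adversarial losses
alpha, beta = 0.2, 0.5             # assumed weighting factors

loss_d = gan_loss_d + beta * rotation_loss(real_features, rotation_labels, weights, bias)
loss_g = gan_loss_g + alpha * rotation_loss(fake_features, rotation_labels, weights, bias)
```

In a real training loop the features would come from the discriminator and all parameters would be updated by backpropagation; the sketch only shows how the extra term is combined with each loss.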
One of the intermediate layers of this newly trained network is taken as the new feature representation of the input. A classifier is then trained to recognise the label from this feature representation. Since the network has already gained some skills from its earlier rotation prediction, the classifier need not be trained over the entire network.
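Training only a small classifier on top of fixed features might look like the following sketch. Everything in it is illustrative: the features are synthetic stand-ins for an intermediate layer's activations, there are only two classes, and the classifier is a plain logistic regression fitted by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen features: stand-ins for intermediate-layer activations.
n_per_class, feature_dim = 50, 8
class_means = np.stack([np.full(feature_dim, -1.0), np.full(feature_dim, 1.0)])
labels = np.repeat([0, 1], n_per_class)
features = class_means[labels] + rng.normal(scale=0.5, size=(2 * n_per_class, feature_dim))

# Only these classifier parameters are trained; the features never change.
w = np.zeros(feature_dim)
b = 0.0
learning_rate = 0.1

for _ in range(200):
    probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # sigmoid
    grad = probs - labels                              # gradient of log loss w.r.t. logits
    w -= learning_rate * features.T @ grad / len(labels)
    b -= learning_rate * grad.mean()

accuracy = ((features @ w + b > 0).astype(int) == labels).mean()
print(accuracy)
```

Because the feature extractor is left untouched, the number of trainable parameters is tiny (here `feature_dim + 1`), which is why a small set of labels can be enough.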
To keep the quality of the generated images consistent, Fréchet Inception Distance (FID) is used to measure quality. The lower the FID, the better the quality; in other words, the generated images are statistically closer to the real ones.
FID compares the statistics of generated samples to real samples, instead of evaluating generated samples in a vacuum.
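As a rough sketch of what "comparing statistics" means, the function below fits a Gaussian (mean and covariance) to each set of feature vectors and computes the Fréchet distance between the two. In practice the features come from a pretrained Inception network; here random arrays are used as stand-ins, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np

def frechet_distance(real_features, generated_features):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim).
    """
    mu_r = real_features.mean(axis=0)
    mu_g = generated_features.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(real_features, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(generated_features, rowvar=False))

    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g. These are real and non-negative in
    # exact arithmetic; clip tiny negative values caused by rounding.
    eigenvalues = np.linalg.eigvals(sigma_r @ sigma_g).real
    trace_sqrt = np.sqrt(np.clip(eigenvalues, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    cov_term = np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt
    return float(mean_term + cov_term)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # stand-in for features of real images
shifted = real + 3.0               # same spread, different mean

same_score = frechet_distance(real, real)
shifted_score = frechet_distance(real, shifted)
```

Identical feature sets give a distance of essentially zero, while shifting every feature by 3 leaves the covariance unchanged and adds the squared mean difference, 4 × 3² = 36.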
The above illustrates how the network produces an image by interpolating between the latent (hidden feature) vectors of the leftmost and rightmost images.
The techniques above show a significant improvement in the results of conditional GANs, even with large-scale training.
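Latent interpolation itself is simple to sketch. The example assumes plain linear interpolation and a 128-dimensional latent space, neither of which is specified in the article, and it stops short of calling a generator.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Linearly interpolate between two latent vectors, endpoints included."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]  # shape (num_steps, 1)
    return (1.0 - t) * z_start + t * z_end

rng = np.random.default_rng(0)
z_left, z_right = rng.normal(size=(2, 128))  # assumed 128-dimensional latents
path = interpolate_latents(z_left, z_right, num_steps=8)
# Each row of `path` would be passed to a trained generator to render one image.
```

The first and last rows reproduce the two endpoint vectors, and the rows in between are the blends from which the intermediate images would be generated.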
- The hand-annotated ground truth labels of the images are replaced with inferred ones.
- To learn the feature representation from an unlabeled dataset, self-supervision is introduced: the network is trained by tasking it with predicting the angle of rotation of the image.
- Finally, labels are recognised based on the activation patterns of the intermediate layers of the network trained on the above tasks.
- Image quality is monitored with Fréchet Inception Distance (FID).
The success of this model will encourage more research into self- and semi-supervised machine learning algorithms.
Read more about this work here.
Check the open-sourced GAN library here.