Variational Autoencoders (VAEs) came into the limelight when they were used to obtain state-of-the-art results in image recognition and reinforcement learning. A VAE consists of an encoder network and a decoder network, two building blocks that are widely used in generative models.
Encoders can be seen in CNNs too: they convert an image into a smaller, dense representation, which is fed to the decoder network for reconstruction.
For example, one can see in the above illustration how the input image of the digit ‘2’ has been reconstructed using the autoencoder network.
If the output image looks hazier than the input, information has been lost along the way. This loss is measured by the loss function of the network, the cross-entropy between the output and the input.
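As a rough illustration of that reconstruction loss, here is a minimal sketch in plain Python. It assumes pixel values scaled to the range 0 to 1 and uses the per-pixel binary cross-entropy, averaged over the image. The function name and the tiny four-pixel "images" are made up for this example and are not taken from the work discussed here.

```python
import math

def binary_cross_entropy(x, x_hat, eps=1e-12):
    """Mean binary cross-entropy between input pixels x and reconstruction x_hat.

    Both arguments are flat lists of values in [0, 1]; eps guards against log(0).
    """
    total = 0.0
    for target, pred in zip(x, x_hat):
        pred = min(max(pred, eps), 1.0 - eps)
        total += -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))
    return total / len(x)

# Hypothetical 4-pixel image and two candidate reconstructions.
image = [0.0, 1.0, 1.0, 0.0]
sharp = [0.05, 0.95, 0.95, 0.05]  # close to the input
hazy = [0.4, 0.6, 0.6, 0.4]       # washed-out reconstruction

sharp_loss = binary_cross_entropy(image, sharp)
hazy_loss = binary_cross_entropy(image, hazy)
print(sharp_loss, hazy_loss)
```

The hazier reconstruction gets the larger loss, which is the signal the network is trained to reduce.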
In image generation, even if the mean and standard deviation stay the same, the actual encoding will vary from one pass to the next because it is sampled. An autoencoder trains its encoder to generate encodings from which its own input can be reconstructed.
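The effect of sampling can be sketched in a few lines. The snippet below assumes the common formulation in which a latent vector is drawn as the mean plus the standard deviation times standard normal noise. The function name and the numbers are illustrative only.

```python
import random

def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)  # seeded so the run is repeatable
mu = [0.5, -1.0]        # hypothetical encoder output: means
sigma = [0.1, 0.2]      # hypothetical encoder output: standard deviations

z_first = sample_latent(mu, sigma, rng)
z_second = sample_latent(mu, sigma, rng)
print(z_first, z_second)
```

Both draws use the same mean and standard deviation, yet the two encodings differ because the noise term is drawn afresh each time.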
Now, what if these widely used VAEs could switch effortlessly between fully supervised, semi-supervised and unsupervised learning?
In short, training gets simpler. Unlabelled data no longer seems to be a hurdle.
A group of researchers from Munich has introduced a new flavor of Variational Autoencoder (VAE) that interpolates between these supervision settings.
Making VAEs As A One Stop Solution
The new model is an extension of the original VAE which is depicted in. The only addition is a classification layer π (typically a one-hot classifying layer using softmax activation), attached to the topmost encoder layer.
The picture above compares the new model architecture with its supervised (left) and unsupervised (right) equivalents. The π layer (and its loss) represents an extension to the standard VAE proposed in this.
The μ and σ layers encode the mean and standard deviation of the Gaussian distribution in the latent layer.
The authors claim that the simplicity of their model allows any existing VAE to be turned into a semi-supervised VAE simply by adding the π layer and extending the loss function.
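One way to picture "extending the loss function" is sketched below. This is not the authors' exact formulation: it assumes the usual VAE loss (a reconstruction term plus the closed-form KL divergence of a diagonal Gaussian from a standard normal) and adds a cross-entropy term on the π output only when a label is available. The function names and the `class_weight` parameter are hypothetical.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL divergence of N(mu, sigma^2) from N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

def classification_loss(pi, label):
    """Cross-entropy of the softmax output pi against an integer class label."""
    return -math.log(max(pi[label], 1e-12))

def semi_supervised_loss(recon_loss, mu, sigma, pi, label=None, class_weight=1.0):
    """Standard VAE loss, plus a classification term only for labelled samples."""
    loss = recon_loss + kl_to_standard_normal(mu, sigma)
    if label is not None:  # unlabelled samples skip the pi term entirely
        loss += class_weight * classification_loss(pi, label)
    return loss

# Hypothetical network outputs for a single sample.
mu, sigma = [0.0, 0.0], [1.0, 1.0]
pi = [0.7, 0.2, 0.1]

unlabelled = semi_supervised_loss(0.3, mu, sigma, pi)
labelled = semi_supervised_loss(0.3, mu, sigma, pi, label=0)
print(unlabelled, labelled)
```

Because the extra term is simply skipped for unlabelled samples, the same function covers datasets in which none, some or all of the samples carry labels.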
An advantage of the semi-supervised variant is that the decoder can be used as a generative model by providing it with both a target label and a sample from the latent layer. Given that the prior distribution of the latent layer is Gaussian, a sample from a normal distribution can be fed as an input to the decoder.
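A minimal sketch of preparing such a decoder input follows. How the label is actually passed to the decoder is not described here, so the concatenation of a one-hot label with the latent sample is an assumption, as are the function names and the sizes used.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def decoder_input(label, num_classes, latent_dim, rng):
    """Build a decoder input: one-hot target label followed by z ~ N(0, 1).

    Concatenating label and latent sample is an assumption made for this sketch.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return one_hot(label, num_classes) + z

# Ask for a '2' from a hypothetical 10-class model with a 4-dimensional latent layer.
decoder_in = decoder_input(label=2, num_classes=10, latent_dim=4, rng=random.Random(0))
print(decoder_in)
```

Feeding this vector to a trained decoder would then produce one sample of the requested class, and a fresh draw of z would give a different variation of it.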
In particular, all learned weights can directly be reused when transitioning into the semi-supervised learning scenario. This is very useful, as in many real-world applications, a labelled dataset (even partially labelled) is only built up over time and not available at project initiation.
Adding A Flavor Of Transfer Learning
The availability of unlabelled data points helps the model form better representations in its deeper layers, hence enabling semi-supervised learning.
Maybe the opposite is true as well: does the availability of labels also help the model find better representations? Does it perform better on reconstruction-related tasks such as anomaly detection?
This problem setup can generally be described as a flavour of ‘transfer learning’: can the model improve at its unsupervised task by leveraging labels that are primarily associated with the supervised task?
To investigate the above scenario, the VAE was used as an anomaly detector. This is a classic case of feature engineering, where the labels incorporate domain knowledge of some very specific, yet important, property of the data set.
So, the idea here is that the π layer will guide the model towards an extractor for those very specific high-level features.
The term ‘semi-unsupervised learning’ is an apt description of this task: just as semi-supervised learning enhances the performance of a supervised task by using unlabelled data, ‘semi-unsupervised’ learning would enhance the performance of an unsupervised task by using labelled data.
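For readers unfamiliar with reconstruction-based anomaly detection, the general idea can be sketched as follows. The scoring rule is not spelled out here, so this example assumes a mean squared reconstruction error and a threshold set at the mean plus three standard deviations of the errors seen on normal data. All names and numbers are hypothetical.

```python
import statistics

def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(normal_errors, num_std=3.0):
    """Assumed rule: flag anything beyond mean + num_std * std of normal errors."""
    return statistics.mean(normal_errors) + num_std * statistics.pstdev(normal_errors)

def is_anomaly(error, threshold):
    """A sample is anomalous if the model reconstructs it unusually badly."""
    return error > threshold

# Hypothetical reconstruction errors measured on normal samples.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
threshold = fit_threshold(normal_errors)

print(is_anomaly(0.012, threshold))  # an error typical of normal data
print(is_anomaly(0.2, threshold))    # a much larger error
```

The better the learned representation captures what normal data looks like, the wider the gap between the errors of normal and anomalous samples, which is where the labels are hoped to help.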
- A new flavour of Variational Autoencoder (VAE) that enables semi-supervised learning.
- The model architecture requires only minimal modifications to any given purely unsupervised VAE.
- When this VAE is applied to the problem of anomaly detection, its performance is observed to increase.
- The model adapts seamlessly across the full 0-100% range of available labels.