Machine learning models often suffer from a lack of structured data, and the representations of the data they work with are sometimes entangled. This is where the prior knowledge of humans comes in handy for feature engineering. To let algorithms exploit the representations inherent in the data on their own, disentanglement methods were introduced. There is no single agreed definition of disentanglement so far, but it can be summarised as follows:
Disentanglement is the act of breaking down the features of the data into separate variables, in a way that is similar to reasoning at the human level. The objective is to extract useful information when building classifiers. The commonly held notion about unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation, which can be recovered by unsupervised learning algorithms.
In this paper, the authors challenge this notion by theoretically showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data.
Inductive biases are the assumptions an algorithm makes in order to learn the target function. For example, nearest neighbors assumes that most of the cases in a small neighborhood of feature space belong to the same class, while a linear regression model assumes that the output (the dependent variable) is linearly related to the independent variables.
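To make the idea concrete, here is a minimal sketch in plain Python. The toy data and the helper names (`nearest_neighbor_predict`, `fit_line`) are made up for illustration and are not from the paper. Both learners fit the same four points, yet they disagree far away from the training data because each one encodes a different assumption.

```python
# Illustrative only: same data, different inductive biases, different predictions.

def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbor: assumes nearby points share the same output."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_line(train_x, train_y):
    """Ordinary least squares for y = slope * x + intercept (assumes linearity)."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    var_x = sum((x - mean_x) ** 2 for x in train_x)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical toy data lying exactly on y = 2x.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 2.0, 4.0, 6.0]

slope, intercept = fit_line(train_x, train_y)
query = 10.0  # far outside the training range
nn_prediction = nearest_neighbor_predict(train_x, train_y, query)
linear_prediction = slope * query + intercept
print(nn_prediction, linear_prediction)
```

The nearest-neighbor learner returns the label of the closest training point, while the linear model extrapolates along the fitted line. Neither answer is dictated by the data alone; the gap comes entirely from the assumptions built into each method.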
Inductive biases like these are widespread in machine learning models. The authors of this paper prove that unsupervised learning of disentangled representations is impossible without such inductive biases.
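One way to build intuition for the impossibility claim is the sketch below. It is an illustration, not the paper's proof, and it assumes two independent factors with a standard normal prior (an assumption made here for simplicity). Rotating the factors by 45 degrees gives new coordinates that each mix both original factors, yet the rotated samples have essentially the same distribution. A learner that only sees the data distribution has no way to prefer one set of coordinates over the other.

```python
import math
import random

random.seed(0)
n = 20000

# Two independent "ground-truth" factors, standard normal prior (assumed).
z = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(n)]

# Rotate by 45 degrees: each new coordinate now depends on both factors.
theta = math.pi / 4
c, s = math.cos(theta), math.sin(theta)
z_rot = [(c * a - s * b, s * a + c * b) for a, b in z]

def second_moments(points):
    """Second moments about zero: E[x0^2], E[x1^2], E[x0*x1]."""
    m = len(points)
    var0 = sum(p[0] ** 2 for p in points) / m
    var1 = sum(p[1] ** 2 for p in points) / m
    cov = sum(p[0] * p[1] for p in points) / m
    return var0, var1, cov

original_stats = second_moments(z)
rotated_stats = second_moments(z_rot)

# How strongly the first rotated coordinate tracks each original factor.
mix_with_first = sum(r[0] * o[0] for r, o in zip(z_rot, z)) / n
mix_with_second = sum(r[0] * o[1] for r, o in zip(z_rot, z)) / n

print(original_stats, rotated_stats, mix_with_first, mix_with_second)
```

Both sets of statistics come out close to unit variance with near-zero covariance, so the two representations look alike from the data alone, while the mixing terms show that the rotated coordinate is tied to both original factors.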
Does Disentanglement Really Declutter?
State-of-the-art approaches for unsupervised disentanglement learning are largely based on Variational Autoencoders (VAEs).
In generative models, where the task is to generate images after learning from a training set, disentanglement is supposed to make things easier. But does it make any significant difference in an unsupervised setting?
To evaluate the different approaches fairly, regularization is separated from the other inductive biases.
Each method uses the same convolutional architecture, optimizer, optimizer hyperparameters and batch size. All methods use a Gaussian encoder, where the mean and the log variance of each latent factor are parametrized by a deep neural network, a Bernoulli decoder, and a latent dimension fixed to 10.
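The shared setup can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the neural networks are left out, and the `beta` weight on the regularizer follows the general style of weighted-KL VAE objectives rather than any specific method evaluated in the paper.

```python
import math
import random

LATENT_DIM = 10  # latent dimension fixed to 10, as described above

def sample_latent(mean, log_var, rng):
    """Gaussian encoder sample via z = mean + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))

def bernoulli_reconstruction_loss(pixels, probs, eps=1e-7):
    """Negative Bernoulli log-likelihood (binary cross-entropy), summed over pixels."""
    total = 0.0
    for x, p in zip(pixels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= x * math.log(p) + (1.0 - x) * math.log(1.0 - p)
    return total

def regularized_loss(pixels, probs, mean, log_var, beta=1.0):
    """Reconstruction term plus a weighted regularizer (beta is an assumed knob)."""
    return (bernoulli_reconstruction_loss(pixels, probs)
            + beta * kl_to_standard_normal(mean, log_var))

# Placeholder encoder outputs; in practice a network would produce these.
rng = random.Random(0)
mean = [0.0] * LATENT_DIM
log_var = [0.0] * LATENT_DIM
z_sample = sample_latent(mean, log_var, rng)

kl_value = kl_to_standard_normal(mean, log_var)
recon_value = bernoulli_reconstruction_loss([1.0, 0.0], [0.5, 0.5])
total_value = regularized_loss([1.0, 0.0], [0.5, 0.5], mean, log_var, beta=4.0)
print(len(z_sample), kl_value, recon_value, total_value)
```

Keeping the regularizer as a separate term makes it easy to swap in a different penalty while the architecture, decoder and latent size stay fixed, which is the kind of separation the comparison relies on.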
The goal of any disentangled representation is to build models that capture explanatory factors in a vector. The figure below is a representation of a 10-dimensional vector, and each panel visualizes the information captured in one of the 10 coordinates.
The ground-truth factors wall color, floor color and camera rotation are disentangled (see the top right, top center and bottom center panels), while the ground-truth factors object shape, size and color are entangled (see the top left and the two bottom left panels).
To check whether disentangled models can be identified without supervision, the researchers trained more than 12,000 models covering widely used methods. The models were evaluated in a reproducible large-scale experimental study on seven different data sets.
Since the results of this investigation show no compelling evidence that disentanglement increases the sample efficiency of downstream tasks, the authors recommend that the benefits of any disentanglement method be demonstrated concretely before it is enforced.
The contributions of the authors in this paper are as follows:
- Theoretically proving that unsupervised learning of disentangled representations is impossible without inductive biases.
- Releasing the library disentanglement_lib to train and evaluate disentangled representations, along with 10,000 trained models for future research.
- Showing that good trained models cannot be identified without access to ground-truth labels, even if good hyperparameter values are allowed to be transferred across data sets.
The authors conclude that although the different methods successfully enforce the properties encouraged by their corresponding losses, well-disentangled models cannot be identified without supervision.
This paper, which was given top honors at the recently concluded ICML 2019, brings to light how much hyperparameters matter relative to the choice of model and how crucial the role of supervision is, while debunking widely held notions about disentanglement.