Generative Adversarial Networks (GANs) have risen to prominence over the last few years. From deepfakes to generating faces of people who don't exist, GANs have been deployed for applications that are as popular as they are alarming.
The fundamental nature of these duelling networks is to outplay each other. One generates images to fool the other, while the other tries not to be fooled. Given enough training, the generator becomes so good that its fake images are nearly indistinguishable from real ones.
However, this is only the infamous side of GANs. Their potential was already on display at Christie's last year, when the painting titled Edmond de Belamy, from La Famille de Belamy sold for a whopping $432,500, auctioned alongside works by pop-art greats like Andy Warhol.
Celebrated computer scientist and Turing Award winner Yann LeCun observed, "GANs and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion."
One variant of GANs, the conditional GAN (cGAN), has been used to fine-tune trading strategies. The potential is largely unbounded and undiscovered, so it is crucial to monitor the performance of GANs. Here are a few metrics that can be used to validate them:
To gauge the quality of generated images consistently, the Fréchet Inception Distance (FID) is used. The lower the FID, the better the quality; in other words, the generated images are statistically closer to the real ones. FID compares the statistics of generated samples to those of real samples, instead of evaluating generated samples in a vacuum.
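Concretely, FID is the Fréchet distance between two Gaussians fitted to Inception-network features of the real and generated images. A minimal sketch, assuming the feature means and covariances have already been computed:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians, here standing in for the
    feature statistics of real (mu1, sigma1) and generated (mu2, sigma2)
    images."""
    diff = mu1 - mu2
    # matrix square root of the covariance product
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean))
```

With identical statistics the distance is zero; with equal identity covariances it reduces to the squared distance between the means, which makes the "lower is more similar" intuition easy to verify.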
Since comparing models by inspecting samples is labour-intensive and potentially misleading, Annealed Importance Sampling (AIS) was developed. In this approach, the log-likelihood of decoder-based models is evaluated, and the accuracy of the estimate is validated using Bidirectional Monte Carlo.
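At its core, AIS estimates a normalising constant (and hence a log-likelihood) by annealing from a tractable proposal to the target through a sequence of intermediate distributions. A toy sketch on a 1-D density, not the actual decoder-based evaluation, assuming a standard-normal target whose true normaliser is √(2π) ≈ 2.5066:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target: exp(-x^2/2), whose true normalizer is sqrt(2*pi)
def log_target(x):
    return -0.5 * x ** 2

# Normalized proposal: N(0, 2^2), easy to sample from
PROP_STD = 2.0
def log_proposal(x):
    return -0.5 * (x / PROP_STD) ** 2 - np.log(PROP_STD * np.sqrt(2 * np.pi))

betas = np.linspace(0.0, 1.0, 50)  # annealing schedule: proposal -> target
n_chains = 2000

x = rng.normal(0.0, PROP_STD, n_chains)
log_w = np.zeros(n_chains)

for b_prev, b in zip(betas[:-1], betas[1:]):
    # accumulate the importance weight for moving between temperatures
    log_w += (b - b_prev) * (log_target(x) - log_proposal(x))

    # one Metropolis step leaving the intermediate distribution invariant
    def log_inter(z, beta=b):
        return beta * log_target(z) + (1 - beta) * log_proposal(z)

    step = x + rng.normal(0.0, 0.5, n_chains)
    accept = np.log(rng.uniform(size=n_chains)) < log_inter(step) - log_inter(x)
    x = np.where(accept, step, x)

Z_est = np.exp(log_w).mean()  # should approach sqrt(2*pi)
```

The averaged importance weights converge to the target's normaliser as the annealing schedule gets finer, which is the property AIS exploits to evaluate likelihoods that are otherwise intractable.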
In this method, the Geometry Score, the quality and diversity of generated images are assessed by examining the topology of the underlying manifold of generated samples, which may differ from the topology of the original data manifold. This provides insight into the properties of GANs and can be used for hyperparameter tuning.
Contrary to methods like Inception Score and FID, this topological approach does not use auxiliary networks and is not limited to visual data.
Given two datasets X1 and X2, the Geometry Score is defined as:

$$\text{GeomScore}(X_1, X_2) = \sum_{i=0}^{i_{\max}-1} \big(\text{MRLT}(i, 1, X_1) - \text{MRLT}(i, 1, X_2)\big)^2$$

where MRLT denotes the Mean Relative Living Times of topological features (one-dimensional holes) in each dataset's manifold.
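Once the MRLT distributions have been extracted (via persistent homology over the data manifolds, which dedicated tooling handles), the score itself is just a sum of squared differences. A minimal sketch with made-up MRLT values for illustration:

```python
import numpy as np

# Hypothetical MRLT distributions over the number of one-dimensional
# holes: index k holds the relative time the manifold exhibited k holes
mrlt_real = np.array([0.05, 0.80, 0.10, 0.05])  # real data: mostly one loop
mrlt_gen = np.array([0.30, 0.55, 0.10, 0.05])   # generated: loop less stable

def geometry_score(mrlt1, mrlt2):
    # sum of squared differences between the two MRLT distributions
    return float(np.sum((mrlt1 - mrlt2) ** 2))

score = geometry_score(mrlt_real, mrlt_gen)  # 0 would mean identical topology
```

A score of zero means the generated manifold's topology matches the real one; larger values flag topological discrepancies such as mode collapse.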
The tournament-based method was introduced by researchers at Google Brain. In this approach, a tournament is conducted in which a single model is rated by playing against past and future versions of itself, which helps in monitoring the training process of GANs. The measurements are summarised in two ratings: win rate and skill rating.
The tournament win rate denotes the average rate at which a generator network fools the discriminator network.
A skill rating system, as its name suggests, assigns a skill rating to each generator.
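A minimal sketch of the tournament win rate, assuming we already have the fraction of samples from each generator snapshot that each discriminator snapshot labelled as real (the numbers below are illustrative, not from the paper):

```python
import numpy as np

# fooled[g, d]: fraction of samples from generator snapshot g that
# discriminator snapshot d classified as real (hypothetical values)
fooled = np.array([
    [0.90, 0.40, 0.20],  # generator from early in training
    [0.95, 0.60, 0.30],
    [0.97, 0.80, 0.50],  # generator from late in training
])

# tournament win rate: average rate at which each generator fools
# every discriminator snapshot it plays against
win_rate = fooled.mean(axis=1)
```

Plotting the win rate over snapshots shows whether the generator keeps improving against progressively stronger discriminators, which is the monitoring signal the method is after.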
To rectify errors in the GAN generator's distribution, a rejection-sampling-based method was introduced. The idea behind this method is to improve the quality of a trained generator by post-processing its samples using information from the trained discriminator.
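A toy sketch of the idea, assuming the trained discriminator exposes a logit per generated sample that approximates the real-vs-generated density ratio (this simplifies the published scheme, which also handles the maximum ratio and a tunable acceptance shift more carefully):

```python
import numpy as np

rng = np.random.default_rng(1)

def rejection_sample(samples, disc_logits, temperature=1.0):
    """Keep generated samples with probability proportional to the density
    ratio implied by the discriminator's logits (a simplified sketch)."""
    ratio = np.exp(disc_logits / temperature)
    accept_prob = ratio / ratio.max()  # scale so the best sample has prob 1
    keep = rng.uniform(size=len(samples)) < accept_prob
    return samples[keep]
```

Samples the discriminator confidently flags as fake get low acceptance probability, so the filtered output distribution shifts toward regions the discriminator considers realistic.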
Though metrics like Fréchet Inception Distance (FID) are popular for evaluating GANs, their one-dimensional scores cannot distinguish between different failure cases. This is where the traditional notions of precision and recall can prove useful.
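A simplified sketch of distributional precision and recall, assuming a fixed radius in feature space rather than the k-nearest-neighbour estimate used in the literature: precision asks how many generated samples land near the real data (fidelity), recall asks how much of the real data is covered by generated samples (diversity).

```python
import numpy as np

def precision_recall(real, gen, radius=1.0):
    """Toy set-based precision/recall between two point clouds of
    feature vectors; `radius` is a hypothetical closeness threshold."""
    # pairwise distances: rows index real samples, columns generated ones
    d = np.linalg.norm(real[:, None, :] - gen[None, :, :], axis=-1)
    precision = (d.min(axis=0) <= radius).mean()  # generated near some real
    recall = (d.min(axis=1) <= radius).mean()     # real covered by generated
    return precision, recall
```

A mode-collapsed generator can score perfect precision while recall stays low, exactly the failure case a single FID number hides.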