When it comes to few-shot learning, deep neural networks fail to live up to expectations: in a few-shot task, the classifier has to generalise from only a handful of examples of each class.
Meta-learning came into the spotlight when its techniques were applied to the optimisation of hyperparameters, neural network architectures, and reinforcement learning.
Usually, gradient-based optimisation in high-capacity classifiers requires many iterative steps over many examples to perform well.
The meta-learner is modelled as an LSTM because of the similarity between the gradient-based parameter update in backpropagation and the cell-state update in an LSTM.
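This correspondence can be made explicit. If the forget gate is fixed to 1, the input gate plays the role of the learning rate, and the candidate state is the negative gradient, the LSTM cell-state update reduces to a gradient-descent step on the learner's parameters:

```latex
% LSTM cell-state update
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
% With f_t = 1,\; i_t = \alpha_t,\; c_t = \theta_t,\; \tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t,
% this becomes the familiar gradient-descent update:
\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t
```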
Solving Industry-level Problems With Meta-learning
Meta-learning is used to teach the machine how to learn, so that the model can pick up new skills and quickly adapt to changing environments from a limited number of training examples. The main objective of this approach is to find model-agnostic solutions.
The following modifications can be made for fast learning:
- Sample a subset of labels, L
- Sample a support set, SL ⊂ D, and a training batch, BL ⊂ D, both containing data points whose labels belong to the sampled label set L
- The support set is part of the model's input
- The final optimisation uses the mini-batch BL to compute the losses and updates the model parameters through backpropagation, just as in supervised learning
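As a rough sketch, the episodic sampling described above might look like the following (the dataset layout, function name, and default sizes are all made up for illustration):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=5):
    """Sample one episode: a label subset L, a support set S_L and a
    training batch B_L (names and signature are hypothetical)."""
    # Sample a subset of labels, L
    label_subset = random.sample(list(dataset.keys()), n_way)
    support, batch = [], []
    for label in label_subset:
        # Draw disjoint support and batch examples for this label
        examples = random.sample(dataset[label], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        batch += [(x, label) for x in examples[k_shot:]]
    return support, batch

# Toy dataset: 10 classes with 20 (fake) examples each
data = {c: list(range(20)) for c in range(10)}
S, B = sample_episode(data)          # |S| = 5 * 1, |B| = 5 * 5
```

The support set conditions the model's predictions, while the batch drives the parameter update.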
So, here (SL, BL) is treated as a single data point, and the expectations over the sampled label set, support set and batch are what meta-learning adds on top of the supervised learning objective.
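Written out, the standard supervised objective gains two outer expectations, over the sampled label set L and over the sampled support set and batch, plus conditioning on the support set (a common way of stating the episodic objective):

```latex
\theta^* = \arg\max_\theta \,
  \mathbb{E}_{L \subset \mathcal{L}} \Big[
    \mathbb{E}_{S^L \subset \mathcal{D},\; B^L \subset \mathcal{D}} \Big[
      \sum_{(x,\,y) \in B^L} \log P_\theta\big(y \mid x,\, S^L\big)
    \Big]
  \Big]
```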
A meta-learner is an AI model that sits at a higher level: it draws inferences from the predictions of lower-level AI models, such as those built for image classification or reinforcement learning tasks.
Whenever the meta-learner gets a prediction right it rewards itself, and when it is wrong it is penalised against the actual value.
This feedback helps optimise the lower-level AI model's architecture, hyperparameters, and dataset tuning.
The above figure illustrates the training of a simple model with the task-agnostic algorithm Model-Agnostic Meta-Learning (MAML). The model's parameters are trained in such a way that a small number of gradient updates leads to fast learning on a new task.
Model-Agnostic Meta-Learning optimises for a set of parameters θ such that, when a gradient step is taken for a specific task i, the updated parameters are close to the optimal parameters θ*(i) for that task.
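A minimal numeric sketch of this idea, assuming toy one-parameter tasks with loss L_i(theta) = (theta - t_i)^2 and made-up targets and learning rates (not the full MAML algorithm):

```python
import numpy as np

# Toy MAML sketch: each task i has loss L_i(theta) = (theta - t_i)^2,
# so its optimal parameter theta*(i) is t_i. Targets are invented.
targets = np.array([-1.0, 0.5, 2.0])
alpha, beta = 0.1, 0.05              # inner / outer learning rates
theta = 0.0

for _ in range(200):
    meta_grad = 0.0
    for t in targets:
        # Inner step: one gradient update on task i -> adapted params theta_i'
        theta_i = theta - alpha * 2 * (theta - t)
        # Outer gradient: differentiate L_i(theta_i') w.r.t. the ORIGINAL
        # theta, through the inner update (d theta_i' / d theta = 1 - 2*alpha)
        meta_grad += 2 * (theta_i - t) * (1 - 2 * alpha)
    theta -= beta * meta_grad / len(targets)

# theta settles at a point from which one inner step lands near each
# task's optimum; for these squared losses that is the mean of the targets
print(round(theta, 3))               # → 0.5
```

The outer update differentiates through the inner gradient step, which is what distinguishes MAML from simply training on the pooled tasks.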
The training process mimics what happens at test time, since this has proven beneficial in Matching Networks. During each training epoch, a dataset is first sampled, and then mini-batches are sampled from it for updates. The final state of the learner's parameters is then used to train the meta-learner on the test data.
Key Ideas:
- Since the meta-learner models the parameters of another neural network, it would have an enormous number of variables to learn; sharing its parameters across coordinates keeps this tractable.
- The meta-learner assumes that the loss and the gradient are independent for simplicity.
- The meta-loss is the sum of all losses computed during the training of the various lower-level models. It can be minimised using an optimiser such as Stochastic Gradient Descent (SGD) or Adam (Adaptive Moment Estimation).
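In MAML-style training, for instance, the meta-loss sums each task's loss evaluated at that task's adapted parameters, and the shared parameters θ are then updated by gradient descent on this sum:

```latex
\mathcal{L}_{\text{meta}}(\theta) = \sum_{i} \mathcal{L}_{T_i}\big(\theta'_i\big),
\qquad
\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(\theta),
\qquad
\theta \leftarrow \theta - \beta \nabla_\theta \mathcal{L}_{\text{meta}}(\theta)
```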