Seeing is believing, but to actually justify those beliefs, one needs to interact with one’s surroundings. Humans rely heavily on abstract reasoning and creativity to solve tasks, even when the available information is relatively sparse. This human capability of conceptualising abstract real-world experiences and converting them into logic is the foundation of understanding and reasoning.
Concept Of A Concept
Energy models learn concepts in a 2D environment and can then perform similar tasks in a 3D environment, such as navigating a robotic arm. Concepts are abstractions that we derive from everyday interactions with the world. They form the reusable source of knowledge that humans need to make sense of the world.
The concepts in this context can be visual, positional and temporal, among others. These concepts form the building blocks for the model’s understanding and reasoning when performing a task.
Conceptual representations arise from the integration of perception and action. Definitions like these were elaborated in the paper on sensorimotor contingencies released by Vicarious AI earlier this year.
Similarly, DeepMind’s SCAN model tries to emulate a newborn baby’s visual system. SCAN employs a hierarchical approach: it first learns visual primitives and then learns the meaning of a new concept by building an abstraction over the learnings of the previous stage. The concept of a square, for instance, can be specified by equal edge lengths and other spatial positioning parameters, whereas the colour of the square can be treated as irrelevant by the model. In this way, the model learns a concept by picking up only the entities of relevance.
Energy Model As An Energy Function
Energy-based Models (EBMs) were first introduced in 2006 by Yann LeCun and his team. Their work suggests associating each configuration of variables with a scalar energy value for further evaluation. In this paper, released by OpenAI, researcher Igor Mordatch shows that energy models can both generate and identify concepts. The picture below shows how the model generates and identifies the properties of a square.
For instance, a robotic arm that runs on these energy models can learn to use the actions available to it, say, changing the torque or moving to a new spatial position.
The idea behind energy models is borrowed from physics, where observed events correspond to low-energy configurations.
Mathematically, the energy function is defined over three variables: an observed state (x), an attention mask (a) and a weight vector (w) that determines which concept the energy corresponds to.
The observed state consists of entities of the world, such as dots, each of which has both positional and visual (colour) properties.
Attention masks capture the model’s prioritisation of certain entities. A concept is satisfied when the energy is low, and unsatisfied when it is high. A concept, be it visual, positional or temporal (delayed or advanced), is satisfied when the attention mask focuses on the set of entities (properties) that represents that concept.
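As a rough sketch of these ingredients (the entity layout and field names here are assumptions for illustration, not taken from the paper), the observed state and an attention mask over its entities might look like this:

```python
import numpy as np

# Hypothetical state: four dot entities, each row holding a 2D position
# and an RGB colour: [x, y, r, g, b]. The layout is an assumption.
state = np.array([
    [0.1, 0.2, 1.0, 0.0, 0.0],   # red dot
    [0.8, 0.2, 1.0, 0.0, 0.0],   # red dot
    [0.8, 0.9, 0.0, 0.0, 1.0],   # blue dot
    [0.1, 0.9, 0.0, 1.0, 0.0],   # green dot
])

# Attention mask: one weight per entity, summing to 1.
# This mask focuses on the two red dots, e.g. for a "red pair" concept.
attention = np.array([0.5, 0.5, 0.0, 0.0])

# The attended position is the mask-weighted combination of entity positions.
attended_position = attention @ state[:, :2]
print(attended_position)  # mean position of the two red dots: [0.45, 0.2]
```

The mask-weighted pooling means only the attended entities influence whatever quantity the concept is defined over.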
This energy model is built upon an energy function, which is in turn a neural network that takes these entities as inputs. Optimising the parameters of this energy function enables the model to formulate other functions.
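A minimal sketch of such an energy function, assuming a small MLP over mask-pooled entity features concatenated with the concept vector (the architecture and dimensions are illustrative assumptions, not the network from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x, a, w, params):
    """Toy energy function E(x, a, w): a small MLP applied to the
    attention-weighted entity features concatenated with the concept
    vector w. Returns a single scalar energy."""
    features = np.concatenate([a @ x, w])            # pool entities by attention, append concept code
    h = np.tanh(params["W1"] @ features + params["b1"])
    return float(params["W2"] @ h + params["b2"])    # scalar energy

# Hypothetical dimensions: 4 entities with 5 features each, 3-dim concept vector.
params = {
    "W1": rng.normal(size=(16, 5 + 3)),
    "b1": np.zeros(16),
    "W2": rng.normal(size=16),
    "b2": 0.0,
}
x = rng.normal(size=(4, 5))           # observed state
a = np.array([0.5, 0.5, 0.0, 0.0])    # attention mask over entities
w = rng.normal(size=3)                # concept vector
print(energy(x, a, w, params))        # one scalar per (state, mask, concept) triple
```

The key point is the signature: for any combination of state, attention mask and concept vector, the network returns one number, and low numbers mark combinations where the concept holds.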
This training makes the model good at both generation and identification tasks, which allows the two to reinforce each other internally and enhance performance, in a way mimicking the mirror neurons in animals. Mirror neurons appear to form a cortical system that matches the observation and execution of goal-related motor actions: these neurons respond both when an animal performs a particular action and when it observes another individual performing the same action.
Training The Model
The idea here is to optimise the energy function so that the predicted attention masks are assigned low energy values, since low energy means that a concept has been identified or satisfied.
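A minimal sketch of this idea, with a fixed quadratic energy standing in for the learned network (in the real model the energy comes from a trained neural network, not a known target): gradient descent drives a candidate attention mask towards the low-energy mask that satisfies the concept.

```python
import numpy as np

# Stand-in energy: low when the mask matches a target mask that
# "satisfies" the concept. This target is an assumption for the demo.
target = np.array([1.0, 1.0, 0.0, 0.0])

def energy(mask):
    return float(np.sum((mask - target) ** 2))

# Start from an uninformed mask and descend the energy gradient.
mask = np.full(4, 0.5)
lr = 0.1
for _ in range(100):
    grad = 2 * (mask - target)   # analytic gradient of the quadratic energy
    mask -= lr * grad

print(np.round(mask, 3))        # converges to the target mask
print(energy(mask) < 1e-6)      # energy driven close to zero: True
```

Training tunes the energy function itself so that, for each concept, exactly the demonstrated attention masks end up in these low-energy basins.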
These values are learned with the same incentivisation techniques that generative frameworks like variational autoencoders (VAEs) employ. VAEs are built on top of neural networks, are trained with stochastic gradient descent, and perform unsupervised learning of complicated data distributions. For instance, when an image is fed into a generative model, the model captures the dependencies between pixels (similar colour intensities, for example) and separates out images of random noise by assigning them low probability scores.
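For reference, the standard VAE objective combines a reconstruction term with a KL penalty on the latent code. A minimal numerical sketch with a Gaussian latent (the toy inputs below are made-up placeholders, not outputs of a real encoder or decoder):

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: squared-error reconstruction plus the closed-form
    KL divergence between N(mu, sigma^2) and the standard normal prior."""
    recon = np.sum((x - x_recon) ** 2)
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

x = np.array([0.2, 0.7, 0.1])             # original data point
x_recon = np.array([0.25, 0.65, 0.12])    # hypothetical decoder output
mu = np.array([0.1, -0.2])                # hypothetical encoder mean
log_var = np.array([-0.1, 0.05])          # hypothetical encoder log-variance
print(vae_loss(x, x_recon, mu, log_var))  # small positive loss
```

A perfect reconstruction with a latent matching the prior (mu = 0, log_var = 0) gives zero loss; noisy or implausible inputs score worse, which is the probability-based separation described above.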
In our case, energy models segregate concepts based on their energy values corresponding to the attention masks.
The results indicate that transfer learning helps the model perform tasks like concept generation even without being trained to do so.
Cross-transfer of learning during training improved the model’s performance compared to models that followed task-specific training.
Here we see an illustration of how a robotic arm running on an energy-based model quickly (within five demonstrations) learns to observe the 2D state of the world, identify the concepts and navigate between points by manipulating the applied torque and calculating spatial positions.