Intelligent systems need to incorporate a substantial understanding of the world in terms of intuitive physics. This understanding can be learned from data in various ways. Now, a group of researchers from INRIA and the Facebook AI team has developed an evaluation framework that diagnoses how much a given system understands about physics. It does so by comparing the plausibility the system assigns to physically possible events with the plausibility it assigns to physically impossible ones. The framework is designed to avoid dataset bias and can test a range of specific physical reasoning skills.
The team describes the first release of a benchmark dataset aimed at learning intuitive physics in an unsupervised way. The team also reports that two deep neural network systems were trained with a future frame prediction objective and tested on the possible versus impossible discrimination task. Comparing the results with human data gave new insights into the potential and limitations of next frame prediction architectures.
Physical Understanding Of A Growing Child
Intelligent systems, however advanced they may be, still lack the human ability to understand complex scenes. It is a difficult skill to teach, since understanding a scene involves recognising the spatial and temporal relationships between the objects in it. According to years of research, this is how a human child’s understanding of physics evolves:
- At the age of two to four months, infants are able to parse visual inputs in terms of permanent, solid and spatiotemporally continuous objects
- At the age of six months, they understand the notion of stability, support and causality
- Between eight and 10 months, they grasp the notions of gravity, inertia, and conservation of momentum in collision
- Between the age of 10 and 12 months, they learn shape constancy
It is also clearly understood by the scientific community that the intuitive understanding of physics is latent, that is, it can only be observed and measured indirectly. This is in fact a grave challenge for both evaluation and engineering purposes, making this research particularly interesting.
Design Of The Evaluation And Engineering Challenge
According to the research, the evaluation challenge was designed as follows:
“Given an artificial vision system, define a measure which quantifies how much this system understands about (intuitive) physics.”
Intuitive physics can be measured through real-world applications like Visual Question Answering (VQA), object tracking or action planning. But these tasks are at risk from dataset bias and noisy measurements, which would make it very difficult to measure performance reliably.
That is why, as an alternative, the researchers propose the “physical plausibility test”, which is intended to evaluate intuitive physics in a task-free and model-free fashion. The researchers pose physical reasoning as a simple Yes/No classification problem: the depicted event is judged to be physically possible or not. This test is meant to be diagnostic in nature. Any system based on probability or reconstruction error can easily derive such a score.
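As a rough illustration of how a reconstruction-error system could produce such a score, the sketch below turns the pixel-wise error between predicted and observed frames into a plausibility value and a Yes/No answer. The function names, the use of mean squared error and the threshold are all assumptions made for this example; they are not taken from the researchers' code.

```python
import numpy as np

def plausibility(predicted_frames, observed_frames):
    """Hypothetical score: negative mean squared prediction error.

    Both inputs are arrays shaped (frames, height, width[, channels]).
    A higher (less negative) value means the video was easier to predict.
    """
    predicted = np.asarray(predicted_frames, dtype=float)
    observed = np.asarray(observed_frames, dtype=float)
    # Mean squared error per frame, then averaged over the whole video.
    per_frame = ((predicted - observed) ** 2).reshape(len(observed), -1).mean(axis=1)
    return -float(per_frame.mean())

def is_possible(predicted_frames, observed_frames, threshold):
    """Yes/No answer: the event counts as possible if the score reaches the threshold."""
    return plausibility(predicted_frames, observed_frames) >= threshold

# Toy data: four 2x2 frames in which nothing changes.
observed = np.zeros((4, 2, 2))
good_prediction = np.zeros((4, 2, 2))  # matches what happened
bad_prediction = np.ones((4, 2, 2))    # expected something that never occurred

print(is_possible(good_prediction, observed, threshold=-0.5))
print(is_possible(bad_prediction, observed, threshold=-0.5))
```

A probability-based system could use its log-likelihood for the video in place of the negative error; the rest of the test would stay the same.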
The engineering challenge can be stated as: construct a system which incorporates as much intuitive physics as possible. The researchers make two observations here. First, an intelligent system could have a good physical understanding of a scene without performing a full three-dimensional reconstruction. Second, as shown by infants, it is possible to learn intuitive physics without being fed any high-level tag or label. Hence one of the ways to solve this challenge would be to build an unsupervised or weakly supervised system that learns the laws of physics using the same type of data available to infants.
A Diagnostic Test For Intuitive Physics
The main ideas of the diagnostic framework are inspired by work in developmental and comparative psychology. The goal is to build a well-controlled test that avoids potential statistical biases, so as to obtain relatively pure tests measuring different types of physical reasoning abilities. Intuitive physics is a knowledge base which can be incomplete, not totally coherent, and not used in all situations due to variations in attention or memory, among other factors. The researchers illustrate the diagnostic test on object permanence, one of the most basic principles of intuitive physics.
The researchers followed several design principles when building the framework:
- Minimal Sets Design: This is an important design principle of the evaluation framework which relates to the organisation of the possible and impossible movies in extremely well matched sets.
- Parametric Manipulation Of Task Complexity: The second design principle is that, within each block, the stimulus complexity is varied in a parametric fashion.
- The Physical Possibility Metrics: The metrics rest on the assumption that, within each matched set, the possible (positive) movies should be rated as more plausible than the impossible (negative) ones.
- A Hierarchy Of Intuitive Physics Problems: Taking advantage of behavioral work on intuitive studies, the researchers organise the tests into levels and blocks, each one corresponding to a core principle of intuitive physics, and each raising its particular machine vision challenge.
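To make the minimal sets and metrics principles more concrete, here is a minimal sketch of how plausibility scores could be turned into error rates. It assumes each matched set is given as a pair of score lists (possible movies, impossible movies). The function names and the exact formulas (a within-set comparison of summed scores, and one minus the AUC over all movies pooled together) are illustrative assumptions, not the researchers' published implementation.

```python
def relative_error_rate(matched_sets):
    """Fraction of matched sets in which the possible movies are NOT
    scored as more plausible (in total) than the impossible ones."""
    errors = 0
    for possible_scores, impossible_scores in matched_sets:
        if sum(possible_scores) <= sum(impossible_scores):
            errors += 1
    return errors / len(matched_sets)

def absolute_error_rate(matched_sets):
    """One minus the AUC computed over all movies pooled together.

    The AUC is the probability that a randomly chosen possible movie
    outscores a randomly chosen impossible one; ties count as half.
    """
    possible = [s for scores, _ in matched_sets for s in scores]
    impossible = [s for _, scores in matched_sets for s in scores]
    wins = sum((p > n) + 0.5 * (p == n) for p in possible for n in impossible)
    return 1.0 - wins / (len(possible) * len(impossible))

# Two toy sets, each with two possible and two impossible movies.
toy_sets = [
    ([0.9, 0.8], [0.2, 0.1]),  # system ranks this set correctly
    ([0.4, 0.5], [0.6, 0.7]),  # system is fooled on this set
]
print(relative_error_rate(toy_sets))  # 0.5
print(absolute_error_rate(toy_sets))  # 0.25
```

The within-set comparison only asks whether the system prefers the possible movies among near-identical ones, while the pooled version requires scores to be comparable across very different scenes, which is a harder condition to meet.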
Datasets And Results
The training set was constructed using Unreal Engine 4.0. It is composed of 15K videos of possible events (around 7 seconds each at 15 fps), which comes out to a total of 21 hours of video. Each video is delivered as a stack of raw images (288 x 288 pixels), which comes out to a total of 157 GB of uncompressed data. The researchers also provide development and test sets for block O1 (object permanence). As for parametric complexity, the researchers vary the number of objects (1, 2 or 3), the presence or absence of occluder(s) and the complexity of the movement.
The researchers presented the 3,600 videos from the test set (Block O1) to human participants using Amazon Mechanical Turk. Participants were presented with eight examples of possible scenes from the training set, some simple, some more complex.