Research on Reinforcement Learning (RL) has accelerated in recent years. It is no longer restricted to the classic problem of robots being rewarded or punished for their actions and correcting them accordingly.
Although the robot problem formed the basis for many self-learning applications, RL has moved well beyond it. It is used in virtual environments and in games, where it powers formidable virtual opponents for players. RL can now even emulate human movements and their natural behaviour.
This article discusses one particular research study, DeepMimic, by academics at the University of California, Berkeley, who have managed to simulate acrobatic movements precisely using RL methods.
Computer Graphics For Accurate Movement Visualisation
Xue Bin Peng, author of the DeepMimic paper, says that the inspiration for simulating acrobatic movements through RL came from computer graphics, which offers precise visualisation of real-world physics and the ability to model it. Animation of this kind can aid studies that analyse human body movements for simulation.
The challenge, however, lies in building physics-based models for simulation across applications. Many studies have created such models, but they run into setbacks in optimisation or in the dynamics of motion. Recent developments in motion simulation focus on bringing online models for easier representation. However, they fall short when it comes to richer dynamics and incorporating more motions into the model.
This is what drove Peng and the team to build a single simulation model capable of incorporating a large number of motions, including acrobatics. The model also encapsulates RL policies efficiently, which means that RL and physics-based animation go hand in hand, a significant improvement.
DeepMimic – A Powerful Model For Motion Imitation
As mentioned earlier, building physics-based models that incorporate many movements and actions is quite challenging. Even when they are built with all of these considerations in mind, they may fail to achieve significant results for ML. With DeepMimic, this is not the case. Along with capturing unnatural acrobatic movements, this RL model takes a data-driven approach to these movements, says Peng.
“An alternative is to take a data-driven approach, where reference motion capture of humans provides examples of natural motions. The character can then be trained to produce more natural behaviours by imitating the reference motions. Imitating motion data in simulation has a long history in computer animation and has seen some recent demonstrations with deep RL. While the results do appear more natural, they are still far from being able to faithfully reproduce a wide variety of motions.”
In the context of standard RL, a policy is trained for each acrobatic movement through a motion imitation task. The movements are represented as ‘target poses’, which define the action needed at each individual timestep. This makes it possible to reproduce complex acrobatic movements smoothly in the model.
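To make the idea of a motion imitation task concrete, here is a minimal sketch of a pose-matching reward. It is a simplification written for this article, not DeepMimic's actual reward: it assumes each pose is a flat list of joint angles in radians, and the function name `pose_reward` and the `scale` value are hypothetical.

```python
import math


def pose_reward(sim_pose, ref_pose, scale=2.0):
    """Return a reward in (0, 1]; it is 1.0 only when the poses match exactly.

    Assumption: poses are flat lists of joint angles in radians, and
    `scale` is an illustrative sharpness factor, not a published value.
    """
    if len(sim_pose) != len(ref_pose):
        raise ValueError("poses must have the same number of joints")
    # Sum of squared differences between simulated and reference angles.
    error = sum((s - r) ** 2 for s, r in zip(sim_pose, ref_pose))
    # Exponentiating the negative error maps "no error" to 1 and
    # large errors towards 0.
    return math.exp(-scale * error)


reference = [0.0, 0.5, -0.3]
perfect = pose_reward(reference, reference)
slightly_off = pose_reward([0.1, 0.4, -0.3], reference)
```

An RL agent that maximises a reward of this shape at every timestep is pushed to track the reference motion closely, which is the intuition behind training by imitation.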
Characters And Tasks
DeepMimic has four character visualisations, including:
- 3D humanoid
- Atlas robot model
These are modelled as rigid bodies connected by kinematic links with three degrees of freedom (DOF) each, except for the knees and elbows, which have one DOF. Physical characteristics such as mass and height are also specified. Together, these arrangements form the structure on which the acrobatic movements are performed. RL policies are trained for these character objects.
Apart from the characters, several tasks are defined and assigned to these rigid-body objects. The tasks classified by DeepMimic’s researchers include:
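The joint layout described above can be sketched as a small data structure. The joint names below are hypothetical and chosen only for illustration; the one rule taken from the description is that knees and elbows have one DOF while every other joint has three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Joint:
    name: str
    dof: int  # number of rotational degrees of freedom


def make_joint(name):
    # Rule from the text: knees and elbows are 1-DOF hinges,
    # every other joint has 3 DOF.
    is_hinge = "knee" in name or "elbow" in name
    return Joint(name, 1 if is_hinge else 3)


# Hypothetical joint list; the real characters may use a different set.
JOINT_NAMES = [
    "neck", "chest",
    "left_shoulder", "left_elbow", "right_shoulder", "right_elbow",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
]

skeleton = [make_joint(name) for name in JOINT_NAMES]
total_dof = sum(joint.dof for joint in skeleton)
hinges = [joint.name for joint in skeleton if joint.dof == 1]
```

The total DOF count sets how many values a pose contains, so it directly affects the size of the state and action that an RL policy has to handle.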
- Target Heading
- Terrain traversal
Based on these task categories, a total of 30 skills are designed for the simulation and trained with RL. Some of the skills are also combined to perform multi-skill actions. For example, running, jumping and flipping motions are combined into a single composite movement.
Training For Simulation
After the parameters were finalised, namely the policy states, actions, rewards and the neural network that maps between them, the model was subjected to training. With the policy and value functions set up, training proceeds sequentially over each state of the reference movements, in a batch-wise fashion.
To imitate the desired acrobatic motions, the RL policies should capture every phase of the motion incrementally over time. This is done through the initial state distribution, which helps the RL agent capture the beginning of a motion precisely. Similarly, for cyclic motions such as backflips and frontflips, another strategy called ‘early termination’ is used. Details of the RL formulation and the training strategies can be found in the DeepMimic paper.
The results after training show that motions are emulated accurately through RL. The integration of multiple movements also fares very well in visualisation. The physics aspect of the DeepMimic model is where it makes its mark, opening up the possibility of emulating a wide variety of movements. As more eccentric movements are captured, RL can vastly improve self-learning applications that rely on motion.
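The two strategies can be illustrated with a toy episode loop. This is a sketch under stated assumptions rather than the paper's implementation: it reads the initial state distribution as sampling the starting frame from the reference motion, and early termination as stopping the episode once the character is judged to have fallen. The function names, the `min_root_height` threshold and the toy physics step are all hypothetical.

```python
import random


def run_episode(step_fn, motion_length, max_steps, rng, min_root_height=0.5):
    """Roll out one episode.

    Returns (start_frame, steps_taken, terminated_early).
    """
    # Initial state distribution (assumed reading): start from a random
    # frame of the reference motion instead of always from frame 0.
    start_frame = rng.randrange(motion_length)
    frame = start_frame
    for step in range(max_steps):
        root_height = step_fn(frame)
        # Early termination (assumed reading): end the episode as soon
        # as the character's root drops below the threshold.
        if root_height < min_root_height:
            return start_frame, step + 1, True
        frame = (frame + 1) % motion_length  # wrap around for cyclic motions
    return start_frame, max_steps, False


def toy_step(frame):
    # Stand-in for a physics step: the character "falls" on frame 7.
    return 0.2 if frame == 7 else 1.0


rng = random.Random(0)
start, steps, fell = run_episode(toy_step, motion_length=10, max_steps=20, rng=rng)
```

Starting from varied frames exposes the agent to every phase of the motion, while ending failed episodes early avoids spending training time on states from which the motion cannot be recovered.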