Many futurists and experts consider embracing open-endedness to be the last grand frontier for artificial intelligence. The concept can be explained in terms of natural evolution. Just as a species that has survived for millennia adapts to new environments, machines could learn to tackle problems they were never trained on. Moreover, building open-endedness into the algorithms themselves would enable AI to generate new problems as well.
Just as species arrive at suitable strategies over millions of years of evolution, open-ended computational systems could grow steadily more powerful. Such processes produce not a single positive result but many unplanned advances that unfold in parallel, without any final destination.
In this paper, researchers at Uber AI Labs present POET (Paired Open-Ended Trailblazer), an open-ended algorithm, and demonstrate it on a problem space of two-dimensional landscapes paired with a solution space of robotic behaviours that aim to traverse them.
POET is designed to sustain an open-ended process of discovery within a single run. It maintains a population of environments, each paired with an agent that is optimized to solve it.
POET has three main tasks that it performs at each iteration of its main loop:
- generating new environments E(·) from those currently active,
- optimizing paired agents within their respective environments, and
- attempting to transfer current agents θ from one environment to another.
These steps ensure that the curriculum, which emerges as new environments are added, stays calibrated to the agents' current abilities: a new environment should be neither too easy nor too hard for the existing agents. Exposing agents to a steady stream of such environments, and letting them move between environments, makes them more versatile and robust.
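To make the three steps of the main loop concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: an environment and an agent are each reduced to a single number, `evaluate`, `optimize` and `mutate` are hypothetical stand-ins (a crude random search replaces the real optimizer), and the mutation schedule and minimal-criterion bounds are arbitrary illustrative values.

```python
import random

# Hypothetical toy domain (not from the paper): an environment is a single
# difficulty value and an agent is a single policy parameter. The score peaks
# at 100 when the agent is tuned exactly to its environment.
def evaluate(theta: float, env: float) -> float:
    return 100.0 - 100.0 * abs(theta - env)


def optimize(theta: float, env: float, rng: random.Random,
             sigma: float = 0.05, samples: int = 8) -> float:
    """One random-search step, a stand-in for POET's per-pair optimizer."""
    candidates = [theta + rng.gauss(0.0, sigma) for _ in range(samples)]
    return max(candidates + [theta], key=lambda t: evaluate(t, env))


def mutate(env: float, rng: random.Random) -> float:
    """Child environment: the parent made somewhat harder."""
    return env + rng.uniform(0.0, 0.5)


def poet(iterations: int = 60, seed: int = 0, max_pairs: int = 5,
         mc_low: float = 0.0, mc_high: float = 90.0) -> list[tuple[float, float]]:
    rng = random.Random(seed)
    pairs = [(0.0, 0.0)]  # active (environment, agent) pairs
    for it in range(iterations):
        # 1. Every 5 iterations, try to create a child of a random active environment.
        if it % 5 == 0:
            parent_env, parent_theta = rng.choice(pairs)
            child_env = mutate(parent_env, rng)
            # Minimal criterion: admit the child only if it is neither too easy
            # (score above mc_high) nor too hard (score below mc_low).
            if mc_low <= evaluate(parent_theta, child_env) <= mc_high:
                pairs.append((child_env, parent_theta))
                pairs = pairs[-max_pairs:]  # retire the oldest pair when full
        # 2. Optimize each agent in its own paired environment.
        pairs = [(env, optimize(theta, env, rng)) for env, theta in pairs]
        # 3. Transfer: an environment adopts another agent only if it scores higher.
        agents = [theta for _, theta in pairs]
        updated = []
        for env, theta in pairs:
            best = max(agents, key=lambda t: evaluate(t, env))
            updated.append((env, best if evaluate(best, env) > evaluate(theta, env) else theta))
        pairs = updated
    return pairs


if __name__ == "__main__":
    for env, theta in poet():
        print(f"env={env:.2f}  agent={theta:.2f}  score={evaluate(theta, env):.1f}")
```

Each admitted child starts from its parent's agent, so earlier solutions act as stepping stones towards harder environments.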
The agent's hull is supported by two legs. Each leg has two motor joints, one at the hip and one at the knee, giving a four-dimensional action space.
The experimental domain is adapted from OpenAI's "Bipedal Walker Hardcore" environment. The modified version simulates faster, which lets the researchers run longer experiments in more complex environments.
On this highly varied terrain, the agent senses its surroundings through LIDAR rangefinders. Their readings form part of the state, together with the hull angle, hull angular velocity, horizontal and vertical speeds, joint positions and joint angular velocities, and whether each leg touches the ground.
The generated terrain contains one or more types of obstacles, such as stumps, gaps, and stairs, on a surface whose roughness varies. The agent's only aim is to keep moving forward without falling.
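The state and action spaces described above can be sketched as plain arrays. This sketch assumes the layout of OpenAI Gym's `BipedalWalker` observation (14 body readings followed by 10 LIDAR readings, 24 values in total, with Gym's per-joint scaling omitted) and motor commands clipped to [-1, 1]; POET's modified environment may differ, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np

N_LIDAR = 10  # assumption: number of LIDAR rays, as in Gym's BipedalWalker


@dataclass
class LegState:
    hip_angle: float
    hip_speed: float
    knee_angle: float
    knee_speed: float
    ground_contact: bool


@dataclass
class WalkerState:
    hull_angle: float
    hull_angular_velocity: float
    vel_x: float
    vel_y: float
    legs: tuple[LegState, LegState]
    # LIDAR readings as fractions of the maximum range (1.0 = nothing hit).
    lidar: list[float] = field(default_factory=lambda: [1.0] * N_LIDAR)

    def to_vector(self) -> np.ndarray:
        """Flatten into a 24-value observation: 4 hull values, 5 per leg, 10 LIDAR."""
        values = [self.hull_angle, self.hull_angular_velocity, self.vel_x, self.vel_y]
        for leg in self.legs:
            values += [leg.hip_angle, leg.hip_speed, leg.knee_angle, leg.knee_speed,
                       1.0 if leg.ground_contact else 0.0]
        values += list(self.lidar)
        return np.asarray(values, dtype=np.float32)


def clip_action(raw) -> np.ndarray:
    """Four motor commands (hip, knee, hip, knee), each clipped to [-1, 1]."""
    raw = np.asarray(raw, dtype=np.float32)
    if raw.shape != (4,):
        raise ValueError("expected one command per motor joint")
    return np.clip(raw, -1.0, 1.0)
```

A controller then maps the output of `to_vector()` to four numbers that pass through `clip_action` before reaching the motors.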
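One way to picture how such terrain can be encoded and mutated into new environments is a small parameter vector per environment, perturbed at random and clipped to fixed bounds. The field names, ranges, and step sizes below are hypothetical illustrations, not the values used in the paper.

```python
import random
from dataclasses import dataclass, fields, replace

# Hypothetical (min, max) bounds and mutation step for each terrain parameter.
BOUNDS = {
    "roughness": (0.0, 10.0),
    "stump_height": (0.0, 3.0),
    "gap_width": (0.0, 6.0),
    "stair_height": (0.0, 5.0),
    "stair_count": (1, 9),
}
STEPS = {"roughness": 0.6, "stump_height": 0.2, "gap_width": 0.4,
         "stair_height": 0.2, "stair_count": 1}


@dataclass(frozen=True)
class TerrainParams:
    roughness: float = 0.0      # surface bumpiness
    stump_height: float = 0.0   # 0 means no stumps
    gap_width: float = 0.0      # 0 means no gaps
    stair_height: float = 0.0   # 0 means no stairs
    stair_count: int = 1


def mutate_terrain(parent: TerrainParams, rng: random.Random) -> TerrainParams:
    """Shift each parameter by -step, 0, or +step, then clip it to its bounds."""
    changes = {}
    for f in fields(parent):
        lo, hi = BOUNDS[f.name]
        value = getattr(parent, f.name) + rng.choice((-1, 0, 1)) * STEPS[f.name]
        changes[f.name] = min(max(value, lo), hi)
    return replace(parent, **changes)
```

Starting from a flat `TerrainParams()` and mutating repeatedly yields a growing variety of terrains, which a minimal criterion can then filter.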
The reward function rewards the agent for moving forward while keeping its hull straight; the exact expression is given in the paper. If the agent falls, the reward is -100.
The episode terminates immediately when the time limit (2,000 time steps) is reached, when the agent falls, or when it completes the course.
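The per-step reward and the termination rules can be sketched as follows. The shaping terms (130 per unit of forward progress, a 5.0 penalty on the hull angle, and a small torque cost) follow the reward in the original OpenAI Gym `BipedalWalker` source, with positions in simulator units; POET's modified environment may use a different formula, and the function names are hypothetical.

```python
import numpy as np

SCALE = 30.0          # Gym BipedalWalker's world-scale constant
MOTORS_TORQUE = 80.0  # Gym BipedalWalker's maximum motor torque
FALL_REWARD = -100.0
TIME_LIMIT = 2000     # time steps, as stated in the text


def shaping(hull_x: float, hull_angle: float) -> float:
    """Potential that grows with forward progress and shrinks as the hull tilts."""
    return 130.0 * hull_x / SCALE - 5.0 * abs(hull_angle)


def step_reward(prev_shaping: float, hull_x: float, hull_angle: float,
                action: np.ndarray, fell: bool) -> tuple[float, float]:
    """Return (reward, new_shaping) for one time step."""
    new_shaping = shaping(hull_x, hull_angle)
    if fell:
        # A fall overrides everything else with a fixed penalty.
        return FALL_REWARD, new_shaping
    torque_cost = 0.00035 * MOTORS_TORQUE * float(np.clip(np.abs(action), 0.0, 1.0).sum())
    return new_shaping - prev_shaping - torque_cost, new_shaping


def episode_done(t: int, fell: bool, reached_end: bool) -> bool:
    """An episode ends at the time limit, on a fall, or when the course is completed."""
    return t >= TIME_LIMIT or fell or reached_end
```

Because the reward is the change in `shaping` from one step to the next, summing it over an episode yields the total forward progress minus the final hull tilt and the accumulated torque cost.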
All controllers in the experiments are implemented as neural networks with three fully connected layers and tanh activation functions.
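A controller of this kind is short to write down. The sketch below assumes a 24-value input and 4 motor outputs (the Gym `BipedalWalker` sizes), reads "three fully connected layers" as three weight matrices with two hidden layers, and picks a hidden width of 40 purely for illustration; the class and its initialization scheme are hypothetical. The flattened weights and biases play the role of the agent parameters θ.

```python
import numpy as np


class TanhMLP:
    """Three fully connected layers with tanh activations: 24 -> 40 -> 40 -> 4."""

    def __init__(self, sizes=(24, 40, 40, 4), seed: int = 0):
        rng = np.random.default_rng(seed)
        # Scale initial weights by 1/sqrt(fan-in) to keep activations moderate.
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n_out) for n_out in sizes[1:]]

    def num_params(self) -> int:
        """Total number of trainable values, i.e. the length of theta."""
        return sum(w.size + b.size for w, b in zip(self.weights, self.biases))

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        h = np.asarray(obs, dtype=np.float64)
        for w, b in zip(self.weights, self.biases):
            h = np.tanh(h @ w + b)
        return h  # four motor commands, each in [-1, 1]
```

Applying tanh on the output layer as well keeps every motor command inside the valid range without a separate clipping step.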
What Does POET Have To Offer For The Future
In this domain, POET builds a diverse collection of solved landscapes by modifying previously solved landscapes and by applying and refining previously discovered successful robotic behaviours. This enables the robot to traverse rugged, obstacle-filled landscapes that standard optimization techniques cannot solve when those landscapes are tackled in isolation. The success of POET hints at a step towards truly open-ended machine learning algorithms that continually invent and solve new challenges.
Such algorithms could help solve problems once thought unsolvable and uncover problems we never knew existed. While the road remains long, the potential rewards of capturing the character of open-ended processes in machine learning are high. Moreover, a diverse set of such results can be generated in a single run, with both the problems and the solutions growing in complexity over time.