An Antarctic eelpout swims gracefully in cold, dark depths without its body fluids freezing. It does this with the help of antifreeze proteins (AFPs), which are the product of millions of mutations over millions of years of evolution. The 3D structure of AFPs allows them to bind to ice crystals and prevent the organism from freezing by forming a hydrophobic layer that keeps the surrounding liquid from crystallising. The instructions for building these proteins are written in genes and encoded in DNA.
Many traits are inherent to an organism. Ducklings swim, and newly hatched chicks try to dodge predators. These species are precocial: they don't need pre-training. They build upon these skills by adjusting to the environment and rarely suffer from a cold-start problem.
But when an attempt is made to mimic this naturally occurring intelligence through algorithms, many limitations surface. There are factors like artificially induced biases and the inability of the algorithms to capitalise on randomness.
Reinforcement learning, introduced many years ago, is one attempt to address this: the agent learns to perform tasks better through a reward system. Even so, reinforcement learning algorithms still leave considerable room for improvement.
Acknowledging the shortcomings of existing networks, researchers at Google Brain, in their paper titled Weight Agnostic Neural Networks (WANNs), dive deep to rework how networks function at a foundational level.
In this work, the authors try to de-emphasise the importance of weights and explore how models can be made to learn better through their architecture.
They recommend the following procedure:
- Create an initial population of minimal neural network topologies.
- Evaluate each network over multiple rollouts, with a different shared weight value assigned at each rollout.
- Rank networks according to their performance and complexity.
- Create a new population by varying the highest ranked network topologies, chosen probabilistically through tournament selection.
- Repeat from the evaluation step (the second step above), yielding weight agnostic topologies of gradually increasing complexity that perform better over successive generations.
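The loop above can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the task (deciding whether the first input is larger than the second), the shared weight values, the three mutation operators, the elitism step and the simple sort used for ranking are all assumptions made to keep the example small and runnable.

```python
import math
import random

N_INPUTS = 2
# Assumed set of shared weight values, one per "rollout".
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
# Assumed activation choices for hidden nodes.
ACTIVATIONS = {
    "linear": lambda z: z,
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "neg": lambda z: -z,
    "abs": abs,
}

def forward(topology, x, w):
    """Every connection uses the same weight w; topology is a list of
    (input_indices, activation_name) hidden nodes feeding one output."""
    hidden = [ACTIVATIONS[act](w * sum(x[i] for i in inputs))
              for inputs, act in topology]
    return w * sum(hidden)

def rollout(topology, w, data):
    """Toy rollout: accuracy at deciding whether x0 > x1 with shared weight w."""
    hits = sum((forward(topology, x, w) > 0) == (x[0] > x[1]) for x in data)
    return hits / len(data)

def evaluate(topology, data):
    """Mean performance over all shared weight values."""
    return sum(rollout(topology, w, data) for w in SHARED_WEIGHTS) / len(SHARED_WEIGHTS)

def complexity(topology):
    """Connection count: each node's inputs plus its link to the output."""
    return sum(len(inputs) + 1 for inputs, _ in topology)

def mutate(topology, rng):
    """Vary a topology by adding a node, adding a connection, or changing an activation."""
    child = list(topology)
    op = rng.choice(["add_node", "add_connection", "change_activation"])
    if op == "add_node":
        child.append(((rng.randrange(N_INPUTS),), rng.choice(list(ACTIVATIONS))))
        return child
    i = rng.randrange(len(child))
    inputs, act = child[i]
    if op == "add_connection":
        missing = [j for j in range(N_INPUTS) if j not in inputs]
        if missing:
            inputs = inputs + (rng.choice(missing),)
    else:
        act = rng.choice(list(ACTIVATIONS))
    child[i] = (inputs, act)
    return child

def search(generations=30, pop_size=24, tournament_size=4, seed=0):
    rng = random.Random(seed)
    data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(64)]
    rank_key = lambda t: (-evaluate(t, data), complexity(t))
    # Step 1: minimal topologies (one hidden node, one input connection).
    population = [[((rng.randrange(N_INPUTS),), "linear")] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2-3: evaluate over all shared weights, then rank by
        # performance with complexity as a tie-breaker (a simplification).
        ranked = sorted(population, key=rank_key)
        # Step 4: keep the best topology (assumed elitism) and fill the rest
        # with mutated tournament winners; lowest rank index wins.
        population = [ranked[0]]
        while len(population) < pop_size:
            contenders = rng.sample(range(pop_size), tournament_size)
            population.append(mutate(ranked[min(contenders)], rng))
    best = min(population, key=rank_key)
    return best, evaluate(best, data)

if __name__ == "__main__":
    topology, score = search()
    print(topology, round(score, 3))
```

Note that no weight is ever trained here: the only thing the search changes is the wiring and the activation functions, and a topology is rewarded only if it does well across the whole set of shared weight values.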
WANNs were tested on control tasks such as CartPoleSwingUp, Bipedal Walker and Car Racing.
For example, in the case of Bipedal Walker, the task is to guide a two-legged agent across randomly generated terrain. Rewards are awarded for distance travelled, with a cost for motor torque to encourage efficient movement. Each leg is controlled by a hip and knee joint in reaction to 24 inputs, including LIDAR sensors which detect the terrain and proprioceptive information such as the agent’s joint speeds.
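The reward structure described above, distance gained minus a penalty for motor effort, can be written as a one-line function. This is only a sketch of the idea: the name `step_reward` and the `torque_cost` coefficient are hypothetical and are not the values used by the actual environment.

```python
def step_reward(distance_delta, torques, torque_cost=0.01):
    """Reward for one time step: forward progress minus a cost for motor torque.

    distance_delta: distance travelled during this step
    torques: torque applied at each joint (two hips and two knees here)
    torque_cost: hypothetical penalty coefficient, chosen for illustration
    """
    return distance_delta - torque_cost * sum(abs(t) for t in torques)

if __name__ == "__main__":
    # The same progress earns less reward when the motors work harder.
    print(step_reward(1.0, [0.0, 0.0, 0.0, 0.0]))
    print(step_reward(1.0, [1.0, -1.0, 0.5, -0.5]))
```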
WANNs were evaluated for performance under four conditions:
- Random weights: individual weights drawn from range (-2,2)
- Random shared weight: a single shared weight drawn from range (-2,2)
- Tuned shared weight: the highest performing shared weight value in range (-2,2)
- Tuned weights: individual weights tuned using population-based REINFORCE.
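To make the four conditions concrete, here is a minimal sketch of how weights could be assigned to a fixed topology under each one. Several things are assumptions: the three-connection toy network and its `toy_score` function are invented for the example, the grid sweep used to tune the shared weight is one possible choice, and the last condition uses plain hill climbing as a stand-in for population-based REINFORCE, which would not fit in a short example.

```python
import math
import random

LOW, HIGH = -2.0, 2.0  # weight range from the evaluation conditions

def random_weights(n, rng):
    """Condition 1: every connection gets its own random weight."""
    return [rng.uniform(LOW, HIGH) for _ in range(n)]

def random_shared_weight(n, rng):
    """Condition 2: one random weight shared by every connection."""
    return [rng.uniform(LOW, HIGH)] * n

def tuned_shared_weight(n, score, steps=41):
    """Condition 3: the best single shared weight, found by a grid sweep."""
    grid = [LOW + (HIGH - LOW) * i / (steps - 1) for i in range(steps)]
    best = max(grid, key=lambda w: score([w] * n))
    return [best] * n

def tuned_weights(n, score, rng, iterations=300, sigma=0.2):
    """Condition 4: individually tuned weights.

    Stand-in optimiser: hill climbing with Gaussian perturbations,
    NOT population-based REINFORCE. Starting from the tuned shared
    weight is a choice made for this sketch.
    """
    weights = tuned_shared_weight(n, score)
    best = score(weights)
    for _ in range(iterations):
        candidate = [w + rng.gauss(0.0, sigma) for w in weights]
        candidate_score = score(candidate)
        if candidate_score > best:
            weights, best = candidate, candidate_score
    return weights

# Hypothetical toy network with three connections: w2 * tanh(w0*x0 + w1*x1).
DATA = [(a / 2, b / 2) for a in range(-2, 3) for b in range(-2, 3)]

def toy_score(weights):
    """Negative mean squared error against an arbitrary target function."""
    w0, w1, w2 = weights
    errors = [(w2 * math.tanh(w0 * x0 + w1 * x1) - math.tanh(x0 + 0.5 * x1)) ** 2
              for x0, x1 in DATA]
    return -sum(errors) / len(errors)

if __name__ == "__main__":
    rng = random.Random(0)
    conditions = {
        "random weights": random_weights(3, rng),
        "random shared weight": random_shared_weight(3, rng),
        "tuned shared weight": tuned_shared_weight(3, toy_score),
        "tuned weights": tuned_weights(3, toy_score, rng),
    }
    for name, weights in conditions.items():
        print(f"{name}: score {toy_score(weights):.4f}")
```

The comparison that matters is between the two shared-weight conditions and the two individual-weight conditions: a weight agnostic topology should already perform reasonably with a single shared value, before any per-connection tuning.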
Computers, which are super-rational, are expected to carry out a job without any prejudice. But the data they are given is generated and curated by humans, and on top of that there is the ambitious pursuit of AGI (human-like intelligence). AI researchers walk a tightrope: they have to train models for human-level sophistication while being pragmatic about human flaws.
Here are a few outcomes of the WANN work:
- A serious attempt to develop networks that can be described with a minimal description length.
- WANNs which encode relationships between inputs are well suited to RL tasks: low-dimensional inputs coupled with internal states and environmental interaction allow discovery of reactive and adaptive controllers.
- Even on a high-dimensional classification task, WANNs perform remarkably well.
- The ability to quickly fine-tune the weights of a WANN is useful in few-shot learning.
- It is possible to learn a pruned network capable of performing additional tasks without learning weights.
“We hope this work will encourage further research that facilitates the discovery of new architectures that not only possess inductive biases for practical domains but can also be trained with algorithms that may not require gradient computation,” wrote the authors.