Ever since Demis Hassabis’s talk on Learning from First Principles at NIPS 2017 last December, the internet has been abuzz over whether the Google-owned, London-headquartered DeepMind is close to solving the General Intelligence puzzle, which the company has been pursuing since it was founded in 2010. The company behind the Atari and Go AI systems reached another milestone in its journey with AlphaGo Zero, which learns from scratch, requires no bootstrapping from human data and learns incrementally from its own mistakes.
A recent paper published on December 5, 2017, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, builds on the accomplishments of the AlphaGo Zero program, which achieved superhuman performance in the game of Go by reinforcement learning from games of self-play. The paper generalizes the approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
Is AlphaGo Zero a scientific breakthrough and a step towards AGI?
Interestingly, Tesla’s Director of AI and Autopilot Vision, Andrej Karpathy, thinks progress towards AGI is a step function and will happen either suddenly or unexpectedly, a view he shared at the recent Fireside Talk at NIPS 2017. “If you look at the progress of AlphaGo Zero, there was a long period of quiet engineering effort and algorithmic advances, then [it] reached superhuman status in just three days,” said Karpathy when asked about his views on AGI.
A section of ML researchers dubbed AlphaGo Zero a significant research advancement in AI, even more important than AlphaGo, since AlphaGo Zero learned without any human data. Deep learning systems are often criticized for their dependence on data and the brute-force way in which they reach optimal results.
AlphaGo Zero, on the other hand, does not rely on human-generated data and may discover new knowledge, which some researchers see as a step towards artificial general intelligence (AGI). The end goal of any AI company is to build models that discover new knowledge on their own. Such models do not rely too much on human-generated data to start with.
AlphaGo clearly demonstrated two main ideas, Demis Hassabis noted during his talk:
- Intuition: Implicit knowledge acquired through experience but not consciously expressible. The quality of this knowledge can be verified behaviourally.
- Creativity: The ability to synthesize knowledge to produce a novel or original idea. AlphaGo clearly demonstrated these abilities, although in a constrained domain.
AlphaGo Zero used a Deep Reinforcement Learning approach and a tree-based search strategy. Here are some of the differences from the previous approach:
- The architecture used one deep neural network (DNN) instead of the two separate policy and value networks.
- There was a simpler tree search strategy
- No bootstrapping from human data
- Fully automated pipeline with no human in the loop
- Zero human-generated data for training; it learned incrementally from its own mistakes
- It started from completely random play with zero knowledge and played against itself millions of times
- AlphaGo Zero always plays games against itself at its current full strength
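The first difference in the list above, one network in place of separate policy and value networks, can be illustrated with a toy sketch. This is not DeepMind's architecture: the layer sizes, the function names `init_params` and `forward`, and the flattened 3x3 board are all assumptions made for illustration. The only point it shows is that a single shared trunk can feed both a policy head (move probabilities) and a value head (an estimate of who is winning).

```python
import numpy as np

def init_params(rng, n_inputs, n_hidden, n_moves):
    """Randomly initialise a tiny shared-trunk network (hypothetical sizes)."""
    return {
        "trunk": rng.normal(0.0, 0.1, (n_inputs, n_hidden)),
        "policy": rng.normal(0.0, 0.1, (n_hidden, n_moves)),
        "value": rng.normal(0.0, 0.1, (n_hidden, 1)),
    }

def forward(params, board):
    """Return (move probabilities, position value) from one shared trunk."""
    hidden = np.tanh(board @ params["trunk"])       # shared representation
    logits = hidden @ params["policy"]              # policy head
    logits = logits - logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over moves
    value = float(np.tanh(hidden @ params["value"])[0])  # value head in [-1, 1]
    return probs, value

rng = np.random.default_rng(0)
params = init_params(rng, n_inputs=9, n_hidden=16, n_moves=9)
board = np.zeros(9)  # toy stand-in for a position: empty 3x3 board, flattened
probs, value = forward(params, board)
```

In a self-play setup, both heads would be trained from the games the system plays against itself, so no human game records are needed at any point.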
Here’s what makes AlphaGo Zero really General
Besides the human-like ingenuity displayed in learning, AlphaGo Zero is a big step forward because the system got rid of supervision and feature engineering. The next logical step would be to replace Monte Carlo tree search (MCTS) with a differentiable recurrent model to build an end-to-end trainable system that doesn’t use simulations. Such a step would make the system truly general.
According to experts, the high level of engineering and the self-play factor, known as co-evolution, pushed the model to a superhuman level. Nevertheless, it is a generalized learning network that still has to be trained to specialize in a certain field.
Role of Deep Reinforcement Learning in achieving Strong AI
When it comes to building machines that think and learn like humans, Deep Reinforcement Learning is perhaps viewed as one of the most plausible paths. Today, it is one of the most active research areas in artificial intelligence. Essentially, reinforcement learning is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment, explains Dr Richard Sutton, known as one of the founding fathers of computational reinforcement learning.
At NIPS 2017, Deep Reinforcement Learning was the most popular topic, and DeepMind has delivered great results with AlphaGo Zero, which plays at a superhuman level. During the recently concluded NIPS, DeepMind released an updated version that plays Go, shogi and chess at a dominant level. Another example of Deep RL is Deep Q-learning, a model-free reinforcement learning algorithm used to train deep neural networks on control tasks such as playing Atari games. A network is trained to approximate the optimal action-value function Q(s, a), which is the expected long-term cumulative reward of taking action a in state s and then optimally selecting future actions, explains the paper Building Machines That Learn and Think Like People.
According to Hassabis, co-founder and CEO of DeepMind, grounding learning in intuitive theories of physics and psychology could significantly support training and help generalize knowledge to new tasks and situations. So, instead of building systems that simply recognize handwritten characters and play Go or Atari, deep learning should tackle tasks with little training data and also evaluate models on a range of human-like generalizations. In his NIPS talk, Learning from First Principles, Hassabis outlined his strategy to build intelligent systems:
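To make the action-value function Q(s, a) concrete, here is a minimal sketch of tabular Q-learning, the simpler relative of Deep Q-learning in which a lookup table stands in for the deep network. The five-state corridor environment, the `step` and `train` functions, and the values of the discount factor, learning rate and exploration rate are all assumptions chosen for illustration; none of them come from the paper.

```python
import random

N_STATES = 5        # toy corridor: states 0..4, reaching state 4 ends the episode
ACTIONS = (0, 1)    # 0 = step left, 1 = step right
GAMMA = 0.9         # discount factor (assumed value)
ALPHA = 0.5         # learning rate (assumed value)
EPSILON = 0.2       # exploration probability (assumed value)

def step(state, action):
    """Hypothetical environment: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    """Learn a table q[state][action] approximating the optimal Q(s, a)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[state])                  # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of acting optimally after.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy steps right in every non-terminal state, and the learned value of stepping right from the start approaches GAMMA ** 3, the discounted reward for reaching the goal in four steps. Deep Q-learning follows the same update rule but replaces the table with a neural network so that it can cope with inputs as large as Atari screens.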
a) Learning vs Handcrafted: we want systems to learn for themselves directly rather than being spoon-fed handcrafted, pre-programmed solutions.
b) General vs Specific: we want a system that is able to run across a wide range of environments and tasks and not just do one problem or one task.
c) Grounded vs Logic-based systems: a true thinking system or cognitive system should be fully grounded in reality.
d) Active vs Passive: we strive to create agent-based systems that are active participants in their own learning.