The International, the World Cup of Dota 2, a complex battle arena game, saw an artificial intelligence system compete against professional players at its 2018 tournament. Earlier this August, an AI player called Five, created by OpenAI, failed to defeat professional human gamers. Despite training that amounted to over 180 years of "experience", the AI could not achieve the feat. Why was that?
For the uninitiated, Dota 2 is a popular online multiplayer video game with 115 heroes, categorised by strength, agility and intelligence. Two teams of five players face each other. Every player picks a hero with its own powers and characteristics, and the team tries to destroy the opposing team's base while overcoming many hurdles along the way.
Tech Behind Five
Each of the five heroes of Five was controlled by its own neural network. The networks were trained on the equivalent of 180 years of gameplay over the two months before the final match. Every neural network was trained by playing against itself. Learning from self-play provides a natural way to explore the game environment. During training, properties such as health, speed and starting level were randomised.
At the beginning of each game, each hero was randomly assigned a set of lanes to follow and was not allowed to stray from them. At first, the Five players walked aimlessly around the map, but after some hours of training they could do things like farming and fighting.
After some days, the Five players could play in a human-like way, forming strategies and performing actions such as stealing the opponent's Bounty runes and walking to their tier one towers to farm. Gradually they became proficient in advanced tactics, like the 5-hero push. It was found that when the randomisations were increased, the human teams started to lose games. Five played 80% of its training games against its current self and the other 20% against its past selves. This was done to avoid strategy collapse.
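The 80/20 split between the current version and past versions can be pictured as a simple sampling rule. The following is a minimal sketch, not OpenAI's implementation: the function name `pick_opponent`, the version labels and the uniform choice among past versions are all assumptions made for illustration.

```python
import random

def pick_opponent(current_policy, past_policies, rng, p_current=0.8):
    """Choose the opponent for one self-play game.

    With probability p_current the agent faces its current self;
    otherwise it faces a past version (picked uniformly here, which
    is an assumption, not a detail given in the article).
    """
    if not past_policies or rng.random() < p_current:
        return current_policy
    return rng.choice(past_policies)

rng = random.Random(0)           # fixed seed so the run is repeatable
past = ["v1", "v2", "v3"]        # hypothetical saved versions
picks = [pick_opponent("latest", past, rng) for _ in range(10_000)]
share_latest = picks.count("latest") / len(picks)
print(f"games against the latest version: {share_latest:.1%}")
```

Mixing in older opponents means a strategy that only beats the newest version, but loses to something the agent used to do, still gets punished during training.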
The system was implemented on Rapid, OpenAI's general-purpose reinforcement learning training system, which can be applied to any Gym environment. The networks were trained with proximal policy optimisation (PPO), an advanced policy gradient method.
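For readers unfamiliar with PPO, its core idea is a "clipped" objective that stops any single update from moving the policy too far from the one that collected the data. The sketch below shows that objective in plain Python. It is an illustration of the general PPO technique, not code from Rapid; the function name and the clipping range `eps=0.2` are assumptions (0.2 is a commonly used default, not a value given in the article).

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under
    the updated policy and under the policy that collected the data.
    advantages: how much better each action was than expected.
    eps: clipping range (assumed value, for illustration only).
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio)) # keep ratio near 1
        # Take the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# A good action whose probability rose sharply: the gain is capped.
print(ppo_clip_objective([1.0], [0.0], [1.0]))
```

The training step would then maximise this quantity; the cap removes the incentive to push a probability ratio far outside the range 1 ± eps in a single update.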
OpenAI used a separate long short-term memory (LSTM) network, a kind of recurrent neural network, for each hero to learn strategies. Each of Five's neural networks contains a single-layer, 1024-unit LSTM that observes the current game state through the Bot API. It then emits actions through several action heads. Each head governs a distinct part of the action and is computed independently.
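The idea of independent action heads can be sketched without a full network. In the toy example below the LSTM itself is left out: each head is represented only by a list of scores (logits) that the network would have produced. The head names, the scores and the choice of always taking the highest-probability option are hypothetical and chosen purely for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def choose_actions(head_logits):
    """Pick the most likely option from each head, independently."""
    chosen = {}
    for name, logits in head_logits.items():
        probs = softmax(logits)
        chosen[name] = max(range(len(probs)), key=probs.__getitem__)
    return chosen

# Hypothetical heads and scores; a real network would compute these.
heads = {
    "action_type": [0.1, 2.0, -1.0],
    "target_unit": [1.5, 0.2],
    "move_offset": [0.0, 0.0, 3.0, 1.0],
}
print(choose_actions(heads))
```

Because each head is evaluated on its own, the full action is simply the combination of one choice per head.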
Training an AI to play a game as real-time and complex as Dota 2 required a great deal of computing power. The system ran on 256 P100 GPUs and 128,000 preemptible CPU cores on GCP. It took 7.5 observations per second of gameplay, each about 36.8 kilobytes in size. It processed about 60 batches per minute, with a batch size of 1,048,576 observations.
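A little arithmetic on the figures above gives a feel for the scale. This is a back-of-the-envelope sketch only: it assumes 1 kilobyte = 1,000 bytes, treats every observation as exactly 36.8 kB, and assumes each observation corresponds to one hero's view of the game; the variable names are made up for the example.

```python
OBS_PER_SECOND = 7.5          # observations per second of gameplay
OBS_SIZE_KB = 36.8            # size of one observation
BATCH_SIZE = 1_048_576        # observations per batch
BATCHES_PER_MINUTE = 60

# Observations consumed by the optimiser every minute.
obs_per_minute = BATCH_SIZE * BATCHES_PER_MINUTE

# Raw observation data in one batch, in decimal gigabytes
# (assumes 1 kB = 1,000 bytes and 1 GB = 1,000,000 kB).
batch_gb = BATCH_SIZE * OBS_SIZE_KB / 1e6

# Hours of single-hero gameplay represented by one batch
# (assumes each observation is one hero's view).
hero_hours_per_batch = BATCH_SIZE / OBS_PER_SECOND / 3600

print(f"{obs_per_minute:,} observations per minute")
print(f"{batch_gb:.2f} GB of observations per batch")
print(f"{hero_hours_per_batch:.1f} hours of hero gameplay per batch")
```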
5 Observations Where Five Went Wrong
Unity: Five always seemed to stay together, even when it wasn't required. This helped when it was a good time to attack, but it hurt when the opponents took advantage of the grouping and tried to defeat all five heroes at once. Five probably could not recognise that the opposing heroes were different from its own, and it could not work out which of the opponents' powers it could exploit and which it should fear. It acted only according to its own team's heroes, so it was not effective in a game where the opponent could be any of a hundred and fifteen heroes.
Missing couriers: Five did not worry much about the courier and kept fighting on the battlefield even when the courier was present. It could not grasp that, for the team's survival, the courier can matter more than the fight.
Speed: Five, being a machine, naturally had a faster response time than the professional players, so it could decide and react faster during play. It didn't have to keep checking the map to see where its team was, or check whether its most powerful spell was ready. The usual human response time is around 150 to 500 milliseconds, whereas Five had a response time of about 80 milliseconds.
Poor decisions: Although Five decided very quickly, there were instances when the decision it made was extremely poor. Five could not make the optimal decision in every situation. Staying in a group all the time is one example.
Fearless: Five repeatedly sacrificed its top lane or bottom lane with the intention of controlling the opposing team's safe lane. The instant it saw a possible kill, it went for it without gauging the consequences, without considering the enemy's powers or the disadvantages of approaching and killing that enemy.
The failure of OpenAI Five was not really a failure of AI. It showed that an AI could play something as complex as Dota 2. Dota 2 reflects many real-world environments, and games like this are a perfect testbed for AI research. OpenAI is one of the biggest organisations focused on solving humanity's problems with AI.
Humans are in turn learning new techniques from their matches with bots. For example, professional Go player Lee Sedol was defeated by DeepMind's AlphaGo, but it taught him a new technique in the game. An example from Dota would be when Five showed that players could recharge a certain weapon quickly by staying out of range of the enemy. This was new, and the human players learnt from it.
AI therefore gives both parties an opportunity to learn, which makes it a win-win situation.