Self-preservation is an instinct that every human being has. Now imagine an AI that cannot process emotions yet shows a similar instinct and chooses to preserve itself over others. That is what researchers at Google-owned DeepMind have found in a recent study of their artificial intelligence agents.
Google-owned DeepMind is on a scientific mission to push the boundaries of AI. It has been developing programs that can learn to solve complex problems without human intervention. DeepMind researchers recently ran a series of tests to find out how AI agents would react when faced with certain social dilemmas. The basic idea was to find out whether they would cooperate or compete.
For the research, they tested the AI on two games: a fruit-gathering game and a hunting game called Wolfpack. These are rudimentary two-dimensional games played by AI characters known as agents.
By using game theory, researchers found out that it is possible for artificial intelligence to act in an ‘aggressive manner’ when it feels that it is going to lose out. However, the AI agents will work together as a team when there is more benefit.
AI’s Killer Instinct
In the first game, fruit gathering, the researchers tasked two AI agents (red and blue) with gathering as many virtual apples (green pixels) as possible. The agents were trained to collect apples using deep reinforcement learning. When an agent collected an apple, it received a reward of 1 and the apple vanished from the game's map. Both agents also had the option to tag their opponent two times in a row with a laser blast, which temporarily removed the opponent from the game and gave the attacker more time to collect apples.
Through this test, the researchers found that the two agents behaved differently depending on the number of apples available. As long as there were enough apples to go around for both agents, there wasn't any problem. But as soon as apples became scarce, the agents were more likely to knock each other out of the game in order to collect the remaining apples for themselves.
After running 40 million tests, the researchers concluded that the AI agents became ‘highly aggressive’ when resources were scarce, which led to more ‘tagging’ behaviour. They also found that agents with the capacity to implement more complex strategies tried to tag the other agent more frequently, i.e. behaved less cooperatively, no matter how the scarcity of apples was varied.
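The rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not DeepMind's implementation: the names `Agent`, `collect_apple`, `tag` and `tick` are hypothetical, the length of the temporary removal (`TIMEOUT_STEPS`) is an assumed value, and hits are simply counted rather than checked for being strictly consecutive.

```python
from dataclasses import dataclass

TAGS_TO_REMOVE = 2    # from the article: two laser hits remove an agent
TIMEOUT_STEPS = 25    # assumption: how long the temporary removal lasts


@dataclass
class Agent:
    name: str
    score: int = 0
    hits_taken: int = 0
    timeout: int = 0  # steps left before the agent may act again

    @property
    def active(self) -> bool:
        return self.timeout == 0


def collect_apple(agent: Agent, apples: set, position: tuple) -> None:
    """Give an active agent a reward of 1 and remove the apple from the map."""
    if agent.active and position in apples:
        apples.remove(position)
        agent.score += 1


def tag(attacker: Agent, target: Agent) -> None:
    """Register a laser hit; the second hit removes the target temporarily."""
    if not (attacker.active and target.active):
        return
    target.hits_taken += 1
    if target.hits_taken >= TAGS_TO_REMOVE:
        target.hits_taken = 0
        target.timeout = TIMEOUT_STEPS


def tick(agent: Agent) -> None:
    """Advance one time step, counting down any removal period."""
    if agent.timeout > 0:
        agent.timeout -= 1
```

For example, if a red agent tags a blue agent twice, the blue agent's attempts to collect apples are ignored until `tick` has been called `TIMEOUT_STEPS` times, which is the extra collecting time the attacker gains.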
“Intuitively, a defecting policy in this game is one that is aggressive—i.e., involving frequent attempts to tag rival players to remove them from the game. Such a policy is motivated by the opportunity to take all the apples for oneself that arises after eliminating the other player,” the researchers write in their paper. The paper also stated that “Less aggressive policies emerge from learning in relatively abundant environments with less possibility for costly action. The greed motivation reflects the temptation to take out a rival and collect all the apples oneself.”
AI’s Instinct With Wolfpack
In the second game, called Wolfpack, the researchers included three AI agents: two wolves (red) and one prey (blue). Here the rules were different, because the game required close coordination. When the prey was hunted down by either of the two wolves, both received a reward. Greater rewards were offered when the wolves were in close proximity during a capture.
The researchers explained in their paper that the core idea is that the prey is dangerous: a lone wolf can overcome it, but is at risk of losing the carcass to scavengers. When the two wolves capture the prey together, however, they can better protect the carcass from scavengers and hence receive a higher reward.
In this case, the AI agents worked together, which shows that AI agents can recognise when cooperation produces the best outcome for all.
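The reward rule described above can be sketched as a small Python function. This is a hedged illustration rather than the paper's actual reward scheme: the function name `capture_rewards` and the constants `LONE_REWARD`, `PACK_REWARD` and `CAPTURE_RADIUS` are all assumed values chosen only to show that a joint capture pays more than a solo one.

```python
import math

LONE_REWARD = 1.0     # assumption: reward when the wolves are far apart
PACK_REWARD = 2.0     # assumption: reward when both wolves are close
CAPTURE_RADIUS = 3.0  # assumption: distance that counts as "close"


def capture_rewards(prey, wolves):
    """Return one reward per wolf at the moment the prey is captured.

    Every wolf is rewarded on a capture, as the article describes. The
    reward is higher when all wolves are within CAPTURE_RADIUS of the prey.
    """
    all_close = all(math.dist(prey, wolf) <= CAPTURE_RADIUS for wolf in wolves)
    reward = PACK_REWARD if all_close else LONE_REWARD
    return [reward for _ in wolves]
```

With these assumed numbers, a capture with both wolves next to the prey gives each wolf the pack reward, while a capture with one wolf far away gives each wolf only the lone reward, so staying together is the better long-run strategy.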
What Do These Two Games Explain?
“In the Wolfpack game, learning a defecting lone-wolf policy is easier than learning a cooperative pack-hunting policy. This is because the former does not require actions to be conditioned on the presence of a partner within the capture radius. In the gathering game, the situation is reversed. Cooperative policies are easier to learn since they need only be concerned with apples and may not depend on the rival player’s actions.
“However, optimally efficient cooperative policies may still require such coordination to prevent situations where both players simultaneously move on the same apple. Cooperation and defection demand different levels of coordination for the two games. Wolfpack’s cooperative policy requires greater coordination than its defecting policy. Gathering’s defection policy requires greater coordination (to successfully aim at the rival player),” the DeepMind researchers explained in the paper.
The Wolfpack game has a clear message: the potential benefit of artificial intelligence is huge. Such advances in AI can certainly benefit humans, as cooperation can be the key to greater individual success in certain situations. They could also lead to systems that can develop policies and real-world applications.
What Could The Aggressive Nature Of AI Imply?
As Professor Stephen Hawking rightly said, “We cannot predict what we might achieve when our own minds are amplified by AI. Perhaps with the tools of this new technological revolution, we will be able to undo some of the damage done to the natural world by the last one – industrialisation. And surely we will aim to finally eradicate disease and poverty. Every aspect of our lives will be transformed. In short, success in creating AI could be the biggest event in the history of our civilisation.”
On the flip side, there are challenges related to the data used by machine learning systems. A World Economic Forum (WEF) paper highlights that even if machine learning algorithms are trained on good data sets, their design or deployment could encode discrimination in several ways: choosing the wrong model; building a model with inadvertently discriminatory features; a lack of human oversight and involvement; unpredictable and inscrutable systems; or unchecked and intentional discrimination.
As artificial intelligence and machine learning become more advanced day by day, they involve less human supervision and less transparency. It is important that humans are kept in the loop so that unexpectedly overlooked factors can be caught. We also need to build human values into our machines, because we have them and machines don’t, and something like ‘be reverent, loving, brave, and true’ may be difficult to digitize. But one way or the other, it’s imperative that we find ways to infuse human values into our AI.