Reinforcement learning is a machine learning technique in which an AI agent is taught to perform actions via reward functions. Until now, these rewards have been designed by human researchers to suit the task at hand. But AI scientists at DeepMind are now questioning reward function design itself, which lies at the very foundation of reinforcement learning. Historically, reward functions designed by humans have often fallen short, and have even produced unsafe and irresponsible AI agents.
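To ground the idea, here is a minimal sketch of the classic setup, where a human hand-designs the reward function that an agent then optimises. The toy corridor environment, the reward values and the Q-learning hyperparameters are all illustrative assumptions, not anything from DeepMind's work:

```python
import random

# Toy corridor: states 0..4, goal at state 4; actions move right (+1) or left (-1).
N_STATES, GOAL = 5, 4
ACTIONS = (+1, -1)

def reward(state):
    """Hand-designed reward, as in classic RL: +1 at the goal, 0 elsewhere."""
    return 1.0 if state == GOAL else 0.0

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, reward(nxt), nxt == GOAL

def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy action selection against the current value estimates.
            a = rng.choice(ACTIONS) if rng.random() < eps \
                else max(ACTIONS, key=lambda a: q[(s, a)])
            nxt, r, done = step(s, a)
            # Standard Q-learning update toward the bootstrapped target.
            q[(s, a)] += alpha * (r + gamma * max(q[(nxt, b)] for b in ACTIONS) - q[(s, a)])
            s, steps = nxt, steps + 1
    return q

q = q_learning()
# Greedy policy from the learned values: move right (+1) in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

The hand-crafted `reward` function here is exactly the kind of human-designed component the DeepMind researchers are questioning: in this five-state corridor it is trivially correct, but in real-world tasks such hand-designed rewards often miss important aspects of the intended objective.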
The AI Safety team at DeepMind has been working on a long-term strategy for creating AI agents that can operate in complex domains and learn to construct good reward functions for themselves. The ultimate aim is to create safer AI agents whose reward functions are improved over time: agents that learn good reward functions from information in their environment, and even learn to optimise those reward functions themselves.
But again, some alarm bells should go off. When AI agents are capable of deciding their own reward functions, they have the potential to become smarter than humans.
AI And User Intention
Keeping the user's intention in mind is important, as many possible outcomes are undesirable, yet many current reinforcement learning techniques do not take this into consideration. As the researchers point out, “…Training a reinforcement learning agent on any real-world task, there are many outcomes that are so costly that the agent needs to avoid them altogether.”
The researchers pose the question: “How can we create agents that behave in accordance with the user’s intentions?” Solving it, however, may require AI agents that can act somewhat independently of humans. The researchers break the research problem into two parts:
- Learning a reward function from user feedback that captures the user's intentions, and
- Training a policy with reinforcement learning to optimise the learned reward function.
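The two-part split above can be sketched in toy form. Here the “user” is simulated by a hidden utility function, the feedback takes the form of pairwise preferences fitted with a Bradley-Terry-style logistic update, and the “policy” is simply an argmax over one-step outcomes — all of these are illustrative assumptions, not DeepMind's actual setup:

```python
import math
import random

# Hypothetical toy: four one-step outcomes. The user's intentions are a hidden
# utility function that the agent never observes directly.
TRUE_UTILITY = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
OUTCOMES = list(TRUE_UTILITY)

def user_prefers(x, y):
    """Simulated user feedback: the user prefers the higher-utility outcome."""
    return (x, y) if TRUE_UTILITY[x] > TRUE_UTILITY[y] else (y, x)

def fit_reward_model(n_queries=2000, lr=0.1, seed=0):
    """Part 1: learn a reward model from pairwise preference feedback,
    using a Bradley-Terry-style logistic update."""
    rng = random.Random(seed)
    r = {o: 0.0 for o in OUTCOMES}  # learned reward estimates
    for _ in range(n_queries):
        winner, loser = user_prefers(*rng.sample(OUTCOMES, 2))
        # Gradient step on -log sigmoid(r[winner] - r[loser]).
        p = 1.0 / (1.0 + math.exp(r[winner] - r[loser]))
        r[winner] += lr * p
        r[loser] -= lr * p
    return r

reward_model = fit_reward_model()
# Part 2: optimise a policy against the learned reward model — in this
# one-step toy, simply the policy that picks the highest-reward outcome.
policy_choice = max(OUTCOMES, key=reward_model.get)
```

The point of the split is visible even at this scale: the agent never touches `TRUE_UTILITY` directly; its behaviour is shaped entirely by the reward model recovered from the user's comparisons.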
In effect, the researchers want to shift reinforcement learning agents from being told “what” to learn to discovering “how” to learn.
The researchers see the need to prepare AI agents to match user intention by teaching them to infer rewards from available data. They want a solution to the agent alignment problem that fulfils the following three properties:
- Scalable: Agent alignment becomes more important as ML performance increases, so a solution to the alignment problem that fails to scale together with our agents can only serve as a stopgap
- Economical: Training aligned agents should be low in cost, and their performance should be competitive with other approaches, so as to remove incentives for building unaligned agents
- Pragmatic: The solution should be practical, as the researchers note that every field has problems that remain unsolved even after our understanding has matured enough to solve many practical problems
Reward Modelling And Challenges
The researchers centre their work on reward modelling: the user trains a reward model to capture their intentions by providing feedback. This reward model, in turn, provides rewards to a reinforcement learning agent that interacts with its environment. The two processes happen concurrently, ensuring the agent is trained with the user in the loop.
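The concurrency is the distinctive part: the agent acts on the current reward model while the user's feedback keeps refining that same model. A hypothetical bandit-style sketch of this user-in-the-loop scheme, with the environment, feedback rule and hyperparameters all invented for illustration:

```python
import math
import random

# The user's intentions, hidden from the agent.
HIDDEN_UTILITY = {"a": 0.0, "b": 1.0, "c": 2.0}
ARMS = list(HIDDEN_UTILITY)

def train_in_the_loop(n_rounds=1000, lr=0.1, eps=0.3, seed=0):
    rng = random.Random(seed)
    model = {arm: 0.0 for arm in ARMS}  # learned reward model
    for _ in range(n_rounds):
        # Agent step: act on the *current* reward model (epsilon-greedy).
        chosen = rng.choice(ARMS) if rng.random() < eps \
            else max(ARMS, key=model.get)
        # User step: compare the agent's behaviour against an alternative.
        other = rng.choice([a for a in ARMS if a != chosen])
        winner, loser = (chosen, other) if HIDDEN_UTILITY[chosen] > HIDDEN_UTILITY[other] \
            else (other, chosen)
        # Model step: logistic update from the preference — the reward model
        # is refined concurrently with the agent acting on it.
        p = 1.0 / (1.0 + math.exp(model[winner] - model[loser]))
        model[winner] += lr * p
        model[loser] -= lr * p
    return model

model = train_in_the_loop()
best_arm = max(ARMS, key=model.get)
```

Each round interleaves acting and feedback rather than separating them into phases, which is the sense in which the user stays “in the loop” during training.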
The researchers also underline the challenges of this approach. They said, “The success of reward modelling relies heavily on the quality of the reward model. If the reward model only captures most aspects of the objective but not all of it, this can lead the agent to find undesirable degenerate solutions. In other words, the agent’s behaviour depends on the reward model in a way that is potentially very fragile.”
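A hypothetical toy makes that fragility concrete. Suppose the true objective (known only to the user) heavily penalises crossing a hazard, but the learned reward model captured only goal-reaching and step cost. Optimising the imperfect model then selects exactly the behaviour the user wanted to avoid. All names and numbers here are invented for illustration:

```python
# Two candidate behaviours, summarised by their relevant features.
behaviours = {
    "safe_route": {"reaches_goal": True, "crosses_hazard": False, "steps": 10},
    "shortcut":   {"reaches_goal": True, "crosses_hazard": True,  "steps": 4},
}

def true_objective(b):
    """The user's actual intentions: the hazard penalty dominates."""
    return (10 if b["reaches_goal"] else 0) - (100 if b["crosses_hazard"] else 0) - b["steps"]

def learned_reward(b):
    """Imperfect reward model: captures most aspects, but misses the hazard term."""
    return (10 if b["reaches_goal"] else 0) - b["steps"]

# Optimising the imperfect model picks the degenerate "shortcut" behaviour,
# even though the true objective strongly prefers the safe route.
best_by_model = max(behaviours, key=lambda k: learned_reward(behaviours[k]))
best_truly = max(behaviours, key=lambda k: true_objective(behaviours[k]))
```

The model agrees with the true objective on every feature it knows about, yet its single omission flips the optimal behaviour — which is what the researchers mean by the agent's behaviour depending on the reward model in a fragile way.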