Deep Reinforcement Learning Agent Beats Atari Games

Stanford researchers developed the first deep reinforcement learning agent that learns to beat Atari games with the aid of natural language instructions.

“Humans do not typically learn to interact with the world in a vacuum, devoid of interaction with others, nor do we live in the stateless, single-example world of supervised learning,” the researchers wrote in their paper, which argues that a truly intelligent artificial agent will need to be capable of learning from and following instructions given by humans. “In our learning, we benefit from the guidance of others, receiving arbitrarily high-level instruction in natural language–and learning to fill in the gaps between those instructions–as we navigate a world with varying sources of reward, both intrinsic and extrinsic.”

Using CUDA, TITAN X Pascal GPUs and cuDNN to train their deep learning models, the researchers combined techniques from natural language processing and deep reinforcement learning in two stages. In the first stage, the agent learns the meaning of English commands and how they map onto observations of game state. In the second stage, the agent explores the environment, progressing through the commands it has learned to understand and learning what actions are required to satisfy a given command. Intuitively, the first stage corresponds to agreeing upon terms with the human providing instruction. The second stage corresponds to learning how best to fill in the implementation of those instructions.

Here’s a video of their best current model that achieved 3,500 points.

Left: An agent exploring the first room of MONTEZUMA’S REVENGE. Right: An example of the list of natural language instructions one might give the agent. The agent grants itself an additional reward after completing the current instruction. “Completion” is learned by training a generalized multimodal embedding between game images and text.
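
The self-granted reward described in the caption can be illustrated with a small sketch. This is not the researchers' code: the class name `InstructionTracker`, the cosine-similarity test, the `threshold` and `bonus` values, and the toy two-dimensional embeddings are all assumptions made for illustration. It only shows the general idea of adding a bonus to the game's reward when the current frame's embedding is judged close enough to the current instruction's embedding, then moving on to the next instruction.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InstructionTracker:
    """Hypothetical helper: walks through instruction embeddings in order."""
    instruction_embeddings: List[Sequence[float]]
    threshold: float = 0.9   # assumed similarity cutoff for "completion"
    bonus: float = 1.0       # assumed extra reward per completed instruction
    current: int = 0         # index of the instruction being pursued

    def shaped_reward(self, frame_embedding: Sequence[float], env_reward: float) -> float:
        """Return the game reward, plus a bonus if the current instruction looks satisfied."""
        if self.current >= len(self.instruction_embeddings):
            return env_reward  # all instructions done; no further bonus
        target = self.instruction_embeddings[self.current]
        if cosine_similarity(frame_embedding, target) >= self.threshold:
            self.current += 1  # advance to the next instruction
            return env_reward + self.bonus
        return env_reward


# Toy usage: two instructions embedded as 2-D vectors (made-up values).
tracker = InstructionTracker(instruction_embeddings=[[1.0, 0.0], [0.0, 1.0]])
r1 = tracker.shaped_reward([0.0, 1.0], env_reward=0.0)  # does not match instruction 0
r2 = tracker.shaped_reward([1.0, 0.1], env_reward=0.0)  # close to instruction 0
r3 = tracker.shaped_reward([0.0, 2.0], env_reward=5.0)  # matches instruction 1
r4 = tracker.shaped_reward([0.0, 1.0], env_reward=2.0)  # nothing left to complete
```

In a real system the frame and instruction embeddings would come from the learned multimodal embedding between game images and text; here they are hand-written vectors so the reward logic can be read in isolation.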

The researchers add that this approach could be applied to robotics, where intelligent robots could be instructed by any human to quickly learn new tasks.
