Watch Google’s A.I. Figure Out ‘Montezuma’s Revenge’ In Four Tries

Is there any '80s videogame that machines can't win?

What separates humans from artificial intelligence, for now, is our ability to learn quickly from just a few examples. We can see a dog once and then tell whether most other animals we see are dogs or not. Computers, our binary friends, aren’t so easily adaptable. It usually takes millions of examples to teach a computer to recognize a cat or understand language. That’s why researchers are now investing a lot of time in building machines that learn faster and from fewer examples.

The latest research from DeepMind, Google’s artificial intelligence-focused division, explores a new way to build artificial curiosity, giving the A.I. an incentive to learn by making it want to win the game. The algorithm plays the game just as a human would, by looking at the screen and making decisions based on what’s happening there. It also gets a digital reward for exploring more of the game.
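To make the idea of a "digital reward for exploring" concrete, here is a minimal sketch in Python. It is an illustration only, not DeepMind's method: the real agent works from raw screen images, while this toy version assumes each situation already has a label such as "room_1". The class name `CuriosityBonus`, the `scale` parameter, and the one-over-square-root formula are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


class CuriosityBonus:
    """Hypothetical helper: pays an extra reward that shrinks on repeat visits."""

    def __init__(self, scale=1.0):
        self.scale = scale              # assumed bonus strength, not from the article
        self.visits = defaultdict(int)  # how many times each state has been seen

    def reward(self, state, game_reward):
        # Count this visit, then add a bonus that is largest for new states.
        self.visits[state] += 1
        bonus = self.scale / math.sqrt(self.visits[state])
        return game_reward + bonus


curiosity = CuriosityBonus(scale=1.0)
first = curiosity.reward("room_1", 0.0)     # first visit to room 1
curiosity.reward("room_1", 0.0)             # second visit
curiosity.reward("room_1", 0.0)             # third visit
fourth = curiosity.reward("room_1", 0.0)    # fourth visit pays less
new_room = curiosity.reward("room_2", 0.0)  # a new room pays the full bonus again
print(first, fourth, new_room)              # prints: 1.0 0.5 1.0
```

Because the bonus fades each time the same place is revisited, an agent that maximizes this combined reward is nudged toward rooms it has not seen yet, even when the game itself awards no points.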

When the researchers pitted their new model against one without the same “curiosity,” the new A.I. explored 15 of the game’s 24 rooms, while the old one explored just 2.

A comparison of Google DeepMind’s exploration with and without the reward system. Google

The A.I. also learns extremely quickly. In the video at the top of this page, it takes only four tries to clear the first room of Montezuma’s Revenge, an Atari 2600 game. Just a year ago, DeepMind’s previous Atari agent, Deep Q, couldn’t even score a point in the game. “Poor old Deep Q scored a big fat zero,” Wired wrote.

Games like Montezuma’s Revenge pose a particular challenge because they require more than the quick reactions that suffice in Pong or Breakout. To succeed, players need to plan how to clear a room, and then execute that plan.

“Each room poses a number of challenges,” DeepMind researchers wrote in their June 6 paper. “To escape the very first room, the agent must climb ladders, dodge a creature, pick up a key, then backtrack to open one of two doors.”

The researchers have also mentioned taking on more complex games like StarCraft in the future, and their AlphaGo win earlier this year indicates that the models they build can best even the most competent humans at specific, narrow tasks.

The challenge in the future will be taking these very specialized algorithms and applying them to something in the physical world, such as teaching robots to walk or drive a car.