Demis Hassabis was a child chess prodigy who reached Master standard at age 13 and was the second-highest-rated player in the world in the Under-14 category. He also “cashed at the World Series of Poker six times including in the Main Event.”* In 1994, at 18, he began his computer games career by co-designing and programming the classic game Theme Park, which sold millions of copies.* He then became the head of AI development for the iconic game Black & White at Lionhead Studios. Hassabis earned his PhD in cognitive neuroscience from University College London in 2009.
Figure: Demis Hassabis, CEO of DeepMind.
In 2010, Hassabis co-founded DeepMind in London with the mission of “solving intelligence” and then using that intelligence to “solve everything else.” Early in its development, DeepMind focused on algorithms that mastered games, starting with games developed for Atari.* Google acquired DeepMind in 2014 for a reported $525M.
DeepMind Plays Atari
Figure: Breakout game.
To play the games, the team at DeepMind developed a new algorithm, Deep Q-Network (DQN), that learned from experience. It started with games like the famous Breakout, taking the video frames as input and producing a joystick command as output. If a command led to the player scoring, the learning software reinforced that action, so the next time it faced a similar situation, it was more likely to take the same action. This is reinforcement learning, but with a deep neural network (DNN) estimating the quality of each state-action combination. The DNN helps determine which action to take given the state of the game, and the algorithm improves over time as it plays many games and learns the best actions to take at each point.
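The learning rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: a lookup table stands in for the deep neural network, and the corridor game, the function names (`q_update`, `choose_action`, `train_corridor`), and the parameter values are all hypothetical. What it shares with DQN is the core idea of nudging the quality estimate of a state-action pair toward the reward received plus the discounted value of the best next action.

```python
import random

def q_update(q, state, action, reward, next_state, done, n_actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(state, action) toward
    reward + gamma * (best estimated value of the next state)."""
    best_next = 0.0 if done else max(
        q.get((next_state, a), 0.0) for a in range(n_actions))
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]

def choose_action(q, state, n_actions, epsilon, rng):
    """Epsilon-greedy: explore at random with probability epsilon,
    otherwise take the best-known action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])

def train_corridor(episodes=500, length=5, gamma=0.9, epsilon=0.2, seed=0):
    """Hypothetical toy game: walk a corridor of `length` cells.
    Action 0 moves left, action 1 moves right; the only reward (1.0)
    comes from reaching the rightmost cell."""
    rng = random.Random(seed)
    q = {}  # the table that stands in for the deep neural network
    for _ in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            action = choose_action(q, state, 2, epsilon, rng)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == length - 1
            reward = 1.0 if done else 0.0
            q_update(q, state, action, reward, next_state, done, 2,
                     gamma=gamma)
            state = next_state
            if done:
                break
    return q
```

After training, the table rates "right" above "left" in every cell, even though only the final step is ever rewarded: the value of that reward propagates backward through the updates. A table like this cannot cope with raw video frames, where nearly every state is new, which is why DQN replaces it with a neural network.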
Figure: Games that DeepMind’s software played on Atari.* The AI performed better than human level at the ones above the line.
For example, in the case of Breakout,* after playing a hundred games, the software was still pretty bad and often missed the ball. But it kept playing, and after a few hours (300 games), it had improved to human-level play: it could return the ball and keep it alive for a long time. After a few more hours (500 games), it became better than the average human, learning a trick called tunneling: systematically sending the ball to the side walls to dig a channel through the bricks, so that the ball gets behind the wall and bounces around on top, requiring less work and earning more reward. The same learning algorithm worked not only on Breakout but also on most of the 57 games that DeepMind tried it on, reaching superhuman level on the majority of them.
Figure: Montezuma’s Revenge.
The learning algorithm, however, did not perform well on every game. At the bottom of the list, the software scored zero on Montezuma’s Revenge. DeepMind’s DQN software fails at this game because the player needs to understand high-level concepts that people learn throughout their lifetime. For example, when you look at the game, you know that you are controlling the character and that ladders are for climbing, ropes are for swinging, keys are probably good, and the skull is probably bad.
Figure: Montezuma’s Revenge (left) and the teacher and student neural networks (right).
DeepMind improved the system by breaking the problem into simpler tasks. If the software could solve subproblems like “jump across the gap,” “get to the ladder,” and “get past the skull and pick up the key,” then it could solve the game as a whole. To attack this problem, DeepMind created two neural networks: the teacher and the student. The teacher is responsible for learning and producing these subproblems, which it sends to the student. The student takes actions in the game and tries to maximize the score, but it also tries to do what the teacher tells it. Even though the two networks were trained with the same data as the old algorithm, plus some additional information, the communication between the teacher and the student allowed strategy to emerge over time, helping the agent learn how to play the game.
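One way to picture the student's double objective is as a reward with two parts: the game score plus a bonus for completing whatever the teacher asked for. The sketch below illustrates only that idea and makes several assumptions. The teacher here is a fixed script rather than a learned network, subgoals are grid positions, and the names `Subgoal`, `ScriptedTeacher`, and `student_reward` as well as the bonus value are hypothetical, not taken from DeepMind's system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subgoal:
    name: str
    target: tuple  # hypothetical (x, y) cell the student should reach

class ScriptedTeacher:
    """Stand-in for the teacher network: it hands out a fixed list of
    subgoals in order instead of learning which ones to propose."""

    def __init__(self, subgoals):
        self.subgoals = list(subgoals)
        self.index = 0

    def current(self):
        """Return the active subgoal, or None once all are completed."""
        if self.index < len(self.subgoals):
            return self.subgoals[self.index]
        return None

    def advance(self):
        """Move on to the next subgoal."""
        self.index += 1

def student_reward(game_reward, position, teacher, bonus=1.0):
    """Reward seen by the student: the game score, plus a bonus when
    the student reaches the teacher's current subgoal."""
    goal = teacher.current()
    if goal is not None and position == goal.target:
        teacher.advance()
        return game_reward + bonus
    return game_reward

# Usage: the game itself pays nothing along the way, yet the student
# still gets feedback each time it completes a subgoal.
teacher = ScriptedTeacher([
    Subgoal("get to the ladder", (2, 0)),
    Subgoal("pick up the key", (5, 3)),
])
rewards = [student_reward(0.0, pos, teacher)
           for pos in [(1, 0), (2, 0), (5, 3)]]
```

The point of the design is that the bonus gives the student a learning signal in a game like Montezuma’s Revenge, where the score alone stays at zero for long stretches. In the system the text describes, the teacher is itself a neural network that learns which subproblems to hand out.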