Computers are getting pretty good at a growing roster of arcade and board games, including chess, Go, Pong, and Pac-Man. Machines might even change how video games get developed in the not-so-distant future. Now, after building an AI bot that outbluffs humans at poker, scientists at Meta AI have created a program capable of even more complex gameplay: one that can strategize, understand other players’ intentions, and communicate or negotiate plans with them through chat messages.
This bot is named CICERO, and it can play the game Diplomacy better than many human players. CICERO more than doubled the average score of its human opponents and placed in the top 10 percent of players across 40 games in an online league.
The program has been in development for the past three years, a collaboration between engineers at Meta and researchers from Columbia, MIT, Stanford, Carnegie Mellon University, UC Berkeley, and Harvard. A description of how CICERO came together was published in a paper today in Science. The team is open-sourcing the code and the model, and will make the data used in the project accessible to other researchers.
Diplomacy originated as a board game set in a stylized version of Europe. Players assume the roles of different countries, and their objective is to gain control of territories by making strategic agreements and plans of action.
“What sets Diplomacy apart is that it involves cooperation, it involves trust, and most importantly, it involves natural language communication and negotiation with other players,” says Noam Brown, a research scientist at Meta AI and an author on the paper.
Although a special version of the game without the chat function has been used to test AI over the years, the progress with language models from 2019 onwards made the team realize that it might be possible to teach an AI how to play Diplomacy in full.
But because Diplomacy had this unique requirement for collaboration, “a lot of the techniques that have been used for prior games just don’t apply anymore,” Brown explains.
Previously, the team had run an experiment with the non-language version of the game, where players were specifically informed that in each game there would be one bot and six humans. “What we found is that the players would actively try to figure out who the bot was, and then eliminate that player,” says Brown. “Fortunately, our bot was able to pass as a human in that setting; they actually had a lot of trouble figuring out who the bot was, so the bot actually got first place in the league.”
But with the full game of Diplomacy, the team knew that the bot wasn’t ready to pass the Turing test if natural language interrogations were involved. So during the experiment, players were not told that they were playing with a bot—a detail that was only revealed after the game ended.
To construct the Diplomacy-playing AI, the team built two separate data processing engines that feed into one another: one engine for dialogue (inspired by models like GPT-3, BlenderBot 3, LaMDA, and OPT-175B), and another for strategic reasoning (inspired by previous work like AlphaGo and Pluribus). Because the two are linked, the dialogue model, which was trained on a large corpus of text data from the internet and 50,000 human games from webDiplomacy.net, can communicate and convey intents that are in line with the bot’s planned course of action.
This works in the reverse direction as well. When other players communicate to the bot, the dialogue engine can translate that into plans and actions in the game, and use that to inform the strategy engine about next steps. CICERO’s grand plans are formulated by a strategic reasoning engine that estimates the best next move based on the state of the board, the content of the most recent conversations, moves that were made historically by players in a similar situation, and the bot’s goals.
“Language models are really good these days, but they definitely have their shortcomings. The more strategy that we can offload from the language model, the better we can do,” Brown says. “For that reason, we have this dialogue model that conditions on the plans, but the dialogue model is not responsible for the plans.” So, the part of the program that does the talking is not the same as the part that does the planning.
The planning algorithm the bot uses is called piKL. It will make an initial prediction of what everyone is likely to do and what everyone thinks the bot will do, and refine this prediction by weighing the values of different moves. “When doing this iterative process, it’s trying to weigh what people have done historically given the dataset that we have,” says Brown. “It’s also trying to balance that with the understanding that players have certain objectives in this game, they’re trying to maximize their score and they’re going to not do very serious mistakes as they would minor mistakes. We’ve actually observed that this models humans much better than just doing the initial prediction based on human data.”
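The balance Brown describes, between imitating what humans have historically done and favoring moves that score better, can be illustrated with a small sketch. It assumes one common way to express that kind of trade-off: a KL-regularized policy, in which each move's probability is proportional to its probability in the human data multiplied by exp(value / lambda). This is not CICERO's actual code. The function name, move names, and numbers are hypothetical, and the sketch covers a single player at a single decision rather than the iterative, multi-player refinement the article describes.

```python
import math


def kl_regularized_policy(anchor_probs, action_values, lam):
    """Blend a human-like anchor policy with estimated move values.

    Returns probabilities proportional to anchor(a) * exp(value(a) / lam).
    A large lam keeps the result close to the anchor (human-like play);
    a small lam shifts it toward the highest-value move.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in log space and subtract the max for numerical stability.
    logits = {
        action: math.log(prob) + action_values[action] / lam
        for action, prob in anchor_probs.items()
        if prob > 0
    }
    top = max(logits.values())
    weights = {action: math.exp(logit - top) for action, logit in logits.items()}
    total = sum(weights.values())
    return {action: weight / total for action, weight in weights.items()}


# Hypothetical example: how often humans pick each move, and how good
# each move is estimated to be. None of these numbers come from CICERO.
anchor = {"hold": 0.6, "support_ally": 0.3, "reckless_attack": 0.1}
values = {"hold": 0.0, "support_ally": 1.0, "reckless_attack": -3.0}

near_human = kl_regularized_policy(anchor, values, lam=100.0)
balanced = kl_regularized_policy(anchor, values, lam=1.0)
value_driven = kl_regularized_policy(anchor, values, lam=0.1)
```

In this toy setting, the `balanced` policy still resembles the human data but assigns far less probability to the serious blunder than the raw human frequencies do, which mirrors Brown's point that players make minor mistakes more readily than severe ones.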
“Deception exists on a spectrum”
Consider the concept of deception, which is an interesting aspect of Diplomacy. In the game, before each turn, players will spend 5 to 15 minutes talking to each other and negotiating plans. But since this is all happening in private, people can double-deal. They can make promises to one person, and tell another that they’ll do something else.
But just because people can be sneaky doesn’t mean that’s the best way to go about the contest. “A lot of people when they start playing the game of Diplomacy they view it as a game about deception. But actually if you talk to experienced Diplomacy players, they think with a very different approach to the game, and they say it’s a game about trust,” Brown says. “It’s being able to establish trust with other players in an environment that encourages you to not trust anybody. Diplomacy is not a game where you can be successful on your own. You really need to have allies.”
Early versions of the bot were more outright deceptive, but they ended up doing quite poorly. Researchers then added filters to make it lie less, leading to much better performance. But of course, CICERO is not always fully honest about all of its intentions. And importantly, it understands that other players may also be deceptive. “Deception exists on a spectrum, and we’re filtering out the most extreme forms of deception, because that’s not helpful,” Brown says. “But there are situations where the bot will strategically leave out information.”
For example, if it’s planning to attack somebody, it will omit parts of its attack plan from its communications. If it’s working with an ally, it might only communicate the need-to-know details, because exposing too much of its goals might leave it open to being backstabbed.
“We’re accounting for the fact that players do not act like machines, they could behave irrationally, they could behave suboptimally. If you want to have AI acting in the real world, that’s necessary to have them understand that humans are going to behave in a human-like way, not in a robot-like way,” Brown says. “Having an agent that is able to see things from other perspectives and understand their point of view is a pretty important skillset going forward in human-AI interactions.”
Brown notes that the techniques that underlie the bot are “quite general,” and he can imagine other engineers building on this research in a way that leads to more useful personal assistants and chatbots.