
Can a robot tell if it’s being sabotaged? That might not be the kind of question you consider, but it’s inquiries like this one that motivated some unusual studies out of MIT. 

To that end, researchers at the Massachusetts Institute of Technology created a simulation of two socially aware robots that can now tell if they’re being sabotaged or helped. In a new paper presented at the 2021 Conference on Robot Learning in London this week, a team from MIT demonstrated how they used a mathematical framework to imbue a set of robotic agents with social skills so that they could interact with one another in a human-like way. Then, in a simulated environment, the robots could observe one another, guess what task the other wanted to accomplish, and choose to either help or hinder it. In effect, the bots thought like humans.

Research like this might sound a little strange, but studying how different kinds of social situations play out among robots could help scientists improve future human-robot interactions. This new model for artificial social skills could also serve as a measurement system for human socialization, which the team at MIT says could help psychologists study autism or analyze the effects of antidepressants.

Socializing robots

Many computer scientists believe that social skills are the final barrier to crack before artificial intelligence systems can be genuinely useful in our homes and in settings like hospitals or care facilities, and friendly to us, says Andrei Barbu, a research scientist at MIT and an author on this recent paper. After retooling the AI, researchers can then bring these tools into the field of cognitive science to “really understand something quantitatively that’s been very elusive,” he says.

“Social interactions are not particularly well-studied within computer science or robotics for a few reasons. It’s hard to study social interactions. It’s not something that we assign a clear number,” says Barbu. “You don’t say ‘this is help number 7’ when you’re interacting with someone.” 

This is unlike the usual problems that arise in AI, such as object recognition in images, which are fairly well-defined, he says. Even deciding what kind of interactions two people are having—the easiest level of the problem—can be extremely difficult for a machine. 

So, how can scientists build robots that not only do a task, but also understand what it means to do the task? Could you ask a robot to understand the game you’re playing, figure out the rules just by watching, and play the game with you? 

To test what was possible, Barbu and colleagues set up a simple two-dimensional grid that virtual robotic agents could move around in to complete different tasks. The agents on the screen looked like cartoon robot arms, and each was instructed to carry a water bucket to either a tree or a flower.

[Image: robotic agents in the simulated grid world. Credit: MIT / Social MDP on Github]
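
To make that setup concrete, here’s a minimal sketch of a grid world along those lines. Everything in it, from the grid size to the coordinates of the tree, flower, and bucket, is an illustrative placeholder rather than the researchers’ actual environment.

```python
# A hypothetical grid world echoing the setup described above: agents move
# on a small 2D grid and can carry a water bucket to a tree or a flower.
# All names, positions, and sizes here are made-up placeholders.
from dataclasses import dataclass

GRID_SIZE = 5
TREE, FLOWER, BUCKET = (0, 4), (4, 4), (2, 0)

@dataclass
class Agent:
    position: tuple   # (x, y) cell the agent currently occupies
    goal: tuple       # where it wants to deliver the bucket
    has_bucket: bool = False

def step(agent: Agent, move: tuple) -> Agent:
    """Apply a move (dx, dy), clamped to the grid; pick up the bucket if reached."""
    x = min(max(agent.position[0] + move[0], 0), GRID_SIZE - 1)
    y = min(max(agent.position[1] + move[1], 0), GRID_SIZE - 1)
    agent.position = (x, y)
    if agent.position == BUCKET:
        agent.has_bucket = True
    return agent

# One agent is headed for the tree, the other for the flower.
robot_a = Agent(position=(0, 0), goal=TREE)
robot_b = Agent(position=(4, 0), goal=FLOWER)
step(robot_a, (1, 0))  # robot_a moves one cell to the right
```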

To socialize the agents, the researchers borrowed a few tips from psychology to come up with some basic, but distinct, categories of social interactions, which they then coded into a series of actions and reactions. They adapted an off-the-shelf model from robotics called the Markov decision process (MDP), a framework of states, actions, and rewards that helps a robotic system make decisions toward a goal based on the current state of the world. To insert the social element, the researchers tweaked the reward feedback so that each robot would adjust what it wanted based on another robot’s needs.
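
For readers curious what a bare-bones MDP looks like in code, the sketch below lays one out and solves it with value iteration, a standard textbook method. The states, actions, and reward numbers are placeholders made up to mirror the water-delivery task; this is not the Social MDP code the researchers released.

```python
# A minimal, generic Markov decision process: states, actions, transition
# probabilities, and rewards, solved by value iteration. Illustrative only.
states = ["at_start", "has_bucket", "at_tree"]
actions = ["grab_bucket", "walk_to_tree", "wait"]

# P[(state, action)] -> list of (next_state, probability)
P = {
    ("at_start", "grab_bucket"):    [("has_bucket", 1.0)],
    ("has_bucket", "walk_to_tree"): [("at_tree", 1.0)],
}
# R[(state, action)] -> immediate reward; delivering the water is the goal.
R = {("has_bucket", "walk_to_tree"): 10.0}

GAMMA = 0.9  # how strongly the agent values future reward

def value_iteration(n_iters: int = 50) -> dict:
    """Estimate how valuable each state is under the best possible behavior."""
    V = {s: 0.0 for s in states}
    for _ in range(n_iters):
        for s in states:
            options = []
            for a in actions:
                nxt = P.get((s, a), [(s, 1.0)])  # unlisted actions: stay put
                r = R.get((s, a), 0.0)
                options.append(r + GAMMA * sum(p * V[s2] for s2, p in nxt))
            V[s] = max(options)
    return V

print(value_iteration())  # states closer to delivering the water score higher
```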

“What we have is this mathematical theory that says if you want to perform a social interaction, you should estimate the other agent’s goal, then apply some mathematical function to those goals,” says Barbu. “We treat social interaction as functions of one another’s reward.” This means that Robot A has to factor in what Robot B is going to do before it takes an action. And the hypothesis is that this is the basic mechanism that underlies social interactions in humans. 

But these types of robots face limitations. For instance, they cannot recognize customary social interactions, like cultural traditions around politeness that vary from country to country. The basic framework in the research is that agent A watches what agent B is doing, then attempts to predict agent B’s goal from agent B’s possible goals and its surroundings. If agent A’s reward is then tied to agent B’s estimated goal, agent A will help agent B; if its reward is tied to the opposite of that goal, agent A will block agent B from accomplishing it.
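
Here is one way that help-or-hinder logic could look in code. The goal inference and the single weighting term beta below are a simplification made for illustration, not the authors’ Social MDP implementation: a positive beta makes an agent value the other’s estimated reward (helping), while a negative beta makes it value the opposite (hindering).

```python
# A hedged sketch of "reward as a function of the other agent's reward."
# The goal inference and the beta weighting are simplifications for
# illustration, not the researchers' actual Social MDP code.
import math

CANDIDATE_GOALS = ["water_tree", "water_flower"]

def estimate_goal(observed_moves, progress):
    """Turn how much each observed move advanced each candidate goal
    into a probability distribution over the other agent's likely goal."""
    scores = {g: sum(progress[g][m] for m in observed_moves) for g in CANDIDATE_GOALS}
    total = sum(math.exp(s) for s in scores.values())
    return {g: math.exp(s) / total for g, s in scores.items()}

def social_reward(own_reward, other_reward_by_goal, goal_beliefs, beta):
    """beta > 0: helping (share in the other agent's estimated reward);
    beta < 0: hindering (gain when the other agent loses);
    beta = 0: purely physical, non-social behavior."""
    expected_other = sum(goal_beliefs[g] * other_reward_by_goal[g]
                         for g in CANDIDATE_GOALS)
    return own_reward + beta * expected_other

# Example: two moves toward the tree make "water_tree" the likely goal.
progress = {
    "water_tree":   {"toward_tree": 1.0, "toward_flower": -1.0},
    "water_flower": {"toward_tree": -1.0, "toward_flower": 1.0},
}
beliefs = estimate_goal(["toward_tree", "toward_tree"], progress)
print(social_reward(own_reward=2.0,
                    other_reward_by_goal={"water_tree": 5.0, "water_flower": 0.0},
                    goal_beliefs=beliefs,
                    beta=1.0))  # helper; try beta=-1.0 for a saboteur
```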

Researchers can add layers to make the social interaction more complex. “There are some attributes of the action, the rewards, the goals of the other agents that you want to estimate,” says Barbu. “We have more complicated social interactions like exchanging something with another agent where you have to figure out how much is this action worth it to them, how much is it worth it to me.” 

As a next step, the team is working on replicating these models with robots in the real world, and on adding interactions like exchange and coercion.

Can humans tell if robots are socially interacting?

To get a human perspective on how well they had encoded social interactions, the researchers generated 98 different scenarios featuring robots with varying levels of social reasoning. A level 0 agent can only take physical actions: it isn’t social itself and doesn’t recognize others as social. A level 1 agent has a physical goal and is social, but doesn’t realize that other agents are social too; it can help, hinder, or steal from other agents, but doesn’t pick up on whether another agent is trying to get in its way. A level 2 agent has both physical and social goals and assumes other agents are level 1, so it can avoid sabotage, recognize when help is needed, and collaborate.
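
Written out as data, that hierarchy looks roughly like the sketch below, which is simply a restatement of the description above rather than anything from the paper’s code.

```python
# The three reasoning levels described above, written out as plain data.
SOCIAL_LEVELS = {
    0: {"goals": "physical only",
        "models_others_as": None,
        "can": ["move", "carry the bucket"]},
    1: {"goals": "physical and social",
        "models_others_as": "level 0 (non-social)",
        "can": ["help", "hinder", "steal"]},
    2: {"goals": "physical and social",
        "models_others_as": "level 1 (social)",
        "can": ["avoid sabotage", "recognize when help is needed", "collaborate"]},
}

for level, spec in SOCIAL_LEVELS.items():
    print(f"Level {level}: goals={spec['goals']}, "
          f"models others as {spec['models_others_as']}, can {', '.join(spec['can'])}")
```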

Then, twelve human subjects watched 196 video clips of these robots interacting, which were in essence a series of computer animations. Following the viewing, they were asked to predict how social the robots were and whether their interaction was negative or positive. In most cases, the humans accurately identified the social interactions that were occurring.

“Something we’re also very interested in on the cognitive science side is how do people understand these kinds of social interactions,” says Barbu. “What is it that the humans are picking up on? And what happens when humans don’t agree with our models?” 

[Related: Do we trust robots enough to put them in charge?]

Another question Barbu has pondered is whether this model could be used to analyze how different types of diseases and disorders, like depression, PTSD, or autism, may affect people’s social interactions or their perception of social interactions. 

What’s the use of social robots anyway?

The study, which was funded in part by DARPA and the US Air Force, could one day inform research about language acquisition, and the importance of context in voice requests. 

“The vast majority of the language that we use with one another has to do with interactions with other people. So for many years, we studied this problem of grounding,” says Barbu. Grounding is taking something as abstract as language and connecting it to something practical that you see in the world.

“If you look at the vast majority of what someone says during their day, it has to do with what other [people] want, what they think, getting what that person wants out of another [person],” he says. “And if you want to get to the point where you have a robot inside someone’s home, understanding social interactions is incredibly important.” 

That’s because words we normally treat as very concrete in action recognition can carry different social connotations. For example, ‘get the water bottle’ is a completely different interaction than ‘get the child.’ Although the command is technically the same, social awareness would alert the robot to be gentler with the child than with the water bottle.

That’s something the team has been working on for DARPA, which was interested mostly in models for child language acquisition to help US soldiers interact with people all over the world who don’t speak English. 

“It’s hard to train language translation models. It’s hard to produce resources to train soldiers. DARPA is interested in robots that can acquire language the same way that kids do, because kids acquire language not by having a large data corpus or anything like that,” Barbu says. “They acquire language by seeing [people] interact with one another, seeing those interactions in a physical context.” 

The researchers reached a point where they needed to understand language that is social, so they pivoted to building social models that will later be integrated into their language learners.

[Related: The big new updates to Alexa, and Amazon’s pursuit of ambient AI]

The work they’re doing for the MIT Air Force AI accelerator is in a similar vein: they’re laying the groundwork for the Air Force to build a voice-interface AI assistant, like Amazon’s Alexa, that can answer questions about the billions of documents on weapons systems, aircraft, and more.

“Figuring out where the answer to your specific question about this aircraft, this weapons system, under this condition, is difficult,” says Barbu. Additionally, the Air Force hopes that a social AI assistant can hold a bi-directional dialogue, and ask rational questions back.