How To Create Super-Intelligent Machines That Won't Kill Us

In Avengers: Age of Ultron, the 2015 installment of Marvel’s Avengers franchise, the artificial intelligence Ultron is hell-bent on exterminating humanity. In Ultron’s own words, “I was designed to save the world,” but the robot ultimately concludes that when it comes to humans, “there’s only one path to peace: your extinction.”

The advances that scientists are now making with artificial intelligence lead many to suggest, and fear, that we may be on the verge of creating artificial intelligences smarter than we are. If humanity does succeed in developing an artificial super-intelligence, how might we prevent an Age of Ultron? That is exactly the kind of problem that Nick Bostrom, director of the Future of Humanity Institute at the University of Oxford, tackled in his 2014 book Superintelligence: Paths, Dangers, Strategies.

Ultron’s plan to save the world by eradicating humanity is an example of what Bostrom calls “perverse instantiation”: an AI finding a way to satisfy its final goal that violates the intentions of the programmers who defined that goal. For example, if one asks an AI to make a person smile, the computer might manipulate facial nerves to paralyze the face into a permanent smile. If one then asks the machine to make us happy, it might simply implant electrodes in the pleasure centers of our brains.

Bostrom notes that even apparently innocent goals could doom the human race if they are not thought out properly. For example, if an AI is tasked with proving or disproving the Riemann hypothesis, one of the most important unsolved problems in mathematics, it might pursue this goal by trying to convert the entire solar system into a computer, including the atoms in the bodies of whoever once cared about the answer. Similarly, an AI designed to maximize paperclip production might try to convert first the Earth and then increasingly large chunks of the observable universe into stationery.

Keeping Super-Intelligence In Line

One might argue that dumb AIs pose more realistic threats than hyper-smart ones. However, if artificial super-intelligence is even a remote possibility, Bostrom cautions that one should not take any chances with it. A dumb AI might cause a war crime or crash the stock market, but an artificial super-intelligence could end civilization.

“It is key that we solve this problem before somebody figures out how to create machine super-intelligence,” Bostrom says. “We should start to do research on this control problem today, since we don’t know how hard the problem might be, nor how much time we will have available to work out a solution.”

Bostrom says there are two broad classes of methods for keeping an artificial super-intelligence from destroying the world. One involves controlling an AI’s capabilities, perhaps by preventing it from accessing the Internet or by not giving it any physical manipulators such as robotic arms.

While limiting what an artificial super-intelligence might do could be useful in the initial stages of developing such a machine, “we can’t expect to keep a super-intelligent genie locked up in its bottle forever, or even for a short time,” Bostrom says. For instance, an artificial super-intelligence might find ways to trick its human gatekeepers into letting it out of its “box.” Human beings are not secure systems, especially not when pitted against a super-intelligent schemer, he notes.

Modified Goals

Instead, Bostrom advises shaping what artificial super-intelligences want to do, so that even if they were able to cause great harm, they would choose not to. One strategy would involve directly specifying a set of rules for an AI to follow, such as Isaac Asimov’s famed Three Laws of Robotics. However, this raises two challenges: choosing which rules we would want to guide the AI, and expressing those values in computer code.

A second alternative involves giving an AI only modest goals and limited ambitions. However, care would have to be taken in defining how the AI should minimize its impact on the world. A third option would involve creating an AI that is not super-intelligent, making sure it would want to act benevolently, and then augmenting it so that it becomes super-intelligent while ensuring it does not get corrupted in the process.

A final possibility Bostrom suggests involves telling an artificial super-intelligence to figure out a way to make itself safe. “We try to leverage the AI’s intelligence to learn what we value, or to predict which actions we would have approved of,” Bostrom says. Essentially, the plan would be to develop an artificial super-intelligence that can figure out what we want, and not just follow what we say.

Still, even this strategy might not prevent a robot apocalypse. “It is not sufficient that the AI learns about our values; its motivation system must also be constructed in such a way that it is motivated to pursue them,” Bostrom says.