Rethinking The Turing Test

Oversimplified competitions encourage computer programs that are snarky rather than intelligent, but it doesn't have to be that way.

In the 2013 film Elysium, critical jobs have been outsourced to intractable, inadequate machines, like this automated customer service rep. Image Courtesy of Elysium: The Art of the Film © 2013 Tristar Pictures

At a competition in June, a chatbot named Eugene duped a group of human judges into believing it was a Ukrainian teenager. The judges hailed it as the first time a machine passed the Turing Test—that hallowed measure of artificial intelligence (AI) proposed by computer scientist Alan Turing in 1950.

Eugene’s victory was short-lived. Within days, AI researchers had dismissed the chatbot’s achievement as a collection of canned responses. Then they took the Turing Test itself to task. Conceived as a kind of existential parlor game, the test asks a human and a machine to respond to questions from remote interrogators. A computer mistaken for a person would prove that it had developed the capacity to mimic our own thought processes.

That all sounds good enough, but “people are easy to deceive,” says Ernie Davis, a computer scientist at New York University. “We’re used to the safe assumption that whoever is talking to us is actually an intelligent person.” So human judges will likely give the computer the benefit of the doubt. Additionally, chatbots often mask their lack of reasoning by coming across as merely scatterbrained. For example, futurist Ray Kurzweil once asked Eugene, “If I have two marbles in a bowl and I add two more, how many marbles are in the bowl now?” “Not too many,” wrote Eugene. “I can’t tell you the exact number; I forgot it. If I’m not mistaken, you still didn’t tell me where you live.”

“We’re used to the safe assumption that whoever is talking to us is actually an intelligent person.”

Used that way, the Turing Test doesn’t foster the development of machines with adaptive, human-level smarts. Instead, it exposes our own gullibility and spawns programs whose greatest innovation is the tactical use of snarky non sequiturs and manipulative charm.

The harsh criticism of AI’s most famous benchmark comes at a moment when interest and investment in the field are spiking. Google recently acquired AI firm DeepMind for $400 million, and IBM is investing $1 billion in its Watson system, the former Jeopardy! winner that’s now unraveling the genetics of brain cancer. Even the late Alan Turing is getting the Hollywood treatment this fall, as the subject of the biopic The Imitation Game. Some might say the field of AI doesn’t need the Turing Test anymore. We should just let machines grow smarter on their own inhuman terms.

That would be a mistake. The genius of the Turing Test is that it captured the public imagination and drove innovation. So why not build a new one better suited to the task of proving true artificial intelligence? “Maybe rather than looking at one big hurdle, we should try to understand how to make a bunch of small steps that lead us along the path to something useful,” says Noah Goodman, a cognitive scientist at Stanford University. Machines should have to tackle a range of tasks that emphasize nimble, on-the-spot thinking. Can a machine describe a video after seeing it for the first time, respond to direct questions with direct answers, and recognize nuances in language? Far more than a gimmick, such a system would finally demonstrate, in Turing’s words, “a machine that thinks.” Eugene was nowhere close.

This article originally appeared in the October 2014 issue of Popular Science.