Computer scientist Alan Turing's famous Turing test--possibly the best-known item on a long résumé--has been a simple, solid bar for artificial intelligence since the 1950s. But this weekend that bar was nearly cleared. Judges in the largest-ever Turing test competition concluded 29 percent of the time that Eugene Goostman was a 13-year-old boy, and that was good enough for the chatbot to win.
Thirty judges, 25 human conversation partners, and five software programs took part in the event, held Saturday to mark the 100th anniversary of the test's inventor's birth. In the test, human judges attempt to determine whether they're having a conversation with a human being or a machine. Turing proposed that a machine could be considered intelligent if it was able to fool humans 30 percent of the time.
Even though it wasn't enough to break that barrier, the 29 percent was a good showing for Goostman. A regular at the more famous Turing test event, the Loebner Prize, Goostman has never taken home first place. But he does have some unique attributes: unlike his competitors, Goostman has a fixed personality. He is always a 13-year-old Ukrainian boy with a guinea pig. His father is always a gynecologist. It's a different approach from the one most chatbots take, which mine social media or past conversations to act human, and even if it didn't definitively pass as intelligent, it appears to have paid off.
That's kind of cheating though, isn't it? I mean, if you're simulating a child, then the artificial intelligence can be far less sophisticated. I'm not trying to belittle what was achieved, but I think there should be some distinction between artificial intelligence imitating a human child and imitating a human adult.
Humans set the bar for "intelligence". Whether it's a 2-year-old, a 12-year-old, or a 100-year-old, there are characteristics associated with our definition of intelligence that all these humans will display and that have never been seen outside our species. For a computer program to convince almost a third of the judges that it is, in fact, a real human being is quite extraordinary (even with the "personality" being that of a 13-year-old). I still wonder how the age gap between the judges and the "boy" affected what questions were asked... However, if you look at the way kids that age normally "talk" via text-based messaging, and consider that this program had to determine not only whether a question was something a 13-year-old could comprehend but also how to respond to it in a realistic and believable manner, then it's pretty damn incredible.
It would be an interesting test to see if 30% of 13 year olds thought it was a real 13 year old.
Turing tests aren't extremely complicated... the judging isn't influenced by whether or not questions are answered correctly; it's a comparison of the response to "typical human responses". Based on the Turing test's judging criteria, along with the fact that this program convinced so many people that it was a 13-year-old boy, I'd say it's pretty likely that it could fool a 13-year-old too.
If you look into the types of questions that typically "fool" AI programs then I think you would agree. Semantics plays a large role.
I'm with Flingbot. The test must be as simple as possible, so if you are trying to simulate a 13-year-old, all the 'potentials' should be doing (or be) the same.
As for the answers, it may not be about whether they are correct, but certainly an answer that is improbable should be a red flag.
I visited the Princeton site, and after asking only a few questions it was very clear that the responder was not human. Of course that is not the Turing test ... I know "who" I'm talking to. I asked where "he" was from and he responded "Odessa". Later he asked me where I was from and I also answered Odessa, prompting him to ask if it was a nice place. I suppose it could be a 13-year-old playing Xbox!
Perhaps they were using boxing judges for this test?
Obviously some responses from this program are "fixed variables", such as "Eugene Goostman's" background and basic information (like where he's from). And like I've said multiple times before, it's not hard at all to trick these programs with certain questions or responses.
"As for the answers, it may not be about whether they are correct but certainly an answer that is improbable should be a red flag"
Yes, improbable answers are a red flag. That's why the Turing test requires that the given answers be compared to a "typical human response" for judging purposes. And just as you consider improbable answers to be a red flag, the programmers undoubtedly used probability to determine what the program would most likely be asked, and I highly doubt that bumping into a fellow Odessan was something they expected to happen to Eugene.
These programs aren't all-knowing, and I wouldn't call them "intelligent" by any means because they don't have the ability to learn. You can tell Eugene that your favorite color is blue 100 times, but he won't be able to answer "What is my favorite color?" correctly after that without guessing.
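To make that concrete: Goostman's source code has never been published, so the following Python sketch is purely hypothetical, but it shows the general shape of a canned-response bot -- fixed pattern/answer pairs for the biography, a deflection for everything else, and no memory of anything the user says.

    import re

    # Purely hypothetical sketch of a canned-response chatbot -- Goostman's
    # actual code was never published. Expected questions map to fixed
    # answers; nothing the user says is ever stored, so the bot cannot
    # learn a new fact no matter how many times it is told.
    CANNED_RESPONSES = [
        (re.compile(r"where.*you from", re.I), "I am from Odessa, in Ukraine."),
        (re.compile(r"how old", re.I), "I'm 13 years old."),
        (re.compile(r"father|dad", re.I), "My father is a gynecologist."),
        (re.compile(r"pet|guinea pig", re.I), "I have a guinea pig."),
    ]
    FALLBACK = "Hmm, tricky question. What do you like to do for fun?"

    def reply(message: str) -> str:
        for pattern, answer in CANNED_RESPONSES:
            if pattern.search(message):
                return answer
        return FALLBACK  # deflect anything the programmers didn't anticipate

    print(reply("Where are you from?"))         # canned biography answer
    print(reply("My favorite color is blue."))  # deflected -- never stored
    print(reply("What is my favorite color?"))  # deflected again: no memory

The fallback deflection is also why these bots steer conversations toward topics they have patterns for -- it's the same trick that makes the 13-year-old persona so effective at excusing non-answers.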
Personally, I don't think this is the avenue that will lead to the emergence of a true artificially intelligent machine. In my opinion, the way to "crack the code" of artificial intelligence is to approach the coding from a much more fundamental perspective. The program should simulate neurological activity at the cellular level and essentially provide the computer with virtual neurons/synapses/neurotransmitters with which it can learn and expand in a way that we can understand. I don't think we currently have a thorough enough understanding of the fundamentals of how our brains work to accomplish this, but I don't think we're terribly far off.
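For what it's worth, the simplest textbook version of "simulating neurological activity at the cellular level" is a leaky integrate-and-fire neuron. Here's a rough Python sketch, with parameter values made up purely for illustration, not biologically fitted:

    # A leaky integrate-and-fire neuron: a standard textbook simplification
    # of cellular-level neural activity. Parameter values are made up
    # purely for illustration.
    class LIFNeuron:
        def __init__(self, threshold=1.0, leak=0.9):
            self.potential = 0.0        # membrane potential
            self.threshold = threshold  # fire when potential crosses this
            self.leak = leak            # fraction retained each time step

        def step(self, input_current: float) -> bool:
            """Advance one time step; return True if the neuron fires."""
            self.potential = self.potential * self.leak + input_current
            if self.potential >= self.threshold:
                self.potential = 0.0    # reset after the spike
                return True
            return False

    neuron = LIFNeuron()
    for t in range(10):
        fired = neuron.step(input_current=0.3)
        print(f"t={t} potential={neuron.potential:.2f} fired={fired}")

Wiring billions of these together, with learning rules for the connections, is of course where the real difficulty starts.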
I agree with almost everything you've said. It still makes me wonder what kinds of questions the judges were constrained to ask ... resulting in nearly 30% thinking Goostman was a human. I will endeavor to find out!
I can't help thinking about the movie Blade Runner, even though the details are quite different. There would be a series of questions that should always break a weak A.I. 'entity' because the answers can't be hard-coded properly. Like in chess, when the questions dig down to a certain depth, it should become clear that the A.I. is using memory rather than intelligence.
I somewhat agree about modeling the first strong A.I. on the human brain. The problem is that we are far from understanding the human brain. We may even be several levels of complexity behind ... just a guess on my part, as we barely understand how a single cell processes information. However, I think once there is a functioning A.I. that is well smarter than any human, it will redesign itself using a more practical system. Biology is complex and fantastic but still limited in many ways.
I agree that ultimately any "artificial intelligence" would endeavor to increase its own efficiency and maximize processing power by redesigning itself in such a way that its thought process is more computer-friendly...
Obviously there are too many neurotransmitters and receptors with too many functions (not to mention all the unknowns associated with the human brain) for them to be efficiently modeled using a computer... But say you were limited to a single neurotransmitter, which would essentially indicate "on" or "off", 1 or 0. My thought is that each transistor should function like a very simplified neuron, capable of communicating with the other "neurons" using only their current state (1 or 0). However, there are well over 40 unique neurotransmitters that are considered somewhat common, and that's ignoring the fact that different combinations/ratios of any of those could produce a unique response. In other words, we would need a LOT of transistors...
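That single-neurotransmitter idea is essentially the classic McCulloch-Pitts binary neuron from 1943. A minimal sketch, with weights and threshold made up purely for illustration:

    # McCulloch-Pitts style binary neuron: a unit sees only the 0/1 states
    # of its inputs, weights them, and fires if the sum crosses a threshold.
    # Weights and threshold here are made up purely for illustration.
    def binary_neuron(inputs, weights, threshold):
        """Fire (1) iff the weighted sum of 0/1 inputs meets the threshold."""
        return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

    # Two upstream "neurons" feed one downstream unit that acts like AND:
    a, b = 1, 1
    print(binary_neuron([a, b], weights=[1, 1], threshold=2))  # 1 only if both fire

With something like 86 billion neurons in a real brain, each connected to thousands of others, the transistor count gets out of hand very quickly -- which is exactly the commenter's point.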