Voice command has made huge strides in recent years, especially in the mobile space--Google has implemented voice search and some basic commands into Android, and now Apple has integrated Siri, a voice-command app, deeply into the guts of the iPhone.
Talking not just to, but with our computers has always been a tantalizing, futuristic idea--not least for its radical potential to make applications more fully accessible to the disabled. But as we were discussing voice control at PopSci HQ yesterday in light of Apple's news, we realized something interesting: while none of us younger tech types use today's voice control tech regularly, it seems as if our parents actually do use it, and often. Why is that?
For us, evolving the way we interact with our gadgets is easy. Going from a mouse to a multitouch trackpad? No problem. Moving from a hardware keyboard to a software keyboard, and then to a new system like, say, Swype? As natural as buying new clothes. We're hardwired to adapt to electronics, to see what's familiar and explore what's new, and to learn how to do the same things we could do before while integrating the new things that are now possible.
So, voice command? That's not something new. It's something old. Talking to a gadget is something people did with landlines, and for us, it's kind of a step backwards. Why unlearn our lightning-fast smartphone typing skills, just so we can dictate into a phone? We've moved past that. Dr. Zahorian, a professor at Binghamton University who's spent much of his career working with speech recognition, told me about a colleague, another expert on speech recognition, who demonstrated the latest and greatest system from MIT to a high school class. "Why would I do that when I can type so much faster?" was the response from the high schoolers.
But without the benefit of our gadget-wired brains, our parents have to adapt to each new input method from scratch, and that can be tough. They tend to make things more complicated than they actually are, even as gadgets have gotten simpler. Voice command is comforting; it's been around awhile, unlike, say, a multitouch keyboard that operates with long, complicated swipes. It feels simple: just tell your phone what to do, dammit!
A major part of the youthful avoidance of voice command is its finickiness. Voice command is getting better, certainly, but even the most modern, advanced systems can still get things laughably wrong--or worse, do them slowly, and since we're well-versed in several forms of standard text-entry, we tend to choose that just out of a desire for speed. Key example: Microsoft's Kinect is one of the most incredible gadgets we've ever seen, and its voice command is so advanced that it can pick out separate voices, or even the direction the voice is coming from. But performing simple tasks is still a bit slow. Say you're watching Hulu on your Xbox 360. You want to pause it. You have two options: pick up a remote and hit the pause button, or say "Xbox," then perhaps "XBOX" a bit louder and more forcefully if it didn't hear you the first time, then wait for the playback menu to pop up, and then say "Pause. PAUSE." It's cool, in theory, and fairly fast when it works smoothly (which is quite often), but I certainly find myself reaching for the remote instead more often than not.
Google Voice, too, uses voice recognition, in this case to transcribe voicemail messages so you can simply read them rather than listen to them. That's a great idea! Except it works so poorly that comedian Paul F. Tompkins has an entire, well-loved bit about the "wildly inaccurate" transcriptions. Even Dr. Zahorian noted that "standard systems tend to not work very well," and that he only uses them "in situations when it's hard to type," like in the car.
(Another factor, and not a small one, is that it looks sort of dorky and awkward to talk at your phone. We're not on Star Trek, after all. Not that that alleviates dorkiness.)
So we ignore voice command. It's not that it's bad, it's that when we're equally familiar with all different kinds of input, we are naturally inclined to go with the most efficient. But that's emblematic of a weakness as well, a tendency to go with whatever's fastest, no matter what. And our parents don't suffer from that weakness, so they'll go with what seems the most comfortable--which means talking to your gadget.
But that could all be changing, thanks to Apple's emphasis on Siri. The use of natural language is something we shouldn't ignore: being able to actually tell your phone what you want it to do, without having to learn a new language the phone can understand, is a big step. And Apple actually demonstrated instances in which it could conceivably be faster than other methods. Finding, say, a Cuban restaurant in Park Slope, Brooklyn would require a bunch of taps--you'd have to swipe through your apps to find the Yelp app, select the cuisine you want, the location you want, and then order by rating. Siri makes that easy; just say "find me a Cuban restaurant in Park Slope, Brooklyn" and it'll take you right to that last step. That's great, and it definitely has the potential to finally make voice command enticing to those more comfortable with typing.
What we're really excited about with Siri is simplicity. Simplicity, or, more correctly, the appearance of simplicity, is the reason why Siri will work for Apple. Siri doesn't just transcribe your voice to text, like Google's or Microsoft's voice command options. It understands both what you're saying and what you're trying to do. For someone for whom messing around with settings and apps and software keyboards is a massive headache, Siri and future voice command options like it are a godsend. Finally: a way to tell your phone what you want to do.
That's of benefit not just to those who would rather not mess around with the more technical elements of a phone, but also those who can't, due to circumstance or disability. The car is the most obvious use of this kind of input tool, since you legally can't (and practically shouldn't) be staring at your phone's screen while driving. For those who are unable to manipulate a touchscreen, or who need Siri's ability to read back commands due to vision impairment, this kind of system is also invaluable--there have been accessibility solutions like this for years, but none that could delve so deep into a phone's OS.
So even for my generation, it just might be more efficient, in certain situations, to ask my phone a question rather than open an app and type it out. Replacing typing with speech is, with some exceptions (like in the car), not a viable jump for the younger generation. But if it legitimately is easier, or faster, we'll embrace it--along with our parents. Whether we embrace our parents is unrelated.
On a general reading, this article seems so negative and lengthy towards "voice command control". Just from its being so negative and lengthy, I now have a greater desire to try this phone out for myself. ;)
I have tried several VR setups in Blackberry and a couple of others. None work. According to the troubleshooters, my voice is too deep. If Siri can deal with a baritone (I ain't having the operation to make me a falsetto) then I'm interested. They'll have to convince me, tho. Draconian rules about cell phones while driving ($250 first offense) made me quit making calls. I can talk, I just can't dial. I have to use auto-answer, which is a PITA since it doesn't discriminate and it answers calls from my ex-wife. Curses! If this works I'll be back talking while I drive.
A man sitting on a park bench is a lonely kind of man. He looks over the new phone he just bought and thinks he'll try out some of its functions. Putting the cell phone on speaker, he decides to call himself. The phone, being active, is busy, and so he is prompted to leave a voicemail message. Well, the lonely man leaves the message 'CALL ME'.
The cell phone hears the message he says and tries to call him, but the cell phone is busy and so it leaves a blank message.
The man closes the cell phone and sets it on the bench next to him. In a moment his cell phone beeps: he has a message. The man puts the cell phone on speaker and begins to listen to the message. The man hears "CALL ME" and naturally the cell phone hears it too and attempts to call him, but the line is busy and so it leaves a blank message.
The man closes his cell phone and sets it down beside him.
Beep goes his cell!
I think I just automated a cure for loneliness. ;)
What a strange article. First it rambles on about how cool young techies can't be bothered with VC, but the oldsters love it.
Really! -- and how would you know that?
But, continuing on, and suddenly remembering that this is PopSci (Apple), out pops a little homage to Siri (with a fringe on top?). Since Siri is an Apple product, it's obviously wonderful.
One correction, a "You're doing it wrong" moment: you said to say "Xbox," then perhaps "XBOX" a bit louder and more forcefully if it didn't hear you the first time, then wait for the playback menu to pop up, and then say "Pause. PAUSE." Kinect is actually really fast; it's just waiting for you. You can say "Xbox pause" all in one sentence and Kinect will obey effortlessly, even without the playback menu. For some reason people think they need to see the playback menu pop up to use voice control when this is not the case; the menu is just to remind you of commands you can say, not when you need to say them. -Kris Johnson
I want to have Siri on my iMac :)
Btw, all Macs have microphones, so it's easy to integrate with just a software update.
This is your Mother and I am not amused.
I purchased the iPhone 4S on 14 Oct. only after my iPhone 2G version finally died the day before the 4S was available.
You are right, I don't need to have every new gadget to make phone calls, and yes, I "learned" on a black desk dial phone. I chose the 4S over the 4 specifically because I tend to keep my phone until it no longer functions, so I decided on the newest hardware/operating system option plus the Siri feature.
I'm glad that y'all have perfected your "swypes" and your touchpad typing so that you'll be that much better at not actually talking with people. You won't have to waste all of your time letting someone know they are important enough to have your full and undivided attention.
Using Siri in the car may seem like a "nice to have" feature, but I don't need a law to tell me that you are not driving alert if you are also manipulating features on your smartphone. However, I have read that young people have problems with these judgment calls and are apt to feel more capable behind the wheel than us old folks with 30-plus years of driving experience.
I've used Siri in the car and it works great. Since you activate it with the home button you don't have to take your eyes off of the road, an important feature for us old folks who still drive defensively on the Interstate going 70 mph.
I just wonder what your younger generation (and what generation exactly is that? Gen Y, Gen X, Gen Millennium?) is doing with all of this extra time you are saving with your speed typing? Spending more quality time with people? Improving your mind? Helping others? Working to earn what you need/want?
I hadn't really thought of this Siri feature as being able to just "tell your phone what to do." It is a tool that technology makes available, useful in certain circumstances, that I can choose to use, just like any app I choose to use. Perhaps the tool's usefulness is a function of its user? Michelangelo may have painted the Sistine Chapel faster with a can of spray paint, but would it have been better?
I think I'll take my landline copper pair wired brain and go talk to a human being without the benefit of a touchpad.
For the record, the comment above was not written by the author's mom, since that is my proud title.
Y'all are missing the point.
Speech recognition is by Nuance - a leader in the business.
Siri comes from the DARPA Cognitive Assistant that Learns and Organizes (CALO) project, a program that ran for 5 years and involved 300 researchers from 25 of the top universities and commercial research organizations. SRI International took the knowledge gained by CALO and formed Siri, which Apple purchased.
Apple denatured Siri and plugged it into Yelp, Wolfram Alpha, search engines, the map app, and a few other places and apps. The potential of Siri is much greater than anything you've seen thus far, which is why Apple is calling Siri beta at the moment.
Siri represents one of the fruits of real, approachable AI, and has the potential of being the next big game-changer in terms of UI and knowledge navigation.