Voice command has made huge strides in recent years, especially in the mobile space–Google has implemented voice search and some basic commands into Android, and now Apple has integrated Siri, a voice-command app, deeply into the guts of the iPhone.
Talking not just to, but with, our computers has always been a tantalizing, futuristic idea–not least for its radical potential to make applications more fully accessible to the disabled. But as we were discussing voice control at PopSci HQ yesterday in light of Apple’s news, we realized something interesting: while none of us younger tech types use today’s voice control tech regularly, it seems as if our parents actually do use it, and often. Why is that?
For us, evolving the way we interact with our gadgets is easy. Going from a mouse to a multitouch trackpad? No problem. Moving from a hardware keyboard to a software keyboard, and then to a new system like, say, Swype? As natural as buying new clothes. We’re hardwired to adapt to electronics, to see what’s familiar and explore what’s new, and to learn how to do the same things we could do before while integrating the new things that are now possible.
So, voice command? That’s not something new. It’s something old. Talking to a gadget is something people did with landlines, and for us, it’s kind of a step backwards. Why unlearn our lightning-fast smartphone typing skills, just so we can dictate into a phone? We’ve moved past that. Dr. Zahorian, a professor at Binghamton University who’s spent much of his career working with speech recognition, told me about a colleague, another expert on speech recognition, who demonstrated the latest and greatest system from MIT to a high school class. “Why would I do that when I can type so much faster?” was the response from the high schoolers.
But without the benefit of our gadget-wired brains, our parents have to adapt to each new input method from scratch, and that can be tough. They tend to make things more complicated than they actually are, even as gadgets have gotten simpler. Voice command is comforting; it’s been around a while, unlike, say, a multitouch keyboard that operates with long, complicated swipes. It feels simple: just tell your phone what to do, dammit!
A major part of the youthful avoidance of voice command is its finickiness. Voice command is getting better, certainly, but even the most modern, advanced systems can still get things laughably wrong–or worse, do them slowly–and since we’re well-versed in several forms of standard text entry, we tend to fall back on those simply for speed. Key example: Microsoft’s Kinect is one of the most incredible gadgets we’ve ever seen, and its voice command is so advanced that it can pick out separate voices, or even the direction a voice is coming from. But performing simple tasks is still a bit slow. Say you’re watching Hulu on your Xbox 360. You want to pause it. You have two options: pick up a remote and hit the pause button, or say “Xbox,” then perhaps “XBOX” a bit louder and more forcefully if it didn’t hear you the first time, then wait for the playback menu to pop up, and then say “Pause. PAUSE.” It’s cool in theory, and fairly fast when it works smoothly (which is quite often), but I certainly find myself reaching for the remote instead more often than not.
Google Voice, too, uses voice recognition, in this case to transcribe voicemail messages so you can simply read them rather than listen to them. That’s a great idea! Except it works so poorly that comedian Paul F. Tompkins has an entire, well-loved bit about the “wildly inaccurate” transcriptions. Even Dr. Zahorian noted that “standard systems tend to not work very well,” and that he only uses them “in situations when it’s hard to type,” like in the car.
(Another factor, and not a small one, is that it looks sort of dorky and awkward to talk at your phone. We’re not on Star Trek, after all. Not that that alleviates dorkiness.)
So we ignore voice command. It’s not that it’s bad; it’s that when we’re equally familiar with all the different kinds of input, we’re naturally inclined to go with the most efficient one. But that’s emblematic of a weakness as well: a tendency to go with whatever’s fastest, no matter what. Our parents don’t suffer from that weakness, so they’ll go with what seems most comfortable–which means talking to your gadget.
But that could all be changing, thanks to Apple’s emphasis on Siri. The use of natural language is something we shouldn’t ignore: being able to actually tell your phone what you want it to do, without having to learn a new language the phone can understand, is a big step. And Apple demonstrated instances in which it could conceivably be faster than other methods. Finding, say, a Cuban restaurant in Park Slope, Brooklyn would require a bunch of taps–you’d have to swipe through your apps to find the Yelp app, select the cuisine you want, then the location you want, and then sort by rating. Siri makes that easy: just say “find me a Cuban restaurant in Park Slope, Brooklyn” and it’ll take you right to that last step. That’s great, and it definitely has the potential to finally make voice command enticing to those more comfortable with typing.
What we’re really excited about with Siri is simplicity. Simplicity–or, more correctly, the appearance of simplicity–is the reason Siri will work for Apple. Siri doesn’t just transcribe your voice to text, like Google’s or Microsoft’s voice command options do; it aims to understand both what you’re saying and what you’re trying to do. For someone for whom messing around with settings and apps and software keyboards is a massive headache, Siri and future voice command options like it are a godsend. Finally: a way to tell your phone what you want to do.
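To make that distinction concrete, here’s a toy sketch in Python of the difference between stopping at a transcript and mapping it to a structured action. Everything in it–the intent names, the hand-written patterns, the `parse_intent` function–is invented for illustration; real systems like Siri use statistical language understanding, not regexes like this.

```python
import re

def parse_intent(utterance: str) -> dict:
    """Toy example: map an already-transcribed request to a structured
    command. A plain transcription system would stop at the text itself;
    an assistant has to decide what the text is asking for."""
    text = utterance.lower().strip()
    # Hypothetical pattern for a restaurant search, loosely modeled on
    # "find me a Cuban restaurant in Park Slope, Brooklyn"
    m = re.match(r"find (?:me )?an? (.+?) restaurant in (.+)", text)
    if m:
        return {"intent": "search_restaurant",
                "cuisine": m.group(1),
                "location": m.group(2)}
    if text in ("pause", "pause playback"):
        return {"intent": "pause_playback"}
    return {"intent": "unknown", "raw": text}

print(parse_intent("Find me a Cuban restaurant in Park Slope, Brooklyn"))
print(parse_intent("Pause"))
```

The point of the sketch is the division of labor: transcription produces text, and a separate understanding layer turns that text into an action the phone can execute–which is the layer Siri exposes more deeply than earlier voice tools did.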
That’s of benefit not just to those who would rather not mess around with the more technical elements of a phone, but also those who can’t, due to circumstance or disability. The car is the most obvious use of this kind of input tool, since you legally can’t (and practically shouldn’t) be staring at your phone’s screen while driving. For those who are unable to manipulate a touchscreen, or who need Siri’s ability to read back commands due to vision impairment, this kind of system is also invaluable–there have been accessibility solutions like this for years, but none that could delve so deep into a phone’s OS.
So even for my generation, it just might be more efficient, in certain situations, to ask my phone a question rather than open an app and type it out. Replacing typing with speech is, with some exceptions (like in the car), not a viable jump for the younger generation. But if it legitimately is easier, or faster, we’ll embrace it–along with our parents. Whether we embrace our parents is unrelated.