Have you heard? Siri, the virtual persona that speaks from your iPhone, sounds different now. The new voice officially rolls out today as a part of Apple’s latest mobile operating system, iOS 11. Her new pipes make her sound higher in pitch and younger. She’s perkier and more personable. Most important, she sounds more human.
Here’s how her newest voice came into the world: Someone read aloud from a book, and Apple recorded it. Her American English voice came from one specific individual, as did, for example, her British one. Apple’s goal with those recordings was to gather natural-sounding words and phonemes, the sounds that make up our words. From there, Apple uses machine learning to weave those phonemes together to make her speech sound as natural as possible. By focusing on assembling those word-sounds in the right way, Apple hopes to create a voice that emphasizes syllables correctly, so she doesn’t sound too artificial.
To better understand the qualities of Siri’s new voice, I sent clips of her speaking American English to Molly Babel, an assistant professor in the department of linguistics at the University of British Columbia. (Yes, she’s a language expert with the last name Babel, spelled just like “Tower of.”) Babel asked me to record Siri saying specific words—among them pasta, pool, and boot—and a passage, well known in linguistics, that contains a plethora of word-sounds. It begins, “Please call Stella.”
Her reaction? “I laughed a little bit when I heard some of her vowels,” Babel says. “She is textbook Californian.” Babel could tell by the way Siri said the “oo” sound in the words pool and boot, as well as the way she voiced other vowels. In the linguistic equivalent of a back-of-the-envelope calculation, Babel compared Siri’s voice to similar speakers in an accent archive, and confirmed Siri sounded most similar to Californians.
What’s more, her voice comes across as high-pitched and breathy, two factors that together give a youthful vibe, Babel points out. “She sounds really young,” she says, adding that the sound of her voice matches best with an American woman who is in her late teens or twenties.
That breathiness, a term that refers to the quantity of air moving over the vocal cords, makes her sound healthy, Babel says.
I asked her what kind of voices people want to hear from the virtual personas that emanate from their devices—the voices that give us information about the weather and our appointments.
“I do think she’s designed to sound willing to please,” Babel says. “And maybe that’s part of an unfair stereotype that we have toward young women, to be honest.” You can easily switch the voice to a male one, which has been available since 2013, in the iPhone’s settings.
“There is an appeal for the accent that we hear in our devices, that it be familiar, that it sound kind of like us,” she adds. “That helps fight against the feeling like we’re being talked down to.”
But does it work well?
Ultimately, Babel reflects, a virtual assistant shouldn’t just have an enjoyable voice that is easy to understand; it needs to also clearly understand the user. With iOS 11, Siri can verbally translate spoken English phrases into five different languages.
Translation is a useful feature, but a virtual assistant’s prime objective is to know what you asked for and respond the right way. If you ask Siri to call you an Uber, but she doesn’t understand the word “Uber” and prompts you to choose between the Uber and Lyft apps on your phone, you’ll be frustrated regardless of what her voice actually sounds like. (Siri’s error rate with word recognition has dropped greatly since Apple first launched the voice assistant in 2011.)
It’s a point that Timo Baumann, a systems scientist at Carnegie Mellon University who studies spoken computer systems like Siri, also makes. He listened to her speak as well. (Both Babel and Baumann heard her voice when iOS 11 was still in beta, before its official release today.)
“It seems to me that the new voice actually has much more personality in there than the old voice,” Baumann says. “The old voice was more distant.” For example, earlier this year, when Apple first revealed the digital assistant’s new vocal range, the company demonstrated her saying the word “sunny” with three different intonations, another example of her attempting to sound more human and natural.
Confidence and personality in a voice telegraph competence, meaning that when the digital assistant inevitably drops the ball, as they all do, the incongruence between tone and performance might be even more noticeable.
“This voice seems to really stand behind what it’s saying,” Baumann says. “This means that Apple has to take care that it actually can deliver. If you say something stupid with this voice, it’s going to sound doubly stupid.”
As for Babel’s observation about Siri sounding Californian, you can always just ask her where she’s from. If you do, she might let you know: “Like it says on the box … I was designed by Apple in California.”