Gird yourself for another ‘the dress’ debate. You remember it even if you wish you didn’t: an image of a dress went viral on Twitter because it appeared, to some, to definitely be white and gold, while others were positive it was black and blue. Yes, the audio version of the viral visual atrocity is here, and it’s just as frustrating as that darn blue dress (yeah, I said it).
First, watch this video:
Now show it to all your friends and watch your sanity disappear as they all vehemently disagree. In a reply, another Twitter user claimed you can change what you hear by modulating the bass levels. We test that out here:
For the record, I can still only hear “laurel.” Some of our staff members were able to change which word they heard over time, especially with bass modulation. Our Science Editor can now suddenly sort of hear the whisper of a “yanny” in there, but it freaks her out and she does not like it. Your mileage may vary. But according to at least one expert, my fellow “laurel” compatriots and I are correct. Yes, friends: there may actually be a right answer here. Brad Story is a professor of speech, language, and hearing sciences at the University of Arizona, and he did a quick analysis of the waveform for us.
That first waveform is of the actual recording, which features the primary acoustic features of the “l” and “r” sounds. That leads Story to believe that the voice is really saying “laurel.” The fuzzier image below shows that what the recording captures is the third resonance of the vocal tract. As your vocal tract changes shape to form different sounds, it produces specific resonances, or natural vibrational frequencies. It’s these resonances that encode language within a soundwave (which is why you can analyze a waveform to work out which speech sounds it contains).
He also recorded himself saying both words to demonstrate how the waveforms vary. You can see (though maybe only with the added arrows and highlighting) that the acoustic features match up between the actual video recording and the recording of Story saying “laurel.” It starts relatively high for the “l” sound, then drops for the “r” and goes back up high for the second “l.” Story explains that the “yanny” sound follows a similar path, just not with quite the same acoustic features. That wave also goes high-low-high, but the whole thing is shifted into the second resonance—not the third.
Britt Yazel, a researcher at the UC Davis Center for Mind and Brain, agrees. “I honestly think after looking at the spectrograms and playing with some filters that this is just the word ‘Laurel’ with some high-frequency artifacts overlaying it,” he says. At first he thought it was two overlaid voices, but then he started cleaning up the audio a bit. Now he thinks that the overlaid frequencies above 4.5 kHz are what sound like “yanny” to some people.
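If you're curious what "playing with some filters" might look like in practice, here is a minimal Python sketch. To be clear, this is not Yazel's actual method, and it does not touch the real recording: it builds a synthetic two-tone signal as a stand-in (one tone below 4.5 kHz, one above) and splits it at that cutoff with a crude Fourier-transform filter. The function names (`split_at_cutoff`, `rms`) and the tone frequencies are assumptions made up for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(n^2); fine for a short demo signal."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns only the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def split_at_cutoff(samples, sample_rate, cutoff_hz):
    """Return (low_band, high_band): the parts of the signal at or below,
    and above, cutoff_hz. A brick-wall split, not a production-grade filter."""
    n = len(samples)
    low, high = [], []
    for k, value in enumerate(dft(samples)):
        # Bins past n/2 mirror the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        if freq <= cutoff_hz:
            low.append(value)
            high.append(0)
        else:
            low.append(0)
            high.append(value)
    return inverse_dft(low), inverse_dft(high)


def rms(samples):
    """Root-mean-square level, a rough measure of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Synthetic stand-in for the clip (NOT the real recording): a 1 kHz tone
# plus a quieter 6 kHz tone, sampled at 16 kHz.
SAMPLE_RATE = 16000
N = 160
low_tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
high_tone = [0.5 * math.sin(2 * math.pi * 6000 * t / SAMPLE_RATE) for t in range(N)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]

low_band, high_band = split_at_cutoff(mixed, SAMPLE_RATE, 4500)
print("level below 4.5 kHz:", round(rms(low_band), 3))
print("level above 4.5 kHz:", round(rms(high_band), 3))
```

With real audio you would load the clip's samples in place of `mixed` and play back each band on its own. A proper audio tool would use a smoother filter than this all-or-nothing cutoff, which is only meant to show the idea.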
So why can’t we all hear “laurel”? “The low quality recording creates enough ambiguity in the acoustic feature that some listeners may be led toward the ‘yanny’ perception,” Story explains.
That lines up with what Nina Kraus, a researcher at Northwestern University who studies auditory biology, told PopSci. “The way you hear sound is influenced by your life in sound,” she explains. What you expect to hear is, to a large extent, what your brain will hear—and what your brain hears is all that matters.
Everything that you perceive, audio included, gets filtered through your brain before you’re consciously aware of it. For example, you can choose which sounds to pay attention to. This is how you’re able to hear someone talking to you at a loud party without noticing any other conversations, but can also switch over to listening in on the woman standing behind you. Similarly, your brain is unconsciously choosing which frequencies in the recording to pay attention to, and therefore to amplify. When your brain is primed to expect one of two sounds, you might just convince yourself you’ve heard the wrong one. A classic example of this is a vocal-free MIDI file of “All I Want For Christmas” that many listeners swear still features Mariah Carey belting out her famous version of the tune.
Kraus also has her own, slightly more scientific example. Listen to the first audio sample labeled “noisy version” below (courtesy of Kraus), which should sound pretty scratchy. Then listen to the second, labeled “clean version,” which is a clear voice. Now go back and listen to the first.
On first listen, most people hear a lot of static and no true voice in the scratchy recording. But when you go back and listen to it again after hearing the clear sentence, suddenly you can hear the voice hidden within the static. The difference is that the second time, you’re expecting the voice.
Of course, there is also a chance that the recording contains both noises at once.
“When I first listened to the sounds I could hear both words strangely simultaneously,” says Heather Read, a sound perception and sensory neuroscience researcher at the University of Connecticut. She thinks the overlaid sounds are happening at once, with “yanny” occurring at higher frequencies, and what you hear depends on which frequencies your ear amplifies. She also thinks that if you play it repeatedly, you should hear “laurel” more and more. “Hopefully my ability to hear both words simultaneously reflects my musical ear or my acoustical scientific ear and not some other odd property of my brain,” she says. “But either way it’s fun that we all hear it differently.”
Story’s solution might be the best: somehow find the source of this strange noise and play it back for everyone on the same equipment. “With a high-quality recording, and if all listeners were listening with the same device, there may not be any confusion,” he says.
But for the record, it’s totally saying laurel.
Update: This post has been updated to provide even more evidence that it’s totally just saying laurel.