How our brains differentiate music and speech

Scientists are beginning to understand the auditory version of 'seeing faces in the cloud.'
Laura Baisas
A better understanding of how our brains tell music and speech apart could lead to better therapeutics for people with speech disorders like aphasia.

Deposit Photos

Countless times per day and without even realizing it, our ears pick up both music and speech. In turn, our brains help us tell the difference between the song of the summer and a friend telling us a story. Now, a team of scientists has mapped out how this process works, which could lead to new treatment options to help patients with aphasia regain the ability to speak. The findings are detailed in a study published May 28 in the journal PLOS Biology.

“Although music and speech are different in many ways, ranging from pitch to timbre to sound texture, our results show that the auditory system uses strikingly simple acoustic parameters to distinguish music and speech,” study co-author and New York University cognitive psychologist Andrew Chang said in a statement.

Measuring noise 

What scientists do have a good handle on is how to gauge the rate of audio signals, using a unit of measurement called hertz (Hz). The larger the number of Hz, the greater the number of occurrences, or cycles, per second. A person typically walks at a pace of 1.5 to 2 steps per second, or 1.5 to 2 Hz. Stevie Wonder’s song “Superstition” clocks in at approximately 1.6 Hz. By comparison, speech is roughly two to three times faster, at 4 to 5 Hz.

[Related: How do sound waves work?]

A song’s volume over time, known as amplitude modulation, is fairly steady at 1 to 2 Hz. Human speech, by contrast, has an amplitude modulation of 4 to 5 Hz, which means its volume changes much more frequently.
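For readers who want to hear the difference themselves, here is a minimal sketch, assuming Python with NumPy, of what amplitude modulation means in practice: the same white-noise carrier with its volume swept up and down at a "music-like" 1.5 Hz versus a "speech-like" 4.5 Hz. It is an illustration of the concept, not the stimuli used in the study.

```python
import numpy as np

SR = 16000          # sample rate in samples per second
DURATION = 4.0      # length of each clip in seconds
t = np.arange(int(SR * DURATION)) / SR
carrier = np.random.randn(t.size)   # white-noise carrier

def am_noise(mod_rate_hz: float) -> np.ndarray:
    """Scale the noise carrier by a slow sinusoidal volume envelope at mod_rate_hz."""
    envelope = 0.5 * (1 + np.sin(2 * np.pi * mod_rate_hz * t))  # sweeps between 0 and 1
    return carrier * envelope

music_like = am_noise(1.5)    # volume rises and falls about 1.5 times per second
speech_like = am_noise(4.5)   # volume fluctuates about 4.5 times per second
```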

Despite music and speech being pretty much everywhere all the time, scientists still do not have a clear understanding of how our auditory system so effortlessly and automatically identifies a sound as speech or music.

Hearing voices in a cacophony

In this study, the team conducted a series of four experiments. More than 300 participants listened to clips of synthesized noise whose volume rose and fell at different amplitude modulation speeds and with varying degrees of regularity. The clips were designed so that the ear and brain could detect only volume and speed, and the participants were told that the sounds were noise-masked music or speech.

The team asked the participants to judge whether these ambiguous noise clips sounded more like music or speech, then analyzed the patterns in how they sorted hundreds of clips into one category or the other. According to the team, it is like an auditory version of "seeing faces in the cloud." If a sound wave has a feature that matches what a listener expects music or speech to sound like, even a clip of white noise can sound like music or speech.

The team found that our auditory system uses simple, basic acoustic parameters to tell speech and music apart. To the participants, clips with a slower rate of less than 2 Hz and a more regular amplitude modulation sounded like music. Clips with a higher rate of roughly 4 Hz and a more irregular amplitude modulation sounded more like someone talking.

“Overall, slower and steady sound clips of mere noise sound more like music while the faster and irregular clips sound more like speech,” said Chang. 
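To make that rule of thumb concrete, here is a minimal sketch, assuming NumPy and SciPy, of one way to estimate a clip's amplitude-modulation rate and regularity from its volume envelope and then apply the slow-and-steady-versus-fast-and-irregular rule. It is an illustration only, not the analysis used in the study: the 2 Hz cutoff echoes the reported finding, while the regularity score and its 0.2 threshold are arbitrary choices made for this example.

```python
import numpy as np
from scipy.signal import hilbert

def am_rate_and_regularity(audio: np.ndarray, sr: int):
    """Estimate the dominant amplitude-modulation rate (Hz) and a crude
    regularity score from a clip's volume envelope."""
    envelope = np.abs(hilbert(audio))            # slow volume contour of the clip
    envelope -= envelope.mean()                  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(envelope.size, d=1 / sr)
    mask = (freqs > 0.5) & (freqs < 20)          # only slow, rhythm-like modulations
    peak = np.argmax(spectrum[mask])
    rate = freqs[mask][peak]
    # regularity: how much of the modulation energy sits at the single dominant rate
    regularity = spectrum[mask][peak] / spectrum[mask].sum()
    return rate, regularity

def sounds_like(audio: np.ndarray, sr: int) -> str:
    """Toy rule of thumb echoing the finding: slow and regular reads as music,
    fast and irregular reads as speech."""
    rate, regularity = am_rate_and_regularity(audio, sr)
    return "music" if (rate < 2 and regularity > 0.2) else "speech"
```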

Future applications

Understanding how the human brain tells the difference between talking and music may benefit those with aphasia, a language disorder that affects more than one million Americans. Aphasia typically follows a stroke or traumatic brain injury, but it can also occur during more temporary episodes such as migraine.

[Related: Why do humans talk? Tree-dwelling orangutans might hold the answer.]

According to the team, one promising approach is melodic intonation therapy, which trains people with aphasia to sing what they want to say, using their intact “musical mechanisms” to work around the damaged speech mechanisms. Understanding what makes speech and music similar and distinct could help in designing more effective rehabilitation programs.