Monkey mouth sounds could push the evolution of speech back by 27 million years

Sound doesn’t fossilize. Language doesn’t either.

Even where writing systems have developed, they have recorded full-fledged, functional languages. Rather than preserving the first baby steps toward language, written records capture languages that are already fully formed, made up of words, sentences, and grammar carried from one person to another by speech sounds, like any of the perhaps 6,000 languages spoken today.

So if you believe, as we linguists do, that language is the foundational distinction between humans and other intelligent animals, how can we study its emergence in our ancestors?

Happily, researchers do know a lot about both language (words, sentences, and grammar) and speech (the vocal sounds that carry language to the next person’s ear) in living people. So we should be able to compare language with less complex animal communication.

And that’s what we and our colleagues have spent decades investigating: How do apes and monkeys use their mouth and throat to produce the vowel sounds in speech? Spoken language in humans is an intricately woven string of syllables with consonants appended to the syllables’ core vowels, so mastering vowels was a key to speech emergence. We believe that our multidisciplinary findings push back the date for that crucial step in language evolution by as much as 27 million years.

The sounds of speech

Say “but.” Now say “bet,” “bat,” “bought,” “boot.”

The words all begin and end the same. It’s the differences among the vowel sounds that keep them distinct in speech.

Now drop the consonants and say the vowels. You can hear the different vowels have characteristic sound qualities. You can also feel that they require different characteristic positions of your jaw, tongue, and lips.

So the configuration of the vocal tract—the resonating tube of the throat and mouth, from the vocal folds to the lips—determines the sound. That in turn means that the sound carries information about the vocal tract configuration that made it. This relationship is the core understanding of speech science.

After more than half a century of investigation and of developing both anatomical and acoustical modeling technology, speech scientists can generally model a vocal tract and calculate what sound it will make, or run the process the other way, analyzing a sound to calculate what vocal tract shape made it.
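To give a feel for what "model a vocal tract and calculate what sound it will make" means, here is a minimal sketch in Python. It is not the modeling used in the research described here. It uses only the simplest textbook approximation: a vocal tract treated as a uniform tube, closed at the vocal folds and open at the lips, whose resonances (formants) fall at odd multiples of the speed of sound divided by four times the tube length. The function name, the speed-of-sound value, and both tube lengths are assumptions chosen for illustration.

```python
# Assumed speed of sound in warm, humid air, in centimeters per second.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def uniform_tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a uniform tube.

    The tube is closed at one end (vocal folds) and open at the other
    (lips), so resonance n sits at (2n - 1) * c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# 17.5 cm is a common textbook figure for an adult male vocal tract.
adult_like = uniform_tube_formants(17.5)

# 10 cm is a hypothetical shorter tract, used only to show the trend.
shorter = uniform_tube_formants(10.0)

print("17.5 cm tube:", adult_like)
print("10.0 cm tube:", shorter)
```

A uniform tube like this one gives evenly spaced resonances, which is roughly the neutral, schwa-like vowel. Other vowels come from narrowing or widening different parts of the tube, which shifts the resonances away from that even spacing. Real models use many tube sections with measured cross-sections rather than one uniform tube.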

So model a few primate vocal tracts, record a few calls, and you pretty much know how human language evolved? Sorry, not so fast.

Modern human anatomy is unique

If you compare the human vocal tract with other primates’, there’s a big difference. Take a baboon as an example.

Figure: The vocal tract of a baboon has the same components as that of a person, including the larynx (circled in green in the original image), but with different proportions. Credit: Laboratory of Cognitive Psychology and GIPSA-lab

From the baboon’s larynx and vocal folds, which are high up and close to the chin line, there’s just a short step up through the cavity called the pharynx, then a long way out through the horizontal oral cavity. In comparison, for adult male humans, it’s about as far up the pharynx as it is from there out to the lips. Also, the baboon tongue is long and flat, while a human’s is short in the mouth, then curves down into the throat.

So over the course of evolution, the larynx in the human line has moved lower in our throats, opening up a much larger pharyngeal cavity than found in other primates.

About 50 years ago, researchers seized on that observation to formulate what they called the laryngeal descent theory of vowel production. In a key study, researchers developed a model from a plaster cast of a macaque vocal tract. They manipulated the mouth of an anesthetized macaque to see how much the vocal tract shape could vary, and fed those values into their model. Finally, they calculated the vowel sound produced by particular configurations. It was a powerful and groundbreaking study, still copied today with technological updates.

So what did they find?

They got a schwa—that vowel sound you hear in the word “but”—and some very close acoustic neighbors. Nothing where multiple vowels were distinct enough to keep words apart in a human language. They attributed it to the lack of a human-like low larynx and large pharynx.

As the theory developed, it claimed that producing the full human vowel inventory required a vocal tract with about equally long oral and pharyngeal cavities. That arrangement occurred only with the arrival of anatomically modern humans, about 200,000 years ago, and then only in adults, since babies are born with a high larynx that lowers with age.

This theory seemed to explain two phenomena. First, from the 1930s on, several (failed) experiments had raised chimpanzees in human homes to try to encourage human-like behavior, particularly language and speech. If laryngeal descent is necessary for human vowels, and vowels in turn for language, then chimpanzees would never talk.

Second, archaeological evidence of “modern” human behavior, such as jewelry, burial goods, cave painting, agriculture and settlements, seemed to start only after anatomically modern humans appeared, with their descended larynxes. The idea was that language allowed greater cooperation, which in turn enabled these behaviors.

Rethinking the theory with new evidence

So if laryngeal descent theory says kids and apes and our earlier human ancestors couldn’t produce contrasting vowels, just schwa, then what explains, for instance, Jane Goodall’s observations of clearly contrasting vowel qualities in the vocalizations of chimpanzees?

But that kind of evidence wasn’t the end of the laryngeal descent idea. For scientists to reach agreement, especially to renounce a longstanding and useful theory, we rightly require consistent evidence, not just anecdotes or hearsay.

One of us (L.-J. Boë) has spent upward of two decades assembling that case against laryngeal descent theory. The multidisciplinary team effort has involved articulatory and acoustic modeling, child language research, paleontology, primatology and more.

One of the key steps was our study of the baboon “vowel space.” We recorded over 1,300 baboon calls and analyzed the acoustics of their vowel-like parts. Results showed that the vowel quality of certain calls was equivalent to known human vowels.

Figure: A schematic comparing the vocal qualities of certain baboon calls (orange ellipses) with selected vowel sounds of American English, where the phonetic symbols / i æ ɑ ɔ u / represent the vowels in beat, bat, bot, bought, boot. Credit: Louis-Jean Boë, GIPSA-lab

Our latest review lays out the whole case, and we believe it finally frees researchers in speech, linguistics, primatology, and human evolution from the laryngeal descent theory, which was a great advance in its time, but turned out to be in error and has outlived its usefulness.

Speech and language in animals?

Human language requires a vocabulary that can be concrete (“my left thumbnail”), abstract (“love,” “justice”), elsewhere or elsewhen (“Lincoln’s beard”), even imaginary (“Gandalf’s beard”), all of which can be slipped as needed into sentences with internal hierarchical grammar. For instance, “the black dog” and “the calico cat” each keep the same internal word order whether the sentence is “X chased Y” or “Y was chased by X,” where the meaning stays the same but the sentence organization is reversed.

Only humans have full language, and arguments are lively about whether any primates or other animals, or our now extinct ancestors, had any of language’s key elements. One popular scenario says that the ability to do grammatical hierarchies arose with the speciation event leading to modern humans, about 200,000 years ago.

Speech, on the other hand, is about the sounds that are used to get language through the air from one person to the next. That requires sounds that contrast enough to keep words distinct. Spoken languages all use contrasts in both vowels and consonants, organized into syllables with vowels at the core.

Apes and monkeys can “talk” in the sense that they can produce contrasting vowel qualities. In that restricted but concrete sense, the dawn of speech was not 200,000 years ago, but some 27 million years ago, before the time of our last common ancestor with Old World monkeys like baboons and macaques. That’s over 100 times earlier than the emergence of our modern human form.

Researchers have a lot of work to do to figure out how speech evolved since then, and how language finally linked in.

Thomas R. Sawallis is a Visiting Scholar in New College, University of Alabama

Louis-Jean Boë is a researcher in speech sciences at GIPSA-lab

This story originally featured on The Conversation.