Siri, Alexa, and other computerized companions are typically pretty good at understanding the strange questions we’re asking them. Now, research that was presented at the IEEE International Conference on Acoustics, Speech and Signal Processing in Shanghai this week might make it easier for machines to discern what we’re saying without even hearing us.
In the wrong hands, this could be a frightening prospect. “It’s the end of the (privacy) world as we know it…” writes KurzweilAI.net. But, on the more innocuous end of the spectrum, more advanced automated lip reading could lead to better movie dubbing, IEEE Spectrum says.
It’s no easy feat to teach a machine to read lips. Part of the difficulty is that the mouth makes only about 14 distinct shapes, while speech contains about 50 distinct sounds. That means a single shape (think of P and B, which look identical on the lips) can correspond to several different sounds.
The researchers, led by University of East Anglia computer scientist Helen Bear, developed a new algorithm to help machines better differentiate between those similar-looking shapes that produce different sounds. The machine was trained to recognize the differences between these sounds using video and audio recordings of 12 people speaking 200 sentences, learning to map out the multiple sounds that each mouth shape could produce. The next step, as far as we could gather, is for the machine to generate versions of each word with the different sound options (for example, was that word pridge or bridge?) and train itself to pick the right one.
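The core idea of that last step can be sketched in a few lines of code. This is a simplified illustration, not the researchers’ actual algorithm: the mapping, the viseme labels, and the toy vocabulary below are all invented for the example. It shows how expanding every sound option for each mouth shape, then filtering against a dictionary of real words, narrows the candidates down.

```python
from itertools import product

# Hypothetical many-to-one mapping: one mouth shape (viseme) can stand
# for several different sounds. P, B, and M all look the same on the lips.
VISEME_TO_SOUNDS = {
    "bilabial": ["p", "b", "m"],
    "r": ["r"],
    "ih": ["i"],
    "dge": ["dge"],
}

# Toy vocabulary standing in for a real pronunciation dictionary.
VOCABULARY = {"bridge"}

def candidate_words(visemes):
    """Expand every sound option for each viseme; keep only real words."""
    options = [VISEME_TO_SOUNDS[v] for v in visemes]
    all_spellings = {"".join(combo) for combo in product(*options)}
    return all_spellings & VOCABULARY

# The shape sequence could spell "pridge", "bridge", or "mridge" --
# only one of those is an actual word.
print(candidate_words(["bilabial", "r", "ih", "dge"]))  # -> {'bridge'}
```

A real system would score candidates statistically rather than requiring an exact dictionary match, but the disambiguation principle is the same.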
The result is an algorithm that is correct about 25 percent of the time, which Bear tells IEEE Spectrum is an improvement over earlier systems. And, considering that a previous study found that human lip readers are typically correct only about 50 percent of the time, the machines are catching up.