These trendy-looking glasses from researchers at Cornell have a special ability, and it has nothing to do with nearsightedness. Embedded on the bottom of the frames are tiny speakers and microphones that emit inaudible sound waves and pick up the returning echoes.

Those echoes come in handy for detecting mouth movements, allowing the device to recognize low-volume or even silent speech. That means you can whisper or mouth a command, and the glasses will pick it up like a lip reader.

The engineers behind this contraption, called EchoSpeech, are set to present their paper at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in Germany this month. “For people who cannot vocalize sound, this silent speech technology could be an excellent input for a voice synthesizer,” Ruidong Zhang, a doctoral student at Cornell University and an author on the study, said in a press release. The tech could also be used by its wearers to give silent commands to a paired device, like a laptop or a smartphone.

[Related: Your AirPods Pro can act as hearing aids in a pinch]

In a small study in which 12 people wore the glasses, EchoSpeech recognized 31 isolated commands and strings of connected digits issued by the subjects, with error rates under 10 percent.

Here’s how EchoSpeech works. The speakers and microphones are placed on different lenses, on opposite sides of the face. When the speakers emit sound waves at around 20 kilohertz (near ultrasound), the sound travels from one lens to the lips and then to the opposite lens. As the sound waves reflect and diffract off the lips, their distinct patterns are captured by the microphones and used to build “echo profiles” for each phrase or command. It effectively works like a simple, miniaturized sonar system.
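To picture what an “echo profile” might look like in practice, here is a minimal sketch in Python. Every specific in it (the chirp waveform, the sample rate, the frame length, and the simulated lip reflections) is an assumption for illustration rather than a detail from the paper; the core idea is that cross-correlating the received echo with the transmitted near-ultrasound signal yields peaks whose positions reveal how far away the reflecting surfaces are.

```python
# A minimal, hypothetical echo-profile sketch. Assumes the transmitted
# signal is a short frequency-modulated chirp near 20 kHz and that the
# profile is the cross-correlation of echo and transmitted signal.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000          # assumed sample rate (Hz)
DURATION = 0.012     # assumed 12 ms frame

t = np.arange(int(FS * DURATION)) / FS
# Near-ultrasound chirp sweeping 18-22 kHz (assumed band around 20 kHz).
tx = chirp(t, f0=18_000, f1=22_000, t1=DURATION)

# Fake a received signal: two attenuated, delayed copies of the chirp
# standing in for reflections off the lips, plus a little noise.
rx = np.zeros_like(tx)
for delay_samples, gain in [(40, 0.5), (65, 0.2)]:
    rx[delay_samples:] += gain * tx[: len(tx) - delay_samples]
rx += 0.01 * np.random.randn(len(rx))

# The echo profile: correlation peaks mark the round-trip delay
# (and hence the distance) of each reflecting surface.
profile = correlate(rx, tx, mode="full")
peak = np.argmax(profile) - (len(tx) - 1)
print(f"strongest echo at {peak} samples (~{peak / FS * 343 / 2 * 100:.1f} cm)")
```

As the lips move, the pattern of peaks shifts from frame to frame, and it is that changing pattern the system can associate with each phrase or command.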

Through machine learning, these echo profiles can be used to infer speech, or the words being spoken. The model is pre-trained on a set of commands, but it also goes through a fine-tuning phase for each individual, which takes every new user around 6 to 7 minutes to complete and further improves its accuracy.
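The article doesn’t describe the model or the fine-tuning procedure, but a common way to adapt a pre-trained classifier to a new user is to freeze most of the network and briefly retrain its final layer on a few minutes of the wearer’s own recordings. The sketch below illustrates that pattern with PyTorch; the architecture, data shapes, and training settings are all hypothetical, not the authors’ implementation.

```python
# Hedged sketch of per-user fine-tuning on calibration recordings.
import torch
import torch.nn as nn

NUM_COMMANDS = 31    # the command-set size reported in the study
PROFILE_LEN = 1024   # assumed length of a flattened echo profile

# Stand-in "pre-trained" network: a feature extractor plus a classifier head.
model = nn.Sequential(
    nn.Linear(PROFILE_LEN, 256), nn.ReLU(),   # feature extractor
    nn.Linear(256, NUM_COMMANDS),             # classifier head
)

# Freeze the feature extractor; only the head adapts to the new user.
for param in model[0].parameters():
    param.requires_grad = False

# Fake calibration data standing in for a few minutes of the user
# repeating each command while wearing the glasses.
profiles = torch.randn(310, PROFILE_LEN)      # 10 examples per command
labels = torch.arange(NUM_COMMANDS).repeat(10)

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                        # brief fine-tuning pass
    optimizer.zero_grad()
    loss = loss_fn(model(profiles), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```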

[Related: A vocal amplification patch could help stroke patients and first responders]

The acoustic sensors connect to a microcontroller with a customized audio amplifier, which can communicate with a laptop through a USB cable. In a real-time demo, the team used a low-power version of EchoSpeech in which the microcontroller communicated wirelessly over Bluetooth with a smartphone. The Android phone handled all processing and prediction, then transmitted the results to certain “action keys” that let the wearer play music, interact with smart devices, or activate voice assistants.
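The press materials don’t spell out how predictions map to those “action keys,” but conceptually it is a simple lookup from a recognized command to a bound action. A hypothetical sketch, with made-up command names and stub actions:

```python
# Minimal, hypothetical phone-side dispatch: a recognized silent command
# is routed to its bound "action key". Names are illustrative only.
ACTION_KEYS = {
    "play_music":      lambda: print("-> starting music playback"),
    "next_track":      lambda: print("-> skipping to next track"),
    "lights_on":       lambda: print("-> turning smart lights on"),
    "voice_assistant": lambda: print("-> waking the voice assistant"),
}

def dispatch(predicted_command: str) -> None:
    """Route a recognized silent command to its bound action, if any."""
    action = ACTION_KEYS.get(predicted_command)
    if action is not None:
        action()
    else:
        print(f"no action bound for {predicted_command!r}")

dispatch("play_music")   # example: the wearer mouths "play music"
```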

“Because the data is processed locally on your smartphone instead of uploaded to the cloud, privacy-sensitive information never leaves your control,” François Guimbretière, a professor at Cornell University and an author on the paper, noted in a press release. Plus, audio data takes less bandwidth to transmit than video or images, and requires less power to process.

See EchoSpeech in action below: