Researchers at the University of Texas at Austin have developed a breakthrough “semantic decoder” that uses artificial intelligence to convert scans of the human brain’s speech activity into paraphrased text. Although its output is still relatively imprecise compared to the source texts, the development represents a major step forward for AI’s role in assistive technology, and one that its makers already caution could be misused if not properly regulated.
First published on Monday in Nature Neuroscience, the team’s findings detail a new system that pairs a generative program similar to OpenAI’s GPT-4 and Google Bard with existing technology capable of interpreting functional magnetic resonance imaging (fMRI) scans, which track how and where blood flows to particular areas of the brain. While previous brain-computer interfaces (BCIs) have shown promise in achieving similar translative abilities, the UT Austin team’s system is reportedly the first noninvasive one, requiring no physical implants or wiring.
In the study, researchers asked three test subjects to each spend a total of 16 hours within an fMRI machine listening to audio podcasts. Meanwhile, the team trained an AI model to create and parse semantic features by analyzing Reddit comments and autobiographical texts. By meshing the two datasets, the AI learned to match words and phrases with the corresponding scans of the subjects’ brains, creating semantic linkages.
After this step, participants were once again asked to lie in an fMRI scanner and listen to new audio that was not part of the original data. The semantic decoder then generated text describing that audio from the scans of brain activity alone, and could even produce similar results when subjects watched silent video clips or imagined their own stories within their heads. While the AI’s transcripts generally contained out-of-place or imprecisely worded passages, the overall output still successfully paraphrased the test subjects’ inner monologues. Sometimes, it even reproduced the audio’s exact wording. As The New York Times explains, the results indicate the UT Austin team’s AI decoder captures not merely word order but implicit meaning as well.
While the technology is still in its very early stages, researchers hope future, improved versions could provide a powerful new communications tool for individuals who have lost the ability to speak, such as stroke victims or those dealing with ALS. As it stands, fMRI scanners are massive, immovable machines restricted to medical facilities, but the team hopes to investigate how a similar system could work using functional near-infrared spectroscopy (fNIRS).
There is, however, a major stipulation to the new semantic decoder: a subject must make a concerted, conscious effort to cooperate with the AI program’s goals by staying focused on the task. Simply put, a busier brain means a more garbled transcript. In addition, the decoder can only be trained on a single person at a time.
Despite these current restrictions, the research team already anticipates both rapid progress and the potential for misuse. “[F]uture developments might enable decoders to bypass these [privacy] requirements,” the team wrote in its study. “Moreover, even if decoder predictions are inaccurate without subject cooperation, they could be intentionally misinterpreted for malicious purposes… For these and other unforeseen reasons, it is critical to raise awareness of the risks of brain decoding technology and enact policies that protect each person’s mental privacy.”