Over the past few months, the terrorist group ISIS has shocked the world with the release of numerous violent videos, many of which prominently feature a masked man wearing all black and holding a knife. Heard speaking with a British accent, the man has been given the nickname “Jihadi John” — but the FBI says they now know who he really is. It seems likely they uncovered the man’s identity by analyzing his voice; British Ambassador to the U.S. Peter Westmacott told CNN that voice identification was being used in the investigation.
What the FBI is not disclosing is how, exactly, they did it — and when and on whom the technology might also be used.
Voice identification technology is not new or particularly groundbreaking, says DeLiang Wang, who works on training neural networks in voice analysis at Ohio State University. Unlike Siri or Android, which try to understand what a person is saying, speaker identification programs look for patterns in the waveform of a person’s voice. Those patterns remain constant whether he or she is speaking English, Finnish, or random sounds. Computers compare audio samples with other recordings of voices in order to match the sample with a speaker. The more voices a neural network has examined, the better it is at recognizing them in the future.
“It’s kind of like fingerprint matching,” Wang tells Popular Science.
The computer first compares a sample to every voice it knows to find a best match. Then it checks the sample against that result to see if they come from the same person.
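The two-step process Wang describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the FBI or by Wang's lab: each voice is assumed to have already been reduced to a short list of numbers (a hypothetical "voiceprint"), cosine similarity stands in for whatever comparison a real system uses, and the 0.9 threshold and the function names are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two non-zero feature vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(sample, enrolled):
    """Step 1: compare the sample to every known voice and return the best match."""
    return max(
        ((name, cosine_similarity(sample, vec)) for name, vec in enrolled.items()),
        key=lambda pair: pair[1],
    )

def verify(score, threshold=0.9):
    """Step 2: decide whether the best match is close enough to be the same person.

    The threshold is an assumed value chosen for illustration only.
    """
    return score >= threshold

def recognize(sample, enrolled, threshold=0.9):
    """Run both steps; return the speaker's name, or None if no match is verified."""
    name, score = identify(sample, enrolled)
    return name if verify(score, threshold) else None

# Hypothetical voiceprints; real systems derive far longer vectors from audio.
enrolled = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
}
```

Calling `recognize([0.9, 0.1, 0.0], enrolled)` returns `"speaker_a"`, while a sample that resembles no enrolled voice closely enough, such as `[1.0, 1.0, 1.0]`, returns `None`. The second step matters because the first always produces a best match, even when the speaker is not in the database at all.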
For at least a decade, computers have beaten human ears at recognizing voices in ideal laboratory settings, Wang says. But out in the real world, voice ID suffers. Background noise, poor recording equipment, or different tones of speech can all confuse speaker recognition programs. A person can also alter their voice to fool a machine. Wang says all of these flaws show the technology has a lot of room to grow.
Of course, for software to identify a speaker, it needs an existing recording of that speaker to compare the sample against. “If they want to track known terrorists they have to have known databases of their voices,” Wang says.
Whether the FBI has such a database is an open question. The NSA has direct access to phone calls through security backdoors, and has used that access to record phone calls en masse, according to documents leaked by Edward Snowden. The FBI has access to at least some of that data; however, the agency would not comment on the use of voice identification technology for forensic purposes.