AI vocal filters are here to stay

The growing attention given to AI deepfake technology in recent years has primarily focused on visual trickery. Think videos able to uncannily superimpose a person’s face onto the body of another, surreal art generation based on user suggestions, and the troublesome ethics surrounding all of these abilities. But another key method for convincing digital mimicries is only recently getting proper examination and discussion.

Vocal filters, while not necessarily new, have only recently started to be taken more seriously thanks to AI assistance. Unfortunately, they pose their own host of serious societal implications, and as with the video deepfake industry, there seems to be little regulators can do to stop them.

Emerging AI tools like Koe Recast and Voice.ai are quickly honing their ability to transform audio inputs to sound like virtually anyone, provided they have enough source material to analyze. In some cases, these programs need only a clip between 15 and 30 seconds long to generate convincing imitations. Although Koe Recast is only in its private alpha testing phase, examples are already available depicting a brief clip of Mark Zuckerberg sounding like a bass-heavy narrator, a woman, and even a high-pitched anime character.

“My goal is to help people express themselves in any way that makes them happier,” Koe Recast’s Texas-based creator, Asara Near, told Ars Technica in an interview last week. Near added that he intends to eventually release a desktop app able to transform users’ voices in real time on platforms like Discord and Zoom. When asked about the potential for bad actors to use Koe Recast for personal attacks and misinformation, Near argued, “As with any technology, it’s possible for there to be both positives and negatives, but I think the vast majority of humanity consists of wonderful people and will benefit greatly from this.”

Critics, however, remain skeptical of trusting the general public with such potentially chaotic tools. Recently, some outsourced call center reps have also begun using AI software to erase their native countries’ accents in order to sound more “American” in an attempt to mitigate Western consumer biases. While the tool’s creators argue their invention prevents prejudice, many have countered that it simply provides a means to avoid dealing with the larger issues at hand—namely, xenophobia and racism.

Likewise, employees at some larger businesses have fallen prey to scammers asking for funds transfers and passwords while utilizing similar audio mimicry to imitate bosses. “Among the larger businesses, I think more and more of them are starting to see these because they’re really ripe targets for this kind of thing,” Kyle Alspach, a cybersecurity reporter for Protocol, explained while speaking recently on NPR’s Marketplace.

While Alspach also noted that these sorts of scams are still in their infancy, it likely won’t be long before these tactics become more commonplace and, unfortunately, make it harder to distinguish fact from fiction. In short, there’s simply no stopping the rapid escalation of AI-enabled visual and audio mimicry.
