The growing attention given to AI deepfake technology in recent years has primarily focused on visual trickery. Think videos able to uncannily superimpose a person’s face onto the body of another, surreal art generation based on user suggestions, and the troublesome ethics surrounding all of these abilities. But another key method for convincing digital mimicries is only recently getting proper examination and discussion.
Vocal filters, while not necessarily new, have only lately started to be taken more seriously thanks to AI assistance. Unfortunately, they pose their own host of serious societal implications, and as with the video deepfake industry, there seems to be little regulators can do to stop them.
Emerging AI tools like Koe Recast and Voice.ai are quickly honing their ability to transform audio inputs to sound like virtually anyone a user wants, provided there is enough source material to analyze. In some cases, these programs need only a clip of between 15 and 30 seconds to generate convincing imitations. Although Koe Recast is only in its private alpha testing phase, examples are already available in which a brief clip of Mark Zuckerberg is made to sound like a bass-heavy narrator, a woman, and even a high-pitched anime character.
“My goal is to help people express themselves in any way that makes them happier,” Koe Recast’s Texas-based creator, Asara Near, told Ars Technica in an interview last week. Near added that he intends to eventually release a desktop app able to transform users’ voices in real time on platforms like Discord and Zoom. When asked about the potential for bad actors to use Koe Recast for personal attacks and misinformation, Near argued, “As with any technology, it’s possible for there to be both positives and negatives, but I think the vast majority of humanity consists of wonderful people and will benefit greatly from this.”
Critics, however, remain skeptical of trusting the general public with such potentially chaotic tools. Recently, some outsourced call center reps have also begun using AI software to erase their native countries’ accents in order to sound more “American” in an attempt to mitigate Western consumer biases. While the software’s creators argue their invention prevents prejudice, many have countered that it simply provides a means to avoid dealing with the larger issues at hand, namely xenophobia and racism.
Likewise, employees at some larger businesses have fallen prey to scammers asking for funds transfers and passwords while utilizing similar audio mimicry to imitate bosses. “Among the larger businesses, I think more and more of them are starting to see these because they’re really ripe targets for this kind of thing,” Kyle Alspach, a cybersecurity reporter for Protocol, explained while speaking recently on NPR’s Marketplace.
While Alspach also noted that these sorts of scams are still in their infancy, it likely won’t be long before these tactics become more commonplace and, unfortunately, make it harder to distinguish fact from fiction. There’s simply no stopping the rapid escalation of AI-enabled visual and audio mimicry.