Computers Can Now Generate Sounds To Fool Human Listeners

I, Foley Artist

Think of all the sounds in a movie: drumming hoofbeats, zooming cars, booming thunder, heels clacking down a hall.

Foley artists often add those sounds after filming, creating the noises in a studio. It’s a job that requires skill and creativity … and now, a computer can do it.

In a paper to be presented this month at the Computer Vision and Pattern Recognition conference, researchers at MIT’s Computer Science and Artificial Intelligence Lab describe a deep-learning algorithm that can watch a silent movie and create sounds that go along with the motions on screen. It’s so good that it even fooled people into thinking the generated sounds were actual recordings from the environment.

They did this by recording audio and video of researchers hitting objects with a drumstick, then showing the recordings to the algorithm, which learned which sounds went with which visuals. The resulting dataset, called ‘Greatest Hits’, is available to other researchers.

After learning from the videos, the computer was able to generate appropriate sound effects for a new silent movie, fooling people who were asked to watch videos and decide whether the sounds they heard were real or computer-generated. The algorithm doesn’t just choose a sound from the database; it synthesizes a waveform to match the video.

Foley artists shouldn’t panic, though. The algorithm might someday be able to produce sounds for TV shows, but it isn’t there yet. It still gets confused by movements that look like hits but aren’t, and it can’t yet generate ambient sounds like distant traffic or birds chirping in a garden.

There are other reasons a computer might want to know what sounds are associated with different visuals, including helping robots navigate the world.

“A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if they stepped on either of them,” Andrew Owens, lead author of the paper, said. “Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world.”