Last year, as pandemic lockdown restrictions hit the US, new bird enthusiasts flocked to the free Merlin Bird ID app. The app, which comes from the Cornell Lab of Ornithology, previously offered ways for users to identify a mystery bird near them through descriptions or a photo. Earlier this summer, it received an even cooler feature: the ability to recognize a bird based on a short audio clip of its song, chirp, or call.
Starting in March 2020, the Merlin team saw an uptick in the number of app downloads, a trend that’s persisted. “Not only were we getting more downloads, but the number of active users has continued to grow,” says Drew Weber, the project coordinator for Merlin. This spring, 1.2 million people (and counting) were on Merlin. “People are downloading it, getting into birds, and they’re still into birds this year, even though the realities of lockdown and such are changing,” he says. “It seems like it piqued their interest, and kept their interest.”
This comes at a time when bird news is flying along, especially in New York City, where select rare birds have risen to cult celebrity status. Barred and snowy owls decorate the pages of The New York Times, and a Mandarin duck was written up in New York Magazine in 2018 like it was the next feathered influencer to know.
Sound ID, which debuted on Merlin in June, has already received positive responses from the birding community. The new sound identification feature now accompanies the machine-learning-based photo ID tool, which became available to users around 2015.
“Prior to the release of sound ID, I think our biggest piece of feedback was ‘I thought you could identify birds by sound with this app!’ or, ‘where is the Shazam for birds,’ so it’s really cool to actually have that delivered to people,” says Weber.
There are a few other options for identifying birds by sound, including Bird Genie, Song Sleuth, and Smart Bird ID. Many use machine-learning-based algorithms, but the accuracy of the results can vary due to background noise and individual variations in bird calls.
Merlin is already an established bird guide app. It offers a walkthrough process for regular by-sight identification that’s useful for beginner birders in addition to its more advanced tools.
Here’s how the Merlin sound ID works
Through Merlin, birders can turn on their phone’s microphone and have it listen to what’s around them. The app will then surface suggestions of what birds were singing or calling. The audio that is picked up by the app is also turned into a visual pattern representation called a spectrogram, which captures the amplitude, frequency, and duration of sounds.
“As soon as you have an image of an actual bird in a tree or of an audio signature in the form of a spectrogram, you can use robust computer vision tools to start building a model to recognize those patterns,” says Grant Van Horn, the lead researcher on the Merlin project.
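To make the idea of a spectrogram concrete, here is a minimal sketch of how audio can be turned into a grid of frequency, time, and amplitude values. It is not Merlin's actual pipeline: the function name, the window and hop sizes, and the synthetic 3 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_size=512, hop=256):
    """Return (freqs, times, magnitudes) for a mono audio signal.

    Hypothetical helper: slices the signal into overlapping windows and
    takes the Fourier transform of each one (a short-time Fourier transform).
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # Rows are frequency bins, columns are time frames, values are amplitudes.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + window_size / 2) / sample_rate
    return freqs, times, magnitudes

# Synthetic stand-in for a recording: one second of a steady 3 kHz tone.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 3000 * t)

freqs, times, mags = spectrogram(tone, sample_rate)
peak_hz = freqs[mags.mean(axis=1).argmax()]  # loudest frequency bin on average
```

Plotted as an image, `mags` would show a single horizontal band near 3 kHz. A real bird song would trace a more intricate shape, and that shape is the kind of pattern an image-recognition model can learn.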
Besides sound ID, the other ways to use the app to identify a bird are by manually inputting its physical characteristics and by uploading a photo.
A massive feat of citizen science
Artificial intelligence systems need data, and of course, that data needs to exist in the first place.
In this case, the project took some serious citizen science. The photo ID feature, and the newer sound ID option, could not have been built without the Lab of Ornithology’s Macaulay Library database, which contains nearly 30 million archived and annotated photos of birds and over 1.1 million audio recordings uploaded by the birding community.
A team then went to work converting the media into useful tools. They started building the Merlin photo ID component in 2012, just as advancements were being made in computer vision. “We knew that if we could get data together, we could utilize these tools to build a pretty useful feature that would allow someone to snap a photo and have the computer tell them what was in that photo,” says Van Horn. By 2015, the lab was able to let citizen scientists upload photos and audio to the growing collection. Since the photo ID component rolled out on the app, it has been continuously improved with more photo samples and expanded species coverage in new regions of South America, Africa, Asia, and Europe. “Machine learning only works well if you’ve got this nice foundation of data that you can build on top of,” Van Horn explains.
The audio clips and photos going into the Macaulay Library come from another program run by the lab called eBird, which launched in 2002. The eBird app allows citizen scientists and local organizations all over the world to log and share bird sightings, including with scientists who study and plot bird populations.
“Because we’ve aggregated this data over so much time, we have a really good sense of, if you’re in New York City on July 19, which species you’re likely to encounter,” Van Horn says. “That kind of information really helps us on the sound ID and the photo ID because it immediately lets us take the 450 species problem for sound ID, 8,000 species for photo ID, and it helps us narrow it down to 40 species that are really under consideration here.”
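The narrowing step Van Horn describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the Merlin team's code: the function name, the species scores, and the list of locally likely species are all hypothetical.

```python
def narrow_candidates(scores, likely_species, top_k=3):
    """Keep only species plausible for the place and date, then renormalize.

    scores: dict mapping species name to a model confidence (hypothetical).
    likely_species: set of species expected at this location and date.
    """
    filtered = {sp: s for sp, s in scores.items() if sp in likely_species}
    total = sum(filtered.values())
    if total == 0:
        return []
    ranked = sorted(filtered.items(), key=lambda kv: kv[1], reverse=True)
    return [(sp, s / total) for sp, s in ranked[:top_k]]

# Made-up classifier output: the top raw guess is a bird not found in New York.
model_scores = {
    "European Robin": 0.35,
    "American Robin": 0.30,
    "Northern Cardinal": 0.20,
    "House Sparrow": 0.15,
}
# Made-up stand-in for what sighting records might suggest for NYC in July.
likely_in_nyc_july = {"American Robin", "Northern Cardinal", "House Sparrow"}

suggestions = narrow_candidates(model_scores, likely_in_nyc_july)
```

Here the implausible European Robin drops out and the American Robin becomes the top suggestion. The same idea, applied at scale, shrinks a problem of hundreds or thousands of species to a few dozen.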
Progress on the audio ID component was slower than on the image ID “just because the routine of going out and recording bird calls just isn’t quite as popular as going and photographing them,” says Van Horn. “But certainly over the last three years or so, North America has been pretty densely covered with audio recordings.”
Around this time last year, the team decided that it potentially had enough audio data to build and launch the sound identification feature for popular species in the US and Canada. They started combing through all the data and selecting the species.
Background noise, however, remains a challenge for the engineering team. To address it, they turned to existing audio datasets of traffic scenes, urban environments, and machine noises: in other words, normal sounds that are not from birds. “We would convert those audio into spectrograms and use those as negative examples of ‘this is not a bird; anytime you see this, you shouldn’t be reporting bird species,’” Van Horn adds. “It’s a balance of building up a high quality avian dataset as well as bolstering up a nice dataset of non-avian noises that we can show the machine and teach it what birds don’t sound like.”
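One common way to represent negative examples in training data is to give non-bird clips a label vector with no species marked. The sketch below assumes that encoding. The article does not say how the Merlin team labels its data, and the species list and file names here are invented for illustration.

```python
import numpy as np

# Hypothetical label set; the real system covers hundreds of species.
SPECIES = ["American Robin", "Northern Cardinal", "House Sparrow"]

def make_target(labels):
    """Build a multi-hot vector: 1 for each species heard, all zeros for non-bird audio."""
    target = np.zeros(len(SPECIES), dtype=np.float32)
    for name in labels:
        target[SPECIES.index(name)] = 1.0
    return target

# Invented clips: two with birds, two negative examples with no birds at all.
clips = [
    ("robin_dawn.wav", ["American Robin"]),
    ("mixed_backyard.wav", ["Northern Cardinal", "House Sparrow"]),
    ("traffic.wav", []),
    ("lawnmower.wav", []),
]

targets = np.stack([make_target(labels) for _, labels in clips])
```

A model trained on targets like these is rewarded for reporting nothing when it sees the spectrogram of traffic or a lawnmower, which matches the goal Van Horn describes.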
Then came more work. Because the project’s success hinged on a high-quality dataset, Weber and Van Horn had to recruit expert members of the birding community to help them go through the raw audio files in the database and label the species in each recording.
“In the build up of our dataset for the initial release, I think we put in about 2,000 hours of annotating, drawing where the bird was singing, where the various birds are singing,” Weber says. “It was a mostly volunteer effort from a lot of the same folks who are entering these eBird data and observations.”
When the app was first launched in 2014, it only had the most common birds of the US and Canada. In 2016, the first international tags were released, starting with Mexico, Costa Rica, and expanding across Europe, Australia, New Zealand, Africa and parts of Asia. “We still see that about 75 percent of our new and active users are in the US and Canada,” says Weber, but a growing number of new species are being logged all across the world.
As users globally continue to submit sightings to the eBird database, the new sightings get incorporated into the Merlin app and the research team’s understanding of what species occur when and where. “We are constantly updating the photos and the sounds that we feature in the app for each species so we can constantly improve the content that we show with Merlin,” Weber adds.
Weber notes that some of the most surprising feedback they’ve received has come from users who are hard of hearing. “They’re just absolutely thrilled by the live view and the spectrogram that can visualize the bird song,” he says. “Whether it’s someone who’s always [been] hard of hearing, or someone who’s getting older and losing the high pitches, a lot of people are really excited about being able to in some sense recover some of that hearing loss.”
The team is still working to refine the app and integrate feedback from users. By working together with regional communities and organizations, Van Horn thinks they can build a diverse array of useful tools that help people have a more engaging experience outdoors, particularly with local birds. “This is a success story of humans and machines,” says Van Horn. “The humans play a huge piece of this puzzle.”