The #brownbears of Instagram. (Image: Instagram)

Consider Instagram hashtags. When someone uploads a photograph to the Facebook-owned platform, they can add a hashtag. That could be something like #love, #fashion, or #photooftheday; those were the top three hashtags of last year. While those tags illustrate abstract concepts, there are plenty of more concrete descriptors out there, like #brownbear, which, unsurprisingly, is full of ursine pics.

But while hashtags are a good way for someone to see millions of #travel photos in one place, Facebook used those labeled photographs for something else: training its image-recognition software. That software relies on a kind of artificial intelligence called computer vision, in which a computer is taught to recognize what's in an image.

In fact, Facebook's researchers used some 3.5 billion Instagram photos (from public accounts) and 17,000 hashtags to train a computer vision system that they say is the best one they have created yet.

Facebook’s CTO, Mike Schroepfer, announced the research today at the company’s developer conference, F8, calling the results “state of the art.”

Bad supervision

To understand why this is an interesting approach, it helps to know the difference between "fully supervised" and "weakly supervised" training for artificial intelligence systems. Computer vision systems need to be taught to recognize objects. Show them images labeled "bear," for example, and they can learn to pick out what they think are bears in new photos. When researchers train an AI system on photographs that humans have annotated for that purpose, the training is called "fully supervised": every image carries a deliberate, clear label for the software to learn from.

“That works really well,” says Manohar Paluri, the computer vision lead at Facebook’s Applied Machine Learning group, which carried out the research along with another division at the social network called Facebook AI Research. The only problem with that approach is that the images need to be labeled in the first place, which takes human labor.

“Going to billions [of labeled images] starts becoming infeasible,” Paluri adds. And in artificial intelligence, the more data a system can learn from, the better it generally performs. Diverse data matters too: if you want to teach an AI system to recognize what a wedding looks like, you don't want to show it only photographs of weddings from North America, but photographs of weddings from around the world.

Enter “weakly supervised” learning, in which the data hasn’t been carefully labeled by people for the purpose of teaching an AI. That’s where all those billions of Instagram photos came into play. The hashtags became a way of crowdsourcing the labeling job. For example, the tag #brownbear, combined with the similar tag #ursusarctos, became the label for images of bears. Instagram users became the labelers.
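To make the idea concrete, here is a minimal Python sketch of turning an image's hashtags into training labels. The synonym table, the label names, and the `weak_labels` function are hypothetical illustrations, not Facebook's actual vocabulary or code; the only detail taken from the research is that related tags such as #brownbear and #ursusarctos can share a single label.

```python
# Hypothetical synonym table: several hashtags collapse into one class label.
# These groupings are illustrative only, not Facebook's real 17,000-tag mapping.
HASHTAG_TO_LABEL = {
    "brownbear": "brown_bear",
    "ursusarctos": "brown_bear",
    "eiffeltower": "eiffel_tower",
    "cake": "cake",
}


def weak_labels(hashtags):
    """Map an image's hashtags to a sorted list of distinct class labels.

    Tags are normalized (leading '#' stripped, lowercased) before lookup.
    Tags outside this toy vocabulary are simply ignored.
    """
    labels = set()
    for tag in hashtags:
        key = tag.lstrip("#").lower()
        if key in HASHTAG_TO_LABEL:
            labels.add(HASHTAG_TO_LABEL[key])
    return sorted(labels)


print(weak_labels(["#BrownBear", "#UrsusArctos", "#love"]))  # ['brown_bear']
```

Both bear tags end up as one label, while a tag missing from this toy vocabulary, such as #love, contributes nothing.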

But that kind of data is messy and imperfect, and thus noisy. For example, Paluri points out that someone who takes an Instagram photo near the Eiffel Tower may still tag it with the landmark's name, even though the tower itself isn't visible. That label makes sense in a human context, but it doesn't do much good for a simple-minded computer. In another scenario, a birthday party scene with a cake in it might not be tagged #cake, which is also unhelpful if you're trying to teach a computer what that dessert looks like.
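One rough way to picture that noise is to compare the labels a photo's hashtags produce with a hypothetical hand-made list of what is actually in the frame. The sketch below assumes both lists already exist; the `label_noise` function and the example labels are illustrative only and do not describe how Facebook measured or handled noise.

```python
def label_noise(weak, visible):
    """Compare hashtag-derived labels with a (hypothetical) ground-truth list.

    Returns labels that were tagged but aren't in the frame, and labels
    that are in the frame but weren't tagged.
    """
    weak, visible = set(weak), set(visible)
    return {
        "false_positive": sorted(weak - visible),  # tagged, not visible
        "missing": sorted(visible - weak),         # visible, not tagged
    }


# Photo taken near the Eiffel Tower, but the tower is out of frame.
print(label_noise(["eiffel_tower"], ["street"]))
# {'false_positive': ['eiffel_tower'], 'missing': ['street']}

# Birthday party photo with a cake that nobody tagged #cake.
print(label_noise(["birthday"], ["birthday", "cake"]))
# {'false_positive': [], 'missing': ['cake']}
```

Both failure modes Paluri describes show up: a label with nothing behind it, and an object with no label.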

An example of the kind of image classification Facebook’s image recognition systems could do in the past. (Image: Facebook)
The new system is more precise: it can recognize an animal as not just a bird, but as an eastern meadowlark. (Image: Facebook)

It worked anyway

But despite the noise in the original data, Paluri says, the approach ultimately worked very well. Measured by one benchmark, the system, trained on those billions of Insta pics, was on average about 85 percent accurate. Paluri says that it is the strongest computer vision system Facebook has made yet.
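The article doesn't name the benchmark or its exact metric. Assuming the figure is ordinary top-1 accuracy (the share of test images whose single best guess matches the correct label), the arithmetic looks like the minimal sketch below; the function name and the toy predictions are hypothetical.

```python
def top1_accuracy(predictions, ground_truth):
    """Fraction of images whose top prediction matches the true label."""
    if not ground_truth or len(predictions) != len(ground_truth):
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(pred == truth for pred, truth in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Toy example (hypothetical): 20 test images, 17 classified correctly.
truth = ["brown_bear"] * 20
preds = ["brown_bear"] * 17 + ["eastern_meadowlark"] * 3
print(top1_accuracy(preds, truth))  # 0.85
```

At that rate, roughly 17 of every 20 test images get the right label and about 3 do not.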

If you use Facebook, you know that it can recognize faces in the photos you upload and suggest tagging them with (hopefully) the right name. That’s an example of computer vision—in this case, face recognition. But under the hood, Facebook uses computer vision to identify other things besides faces, like visual content (such as pornography) that’s not allowed on the platform.

Paluri says that the new, Instagram-trained technology is already helping the company flag objectionable content in photos that shouldn’t be on the site. When it comes to recognizing “objectionable content,” he says, they’ve already noticed a “significant improvement in accuracy.”