
In a blog post this week, Meta AI announced the release of a new AI tool that can identify which pixels in an image belong to which object. The Segment Anything Model (SAM) performs a task called “segmentation,” a capability foundational to computer vision: the process computers and robots use to “see” and comprehend the world around them. Alongside the new model, Meta is making its training dataset available to outside researchers. 

In his 1994 book, The Language Instinct, Steven Pinker wrote that “the main lesson of 35 years of AI research is that the hard problems are easy and the easy problems are hard.” The observation, known as Moravec’s paradox, still holds true 30-odd years later. Large language models like GPT-4 can produce, in seconds, text that reads like something a human wrote, while robots struggle to pick up oddly shaped blocks—a task so seemingly basic that children do it for fun before they turn one. 

Segmentation falls into this looks-easy-but-is-technically-hard category. You can look at your desk and instantly tell what’s a computer, what’s a smartphone, what’s a pile of paper, and what’s a scrunched-up tissue. But to a computer processing a 2D image (and even videos are just a series of 2D images), everything is a bunch of pixels with varying values. Where does the tabletop stop and the tissue start?

Meta’s new SAM AI is an attempt to solve this problem in a generalized way, rather than with a model designed specifically to identify one thing, like faces or guns. According to the researchers, “SAM has learned a general notion of what objects are, and it can generate masks for any object in any image or any video, even including objects and image types that it had not encountered during training.” In other words, instead of only being able to recognize the objects it’s been taught to see, it can guess at what the different objects are. SAM doesn’t need to be shown hundreds of different scrunched-up tissues to tell one apart from your desk; its general sense of things is enough. 

[Related: One of Facebook’s first moves as Meta: Teaching robots to touch and feel]

You can try SAM in your browser right now with your own images. SAM can generate a mask for any object you select, either by clicking on it with your mouse cursor or by drawing a box around it. It can also create a mask for every object it detects in the image. According to the researchers, SAM can take text prompts as well—such as select “cats”—but that feature hasn’t been released to the public yet. It did a pretty good job of segmenting the images we tested out here at PopSci.
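For developers, the same point-and-box prompting can also be done in code. Here is a rough sketch, assuming Meta's open-sourced segment-anything Python package and a locally downloaded model checkpoint (the checkpoint filename, image path, and pixel coordinates below are placeholders, not details from Meta's announcement):

```python
# Minimal sketch of prompting SAM with a single "click" (a foreground point).
# Assumes the segment-anything package and a downloaded ViT-H checkpoint.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the model from a local checkpoint file (filename is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image and hand it to the predictor as an RGB array.
image = cv2.cvtColor(cv2.imread("desk.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# "Click" on an object: one foreground point prompt at pixel (x, y).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 300]]),
    point_labels=np.array([1]),   # 1 = foreground, 0 = background
    multimask_output=True,        # return several candidate masks
)
best_mask = masks[np.argmax(scores)]  # boolean H x W array for the chosen object
```

The package also includes an automatic mask generator that mirrors the “mask for every object” mode described above; again, the specifics here are an assumption for illustration rather than anything spelled out in Meta's post.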

A visualization of how the Segment Anything tool works. Meta AI

While it’s easy to find lots of images and videos online, high-quality segmentation data is a lot more niche. To get SAM to this point, Meta had to develop a new training dataset: the Segment Anything 1-Billion mask dataset (SA-1B). It contains around 11 million licensed images and over 1.1 billion segmentation masks “of high quality and diversity, and in some cases even comparable in quality to masks from the previous much smaller, fully manually annotated datasets.” In order to “democratize segmentation,” Meta is releasing it to other researchers. 
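For researchers who download SA-1B, the masks are too numerous to store as raw images, so a compressed encoding makes sense. Assuming they are distributed in the standard COCO run-length-encoding (RLE) format alongside per-image JSON files (an assumption for illustration, not a detail from the blog post), decoding one annotation file might look like this:

```python
# Hypothetical sketch: decoding segmentation masks from one SA-1B annotation file.
# The file name, JSON layout, and RLE mask format are assumptions for illustration.
import json

from pycocotools import mask as mask_utils  # standard COCO mask utilities

with open("sa_000001.json") as f:
    record = json.load(f)

# Assume one JSON record per image, with a list of per-object annotations.
for ann in record["annotations"]:
    binary_mask = mask_utils.decode(ann["segmentation"])  # H x W array, 1 = object pixel
    print(f"mask {ann['id']}: {int(binary_mask.sum())} pixels")
```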

Some industry applications for the new AI tool. Meta AI

Meta has big plans for its segmentation program. Reliable, general computer vision is still an unsolved problem in artificial intelligence and robotics—but it has a lot of potential. Meta suggests that SAM could one day identify everyday items seen through augmented reality (AR) glasses. Another project from the company, called Ego4D, aims to tackle a similar problem through a different lens. Both could one day lead to tools that let you follow a step-by-step recipe as you cook, or leave virtual notes for your partner on the dog bowl. 

More plausibly, SAM also has a lot of potential uses in industry and research. Meta proposes using it to help farmers count cows or biologists track cells under a microscope—the possibilities are endless.