Meta's new AI tool can predict protein shapes

Proteins are essential to keeping living organisms up and running. They help repair cells, clear out waste, and relay messages from one end of the body to the other.

Scientists have long worked to decipher the structures and functions of proteins. To that end, Meta’s AI research team announced today that it has built a model that can predict the 3D structure of proteins from their amino acid sequences. Unlike previous work in the space, such as DeepMind’s, Meta’s AI is based on a language model rather than a shape-and-sequence matching algorithm. Meta is not only releasing a preprint paper on this research, but is also opening up both the model and the database of predicted protein structures to the research community and industry.

First, to contextualize the importance of understanding protein shapes, here’s a brief biology lesson. Genes are transcribed into messenger RNA, and a molecular machine in the cell called a ribosome reads that RNA three nucleotides at a time, translating each triplet into an amino acid. Proteins are chains of amino acids that have folded into unique forms and configurations. An emerging field of science called metagenomics uses gene sequencing to discover, catalog, and annotate new proteins in the natural world.
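
As a rough illustration of that translation step, the sketch below reads a coding sequence three letters at a time and looks each triplet up in the standard genetic code. The function name and the example sequence are made up for illustration, and working directly from DNA letters (T rather than the U found in messenger RNA) is a simplification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    protein = []
    # Step through the sequence one triplet (codon) at a time.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # a stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: ATG (M), GCC (A), AAG (K), TGA (stop).
print(translate("ATGGCCAAGTGA"))  # prints "MAK"
```

The resulting string of letters is the kind of amino acid sequence that structure prediction models take as input.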

Meta’s AI model is a new protein-folding approach inspired by large language models that aims to predict the structures of the hundreds of millions of protein sequences in metagenomics databases. Understanding the shapes that these proteins form will give researchers clues about how they function, and what molecules they interact with.

“We’ve created the first large-scale characterization of metagenomics proteins. We’re releasing the database as an open science resource that has more than 600 million predictions of protein structures,” says Alex Rives, a research scientist at Meta AI. “This covers some of the least understood proteins out there.”

Historically, computational biologists have used evolutionary patterns to predict the structures of proteins. Before they fold, proteins are linear strands of amino acids. When a protein folds into a complex structure, positions that sit far apart in the linear strand can end up very close to one another.

“You can think about this as two pieces in a puzzle where they have to fit together. Evolution can’t choose these two positions independently because if the wrong piece is here, the structure would fall apart,” Rives says. “What that means then is if you look at the patterns of protein sequences, they contain information about the folded structure because different positions in the sequence will co-vary with each other. That will reflect something about the underlying biological properties of the protein.”
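
One way to picture the co-variation Rives describes is to measure how much knowing the amino acid at one position tells you about another position across a set of related sequences. The sketch below does this with mutual information on a tiny, made-up alignment. It is only an illustration of the general idea, not the method used by Meta or DeepMind, and the sequences and function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(column_a, column_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(column_a)
    count_a = Counter(column_a)
    count_b = Counter(column_b)
    count_ab = Counter(zip(column_a, column_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment of four related sequences (one per row).
alignment = [
    "KAGE",
    "KLGE",
    "EAGK",
    "ELGK",
]
columns = list(zip(*alignment))  # transpose rows into per-position columns

# Positions 0 and 3 always change together (K with E, E with K).
print(mutual_information(columns[0], columns[3]))  # prints 1.0
# Positions 0 and 1 vary independently of each other.
print(mutual_information(columns[0], columns[1]))  # prints 0.0
```

In this toy case, the high score between the first and last positions is the kind of signal that hints the two positions may sit next to each other in the folded structure.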

Meanwhile, DeepMind’s approach, AlphaFold, which debuted in 2018, relies chiefly on a method called multiple sequence alignment. It searches massive evolutionary databases of protein sequences to find proteins related to the one it is making a prediction for.

“What’s different about our approach is that we’re making the prediction directly from the amino acid sequence, rather than making it from this set of multiple related proteins and looking at the patterns,” Rives says. “The language model has learned these patterns in a different way. What this means is that we can greatly simplify the structure prediction architecture because we don’t need to process this set of sequences and we don’t need to search for related sequences.”

These factors, Rives claims, make their model faster than other technology in the field.

How did they train the model to do this? It took two steps. First, they pre-trained the language model on a large number of proteins that have different structures, come from different protein families, and are drawn from across the evolutionary timeline. They used a version of masked language modeling, in which they blanked out portions of the amino acid sequence and asked the algorithm to fill in those blanks. “The language training is unsupervised learning, it’s only trained on sequences,” Rives explains. “Doing this causes this model to learn patterns across these millions of protein sequences.”
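
To make the fill-in-the-blanks setup concrete, here is a minimal sketch of how a single training example might be prepared: a few positions in the sequence are hidden, and the hidden amino acids become the answers the model is asked to recover. The 15 percent masking rate, the `<mask>` token, the function name, and the example sequence are all assumptions for illustration, not details confirmed by Meta.

```python
import random

MASK = "<mask>"  # assumed placeholder token


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues and return (masked tokens, answers)."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    residues = list(sequence)
    n_masked = max(1, round(len(residues) * mask_fraction))
    positions = rng.sample(range(len(residues)), n_masked)
    # The hidden residues are what the model would be trained to predict.
    targets = {i: residues[i] for i in positions}
    masked = [MASK if i in targets else r for i, r in enumerate(residues)]
    return masked, targets


# Hypothetical amino acid sequence, written in one-letter codes.
masked, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ")
print(" ".join(masked))
print(targets)
```

No structural labels are involved at this stage, which is why Rives describes it as unsupervised: the sequence itself supplies both the question and the answer.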

Then, they froze the language model and trained a folding module on top of it. This second stage of training used supervised learning. The supervised learning dataset is made up of structures from the Protein Data Bank, which researchers from across the world have submitted, augmented with predictions made using AlphaFold (DeepMind’s technology). “This folding module takes the language model input and basically outputs the 3D atomic coordinates of the protein [from the amino acid sequences],” Rives says. “That produces these representations and those are projected out into the structure using the folding head.”

Rives imagines that this model could be used in research applications such as understanding the function of a protein’s active site at the biochemical level, which is information that could be very pertinent for drug development and discovery. He also thinks that the AI could even be used to design new proteins in the future.
