Machine Learning Is Helping Us Find The Genetics Of Autism

Princeton researchers are working smarter, not harder

The genetic cause of autism spectrum disorder is notoriously hard to research. Genetic markers for the disorder are tough to match from patient to patient because they’re so rare: one of the most common genetic signifiers is found in less than one percent of those diagnosed with autism. Even when genetic anomalies are found, they must be checked against family members’ genomes to ensure they’re not attributable to a more commonly inherited mutation that doesn’t cause disease.

Researchers at Princeton and the Simons Foundation turned the traditional approach on its head, teaching a machine learning algorithm to look for the genetic relationships that could cause autism. The algorithm scoured a digital network of the human genome’s interactions, looking for relationships and connections similar to those of previously known markers for autism. The research shines a light on how the disorder hides within our genome, highlighting 2,500 genes ripe for further research.

“We don’t just say that [a gene] has a 90 percent chance of being autism-associated. Because we have the network, we can actually say, ‘This is how it’s connected to autism,’” said Olga Troyanskaya, co-author of the paper published in Nature Neuroscience.

The results aren’t immediately useful for identifying the disorder in patients. Instead, they could make finding more autism-causing genes faster and less expensive. Now that scientists have a better idea of where to look, they can selectively sequence the parts of the genome that correlate with the disorder. Those who can parse dense genomic data can access the team’s results online.

Troyanskaya explains that these gene interactions are like electrical circuits: every piece needs to work for the whole process to function.

Troyanskaya sets up a scenario where two genes need to bind in order to activate a third gene, one that’s important for brain development.

“If something goes wrong, like you don’t have one of those genes to bind together, then they’re not going to go on and bind the third gene, and now you broke this little circuit,” Troyanskaya says.
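Troyanskaya’s circuit can be pictured as a simple logic gate. The toy Python sketch below is only an illustration of that idea: the gene names GENE_A and GENE_B are hypothetical placeholders, and real gene regulation is far messier than a true-or-false switch.

```python
def third_gene_activated(functional_genes):
    """Toy AND gate: the third gene switches on only if both partner genes work.

    GENE_A and GENE_B are hypothetical placeholder names, not real genes.
    """
    return "GENE_A" in functional_genes and "GENE_B" in functional_genes


healthy = {"GENE_A", "GENE_B"}
mutated = {"GENE_A"}  # GENE_B is missing or broken, so the pair cannot bind

print(third_gene_activated(healthy))  # True
print(third_gene_activated(mutated))  # False
```

Losing either partner is enough to break the circuit, which is why a mutation in one gene can silence another gene that is itself perfectly intact.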

That’s how the algorithm works: it looks at how these little circuits break, the nature of the genes that are affected, and how they interact with the genes around them, and then finds similar potential scenarios throughout the genome. The assumption, however, is that there’s a pattern.
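One rough way to picture “finding similar scenarios” is to rank candidate genes by how strongly they connect to genes already tied to autism. The Python sketch below assumes a tiny made-up network with placeholder gene names and invented weights. It is a hand-written scoring rule for illustration, not the team’s actual machine learning model.

```python
# Hypothetical toy network: made-up interaction weights between placeholder genes.
NETWORK = {
    ("KNOWN_1", "CAND_X"): 0.9,
    ("KNOWN_2", "CAND_X"): 0.7,
    ("KNOWN_1", "CAND_Y"): 0.1,
    ("KNOWN_2", "CAND_Y"): 0.2,
}


def weight(a, b):
    """Look up an undirected edge weight; pairs not listed count as 0."""
    return NETWORK.get((a, b), NETWORK.get((b, a), 0.0))


def connection_score(candidate, known_genes):
    """Average interaction weight between a candidate and the known genes."""
    return sum(weight(candidate, k) for k in known_genes) / len(known_genes)


known = ["KNOWN_1", "KNOWN_2"]
candidates = ["CAND_Y", "CAND_X"]

# Rank candidates from most to least connected to the known genes.
ranked = sorted(candidates, key=lambda g: connection_score(g, known), reverse=True)
print(ranked)  # ['CAND_X', 'CAND_Y']
```

A score like this only works if the known genes really do share a pattern in the network, which is exactly the assumption the article flags.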

But our knowledge of genomic markers is limited. We know of 65, but estimates say 400 to 1,000 markers could still be undiscovered. And of those 65, only 19 are what the Princeton team considered “gold-standard,” meaning they have an extremely high probability of causing autism.

Machine learning algorithms, while able to comb through massive troves of data faster than humanly possible, lack the human ability to learn from just a few examples. With such complex information, 19 examples is an extremely small batch for the algorithm to learn from. So Troyanskaya and her team used a trick to maximize that data: they gave the algorithm counter-examples drawn from other genetic diseases. Told to ignore those other disorders, the algorithm can hone its search for autism-specific relationships.
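To see why counter-examples help, consider a gene that is strongly connected to almost everything. With positive examples alone it looks promising, but contrasting it against genes from other disorders reveals that its connections are not specific to autism. The Python sketch below uses invented weights and placeholder gene names, and a simple subtraction stands in for whatever the real model learns.

```python
# Hypothetical interaction weights (made-up numbers, placeholder gene names).
WEIGHTS = {
    "HUB_GENE": {"AUTISM_1": 0.8, "AUTISM_2": 0.8, "OTHER_1": 0.8, "OTHER_2": 0.8},
    "SPECIFIC_GENE": {"AUTISM_1": 0.7, "AUTISM_2": 0.7, "OTHER_1": 0.1, "OTHER_2": 0.1},
}
POSITIVES = ["AUTISM_1", "AUTISM_2"]  # stand-ins for known autism genes
NEGATIVES = ["OTHER_1", "OTHER_2"]    # stand-ins for genes tied to other disorders


def mean_weight(candidate, genes):
    """Average interaction weight between a candidate and a list of genes."""
    return sum(WEIGHTS[candidate].get(g, 0.0) for g in genes) / len(genes)


def positives_only_score(candidate):
    """Score using only the known autism genes."""
    return mean_weight(candidate, POSITIVES)


def contrast_score(candidate):
    """Score that rewards autism links and penalizes links to other disorders."""
    return mean_weight(candidate, POSITIVES) - mean_weight(candidate, NEGATIVES)


for gene in WEIGHTS:
    print(gene, round(positives_only_score(gene), 2), round(contrast_score(gene), 2))
```

In this toy setup the hub gene ranks first when only positives are used, while the contrast score puts the autism-specific gene ahead.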

While the algorithm did all the legwork, Troyanskaya says this research was possible because of the team’s gene-interaction network, which was published in 2015. The network contains predictions for how 25,825 genes work together in tissues like the brain. It’s more than a list of genes: it’s thousands of matrices that represent how each gene behaves in the brain.

Moving forward, the team is looking at how this same technique could be applied to mapping individual patient genomes.