How Supercomputing Is Cracking The Mysteries Of Human Origins
National Science Foundation

A Texas supercomputer capable of 9.6 quadrillion operations per second has solved a thorny problem in genetics, by looking at the bones of a young boy who died 24,000 years ago in Mal’ta in south-central Siberia.

Existing genetic models have suggested that modern Europeans share DNA with 3 different groups: blue-eyed, swarthy hunter-gatherers who arrived in Europe some 40,000 years ago; a second group of light-skinned, brown-eyed farmers from the Near East who migrated about 7,000 years ago; and a third mystery group who arrived more recently to share their genes. But no one knew who this “ghost population” was.

By plugging the ancient boy’s genomic data into the 9.6 petaflop “Stampede” supercomputer at the University of Texas at Austin, senior co-author David Reich of Harvard and his team were able to confirm a theory that the boy’s group of “ancient North Eurasians” were indeed the missing population.

Who was the “ghost population” inhabiting early Europe?

“They’ve been building these very complicated models of interbreeding and population divergence,” says Joshua Schraiber, a post-doctoral fellow in genome sciences at the University of Washington, who performed genomic computer analysis on 9 ancient humans and cross-referenced it with a genomic database of 2,345 modern Europeans known as POPRES.

Gleaming Slabs of Data

How do you go from a 24,000-year-old bone to a supercomputer-ready data set so large that it would take a week to download?

The first step requires drilling a hole in the ancient bone, and hoping that the powder holds enough viable ancient DNA to analyze. DNA degrades quickly in hot and wet conditions, and it’s no accident that the ancient DNA samples in the study came from cold-weather locations in Germany, Luxembourg, Sweden, and Siberia. (In fact, scientists prefer to store extracted DNA in a freezer at a temperature of -80 degrees Celsius.)

“We have all these really cool samples from Loschbour (Luxembourg), and the Stuttgart cave in Germany, and the Mal’ta, they have a high percentage of DNA from the individual who died,” says Schraiber. “You test bones until you find one that has a lot of endogenous DNA, and when you find one of those, you have a beer, because you’re happy. You do get lucky sometimes.”

A scientist drills into a Homo heidelbergensis bone in the hopes of extracting usable ancient DNA.

Having so many samples means that software can easily sort the ancient human DNA from the very different-looking fungal or bacterial interlopers.

High-throughput sequencing machines have transformed genetic analysis since the draft Neanderthal genome was first sequenced in 2010. “[T]he ability to do this with tons and tons and tons of molecules at once, [means that], even if it’s just 1 percent ancient human DNA in a particular sample, there are a ton of samples. So it’s 1 percent of a very large number. And you can actually reconstruct a genome from that,” Schraiber says.
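Schraiber's "1 percent of a very large number" can be made concrete with a little arithmetic. The sketch below is only an illustration: the read count and read length are hypothetical values chosen for the example, not figures from the study, and the function name `endogenous_coverage` is made up here. It estimates how many times over the ancient individual's genome would be covered when only a small fraction of the sequenced molecules are human.

```python
def endogenous_coverage(total_reads, endogenous_fraction, read_length_bp,
                        genome_size_bp=3_000_000_000):
    """Estimate average genome coverage from the endogenous share of reads.

    genome_size_bp defaults to roughly 3 billion base pairs, the approximate
    size of a human genome.
    """
    endogenous_bases = total_reads * endogenous_fraction * read_length_bp
    return endogenous_bases / genome_size_bp


# Hypothetical run: 2 billion short reads, 1 percent of them ancient human,
# each 75 base pairs long. These numbers are assumptions for illustration.
depth = endogenous_coverage(
    total_reads=2_000_000_000,
    endogenous_fraction=0.01,
    read_length_bp=75,
)
print(f"Estimated coverage: {depth:.2f}x")
```

Under these assumed numbers, even a 1 percent endogenous fraction yields enough human sequence to cover about half the genome once, which is why sheer volume can make up for a poor sample.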

But high-throughput sequencing machines will typically sample a given nucleotide between 10 and 30 times, just to make sure that no errors occur, which results in a mountain of data. The researchers found it quickest to mail two-terabyte hard drives back and forth rather than to try to send the files over the web.
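To see why reading the same nucleotide 10 to 30 times guards against errors, consider a toy majority vote over the reads that cover one position. This is a simplified sketch, not the method used in the study: real pipelines weigh base-quality scores and use statistical genotype callers, and the function name `consensus_base` is invented for this example.

```python
from collections import Counter


def consensus_base(reads_at_site):
    """Return the most common base at one site and the share of reads backing it.

    reads_at_site is a sequence of single-letter base calls (e.g. 'A', 'C'),
    one per read that overlaps the position.
    """
    if not reads_at_site:
        raise ValueError("no reads cover this site")
    counts = Counter(reads_at_site)
    base, support = counts.most_common(1)[0]
    return base, support / len(reads_at_site)


# Twenty reads cover the site; two of them carry a sequencing error.
reads = list("A" * 18 + "G" * 2)
base, agreement = consensus_base(reads)
print(base, agreement)
```

With only one or two reads per site, a single error could not be told apart from a real variant. At 20 reads, the two stray calls are clearly outvoted, at the cost of storing 20 times as much data.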

That’s where the supercomputer came in. Schraiber’s job was to shoehorn the massive, variably-formatted data sets into a DNA analysis program called “Beagle.” Then he had to search for statistically significant evidence of inter-relatedness between the ancient and modern humans.

A full human genome has about 3 billion base pairs, with millions of sites varying between individuals. Because the genome data for each individual needed to be compared to every other individual, Schraiber had to use a kind of algorithm that computer scientists usually like to avoid if at all possible. The number of computer operations grew quadratically with the number of samples: comparing N samples required on the order of N^2 operations. Schraiber and Beagle used up to 100 GB of RAM at a time, running the program for days.
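A rough count shows how quickly all-against-all comparison adds up. The sketch below assumes, purely for illustration, that every one of the 9 ancient and 2,345 modern genomes is paired with every other one; the article does not say exactly which pairs Beagle evaluated, and the helper name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n_samples):
    """Number of distinct pairs among n_samples individuals: n * (n - 1) / 2."""
    return n_samples * (n_samples - 1) // 2


ancient = 9      # ancient genomes analyzed, per the article
modern = 2345    # modern Europeans in the POPRES database, per the article
total = ancient + modern

print(pairwise_comparisons(total))          # pairs among all 2,354 genomes
print(ancient * modern)                     # ancient-to-modern pairs only
print(pairwise_comparisons(2 * total) / pairwise_comparisons(total))
```

The last line prints a ratio close to 4: doubling the number of samples roughly quadruples the number of pairs, which is the quadratic growth described above. Each of those pairs then has to be examined across millions of variable sites.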

Did the Ghosts Come From Another Continent?

In spite of all the computing firepower and sophisticated population divergence modeling, the team’s eureka moment of nailing down the third ancestral population required a stroke of luck as well.

As they were working, says Schraiber, Reich and his Harvard colleague Iosif Lazaridis used a tentative model of the “ghost population” that included DNA sequences that looked very similar to those of some Native Americans.

“David and Iosif noticed that things fit better” mathematically if something close to the Native American genome was one of the ancestral populations of modern Europeans.

Around the same time, in November 2013, a team led by scientists in Copenhagen published a paper about the genome of the Mal’ta boy, and concluded that he shared DNA heritage with Native Americans.

Once the Mal’ta boy DNA was in the model, the team had the match, with results published in Nature in September. Modern Europeans shared at least some DNA with this group of ancient North Eurasians, themselves closely related to ancestral Native Americans, who migrated across the frozen land bridge to the Americas about 15,000 years ago. The ancient North Eurasians were not only ancestors of modern Native Americans but provided up to 20 percent of the DNA in modern Europeans as well.

Unlocking the Past

Additional studies are afoot to figure out how and when the “ghost population” migrated to Europe, and possible answers are expected next year.

The powerful combination of state-of-the-art DNA extraction, high-throughput sequencing machines, and abundant supercomputing power is creating a vast trove of data about human descent. It’s also making possible discoveries about the distant past once thought out of our reach.

In recent weeks, a team at the Centre for GeoGenetics at the University of Copenhagen reconstructed the DNA of a 37,000-year-old man from Kostenki in Southern Russia, the oldest European genome assembled to date. And in October, a team at the Max Planck Institute in Leipzig, Germany, sequenced endogenous DNA of a 45,000-year-old early human from Western Siberia named Ust’-Ishim, by far the oldest genetic record of an early human ever created.

For now.
