Genetic Data Will Take Up More Space Than YouTube In 10 Years

A classic big data problem

Stuart Caie, CC BY 2.0

Your little genes are about to take up a lot more space. In the next 10 years, genomics could generate anywhere between 2 and 40 exabytes of data per year. At a minimum, that’s more than two million times what your home computer can hold. A study estimating the volume of genetic data to come was published this week in PLOS Biology.

As researchers parse out exactly how our genes are related to our health, more people will have their genomes sequenced. The researchers estimate that one billion people will have done so by 2025. With the technology and systems currently in use, one person’s fully sequenced genome takes up about 100 gigabytes of space, and the amount of data that genomics produces doubles every seven months. It doesn’t take much mathematical prowess to calculate just how enormous the volume of genetic data could become, and despite the efforts of researchers and private companies alike, we simply don’t have the data-processing software to be ready for the genetic revolution.
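For readers who want to check that math, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (one billion genomes, about 100 gigabytes each, a seven-month doubling time) and assumes decimal units, where one exabyte is a billion gigabytes. The function names are made up for illustration, and this is not the method the study authors used; their 2 to 40 exabytes per year estimate rests on their own assumptions about how the data is stored.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
# Assumption: decimal units, so 1 exabyte = 10**9 gigabytes.
GB_PER_EXABYTE = 1e9


def total_storage_exabytes(people, gb_per_genome):
    """Raw storage, in exabytes, if every genome were kept at full size."""
    return people * gb_per_genome / GB_PER_EXABYTE


def growth_factor(years, doubling_months):
    """How many times larger a quantity gets if it doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


if __name__ == "__main__":
    raw = total_storage_exabytes(people=1_000_000_000, gb_per_genome=100)
    factor = growth_factor(years=10, doubling_months=7)
    print(f"One billion genomes at 100 GB each: {raw:.0f} exabytes")
    print(f"Growth over 10 years at a 7-month doubling time: about {factor:,.0f}x")
```

A decade contains a little more than 17 seven-month doubling periods, which is why the growth factor runs into the hundreds of thousands.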

Right now, astronomy, YouTube, and Twitter are coping with similar big data problems. So far, human genomics generates only about a quarter as much data as YouTube does each year. But by 2025, the study authors estimate, genetic data will be just as large, and will need computing power to match.

With that much data, genetics researchers will need better ways to acquire, store, distribute, and analyze it, the study authors write. Some organizations, like the New York Genome Center, have tried keeping their own internal databases, prioritizing the files they use most often, the Washington Post reports. Amazon and Google have invested in cloud computing for genetic data, which seems like the most likely route, study author Michael Schatz said. And though the study authors don’t have specific suggestions for the data-processing method that they think will work best, they do write: “Now is the time for concerted, community-wide planning for the ‘genomical’ challenges of the next decade.”