Storing data on DNA may not be practical, but it's possible


Our DNA is the coding that programs everything about us. Now it’s being used to encode videos and operating systems. Dense and durable, DNA could store 200 petabytes (200,000,000,000,000,000 bytes or around 200,000 laptops) in one gram, as shown by a study in this week’s Science. Experts say DNA could be a promising storage solution in the era of information explosion—if only cost were not an issue.

Harvard researchers stored a 50,000-word book in 2012. And around the same time, the European Bioinformatics Institute stored things including Shakespeare’s sonnets and an audio clip of the “I Have a Dream” speech. But what makes this latest study special is that it uses a coding technique that makes the storage not only 60 percent more efficient but also more robust against errors, says co-author Yaniv Erlich, a computer science professor at Columbia University and a Core Member at the New York Genome Center.

Erlich says once you figure out how to do the coding, or what DNA to synthesize or write the data into, the rest is simple: a company can take care of making the actual DNA. “Data is just a stream of 0s and 1s. The DNA is basically just a sequence of four nucleotides. You need to map these nucleotides into these pairs of bits,” says Erlich.
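The mapping Erlich describes can be sketched in a few lines of Python. This is only an illustration: the particular assignment of bit pairs to nucleotides (00 to A, 01 to C, 10 to G, 11 to T) and the function names `bytes_to_dna` and `dna_to_bytes` are assumptions made here, not details taken from the study.

```python
# Assumed lookup table: each pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna: four bases give back one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sequence = bytes_to_dna(b"Hi")
print(sequence)                 # CAGACGGC
print(dna_to_bytes(sequence))   # b'Hi'
```

A direct mapping like this ignores which sequences are hard to synthesize or read back, so it should be taken as the idea in its simplest form rather than a usable scheme.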

But DNA synthesis can be a tricky business. Sri Kosuri, lead author of the 2012 Harvard paper and now a biochemistry professor at UCLA, who was not involved in the latest study, says that “the big thing is that about five percent or less of the time, a random piece of DNA will just get lost.”

“It’s hard to synthesize, and it’s hard to read out for some reason,” says Kosuri, “so if you can use a code that’s robust to [such a] loss, DNA storage [would] become a lot easier and cleaner.”

And that is what Erlich’s paper shows, says Kosuri: the method tolerated imperfections in the DNA synthesis process. The researchers achieved that by applying a coding technique typically used to protect digital information sent over channels subject to connection dropouts, such as a YouTube video streamed on your phone. Erlich says their results suggest that this coding technique can deal with DNA of much lower quality than what they used in the study. “We can still get the results we need and retrieve information correctly.”
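The article does not name the technique. Codes built for channels that drop pieces are often fountain codes, and the sketch below is a toy one in Python, not the authors' implementation. The sender produces an open-ended stream of "droplets," each the XOR of a few chunks of the file, and the receiver rebuilds the file from whichever droplets arrive. The three-value degree choice, the 4-byte chunks, and the function names are all assumptions for illustration. The 5 percent loss rate borrows Kosuri's figure.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def droplet_indices(seed: int, n_chunks: int) -> set:
    """Derive which chunks a droplet covers from its seed alone."""
    rng = random.Random(seed)
    degree = rng.choice([1, 2, 3])  # toy degree distribution (assumption)
    return set(rng.sample(range(n_chunks), degree))


def make_droplet(chunks, seed):
    """One droplet: the seed plus the XOR of the chunks that seed selects."""
    payload = bytes(len(chunks[0]))
    for i in droplet_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def peel(droplets, n_chunks):
    """Peeling decoder: returns the chunks, or None if more droplets are needed."""
    known = {}
    pending = [(droplet_indices(s, n_chunks), p) for s, p in droplets]
    progress = True
    while progress and len(known) < n_chunks:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Remove every chunk we already know from this droplet.
            for i in [i for i in indices if i in known]:
                payload = xor_bytes(payload, known[i])
            indices = {i for i in indices if i not in known}
            if len(indices) == 1:        # exactly one unknown left: solved
                known[indices.pop()] = payload
                progress = True
            elif indices:
                still_pending.append((indices, payload))
        pending = still_pending
    if len(known) < n_chunks:
        return None
    return [known[i] for i in range(n_chunks)]


data = b"Data is just a stream of 0s and 1s."
chunk_size = 4
padded = data + bytes(-len(data) % chunk_size)
chunks = [padded[i:i + chunk_size] for i in range(0, len(padded), chunk_size)]

loss = random.Random(0)
received, seed, recovered = [], 0, None
while recovered is None:
    droplet = make_droplet(chunks, seed)
    seed += 1
    if loss.random() < 0.05:   # this droplet is "lost" and never arrives
        continue
    received.append(droplet)
    recovered = peel(received, len(chunks))

result = b"".join(recovered)[:len(data)]
print(result == data, len(chunks), len(received))
```

The point of the design is that no particular droplet matters. Losing some only means collecting a few more, which is why a code of this kind can tolerate pieces of DNA that go missing.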

The team successfully stored and recovered an operating system in DNA, which is “an elegant thing,” says Robert Grass, a chemical engineer at ETH Zurich who was not involved in the study. “As a daily user, I know it has to be error-free,” says Grass.

Erlich says they put in an operating system on purpose: “We wanted to show that we are not afraid to put stuff that could be totally screwed if we don’t recover the file perfectly.”

Kosuri agrees that the paper’s methods are effective. “I think they did the right sets of experiments to test the algorithm. They picked the right algorithms, and it looked to work.”

But just like five years ago when they first stored data in DNA, says Kosuri, it’s still ridiculously expensive. “The expenses probably are on order of a millionfold too expensive to be competitive for anything,” says Kosuri.

In the study, the researchers spent nearly $10,000 to encode just two megabytes. That’s roughly the capacity of an old 3.5-inch floppy disk for the cost of about 10 MacBook Airs.
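To put those figures side by side, here is a rough back-of-the-envelope calculation in Python. It assumes exactly $10,000 and exactly 2 megabytes (the article says "nearly" and "just"), uses decimal units, and takes Kosuri's "millionfold" literally.

```python
study_cost_usd = 10_000        # "nearly $10,000" (rounded, assumption)
data_mb = 2                    # "just two megabytes"

cost_per_mb = study_cost_usd / data_mb
cost_per_gb = cost_per_mb * 1000                 # 1 GB = 1000 MB
cost_per_gb_after_millionfold = cost_per_gb / 1_000_000

print(f"${cost_per_mb:,.0f} per megabyte")                    # $5,000 per megabyte
print(f"${cost_per_gb:,.0f} per gigabyte")                    # $5,000,000 per gigabyte
print(f"${cost_per_gb_after_millionfold:,.2f} per gigabyte after a millionfold drop")  # $5.00 ...
```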

Although the new method could reduce synthesis costs somewhat by allowing us to use “really crappy DNA,” says Kosuri, the costs would still be impractically high.

Grass says the cost of DNA synthesis is a bigger drawback than that of sequencing. “Cost of sequencing is still too high for application, but is continuing to come down, with things like nanopore that you can plug into your computer and sequence DNA,” says Grass. “Cost of synthesis is further back.”

And we’d need additional measures to stabilize DNA, according to Grass, if we want to store information in it for hundreds of years. Unlike genetic material locked away in bones and fossils, Grass explains, free DNA is not stable. If you just leave it in the lab, it will last only a year before information starts getting lost.

Despite the cost, density and feasibility still make DNA storage much more attractive than other new technologies with similar features. “The density of information storage here is on par with stuff like positioning individual atoms on a surface,” Kosuri says, referring to the 1989 study where IBM was spelled with 35 xenon atoms.

“You can do stuff like that, but it’s done near absolute zero, done in the vacuum and with [great] imaging,” says Kosuri. “That’s the competitive technology. [DNA storage] is much more practical than that, but it is much less practical than other things we are [normally] thinking of,” such as a floppy disk or memory stick.

DNA also allows data to be stored in three dimensions, as do holography and racetrack memory. But DNA is still “multitudes higher in density,” according to Kosuri.

So after this paper in Science, Kosuri says, the only thing that’s preventing DNA storage from working at scale is its cost. “There were all the other questions about codes, would they work, would you have them reliable, all these questions seem answered by this paper. The only question that remains in my mind is the cost issue.”

Though he no longer studies DNA storage himself, Kosuri remains optimistic about the technique’s future development. “A millionfold sounds like a lot, but in the last 15 years, we’ve got about a millionfold [decrease] in cost already. Not just sequencing, the synthesis part has dropped quite a bit,” says Kosuri.
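For a sense of how fast that is, the short Python calculation below converts a millionfold drop over 15 years into a halving time. It assumes the decline was a steady exponential, which is a simplification made here and not a claim from Kosuri.

```python
import math

fold_decrease = 1_000_000   # "about a millionfold"
years = 15

halvings = math.log2(fold_decrease)       # how many times the cost halved
years_per_halving = years / halvings

print(f"{halvings:.1f} halvings")                          # 19.9 halvings
print(f"one halving every {years_per_halving * 12:.0f} months")  # about 9 months
```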

Erlich’s vision is that in the future one can create “a process dedicated for DNA storage that we can take quick and dirty DNA synthesis” to further reduce the cost.

“We can eat these half-baked cakes and get to the data correctly, following our coding strategy,” says Erlich. “That is what we hope to experiment with in the future.”

The current study was a good start. Everything the researchers stored and recovered was sensitive to errors: an operating system, a computer virus, a $50 Amazon gift card, an 1895 French film (“Arrival of a Train at La Ciotat”), a Pioneer plaque, and a 1948 study by information theorist Claude Shannon.

“People would say, ‘How could you put a video into DNA?’” says Erlich. “I had to explain that to my six-year-old son.”

“[It’s] capturing the imagination of people. This is what we want to do, right? We want to engage the general public [with] the fun side of science.”