AND BIG DATA is about to get much, much bigger, as we enter an era in which digital data merges with biology. This synthesis of codes takes the abstract world of digits and brings it back into the physical world. We of course know quite a bit about how life is expressed—in the four letters of DNA, in more than 20 amino acids, in thousands of proteins. We can copy life through cloning. Now we are beginning to be able to rewrite life, not just gene by gene, but entire genomes at a time. This is the difference between inserting a single word or paragraph into a Tolstoy novel (which is what biotechnology does) and writing the entire book from scratch (which is what synthetic biology does). It is far easier to fundamentally change the meaning and outcome of a novel, seed, animal or human organ if you write the entire thing.
We’ve come a long way, very quickly, to reach this point. A decade ago, simply reading the entire life code of a single organism was a breakthrough achievement in processing enormous dollops of data. In 1999, gene sequencers could read only a few hundred base pairs of DNA at a time, so Craig Venter’s human genome project relied on shotgun sequencing: Copy portions of a genome over and over again. Break them into random pieces. Feed these into a gene sequencer. Read the output and then use a computer to compare every sequence with every other sequence, looking for overlaps. When you find an overlap, begin to build up the whole of the genome, much as one builds a brick wall, overlaying brick by brick. A nifty trick, but one that most people until then had thought to be impossible because of the staggering computations involved. Yet Venter and his team built one of the most powerful private computers in the world (in the process becoming one of the largest users of electricity in Maryland) and solved the problem. Theirs is now the standard approach to reading genomes.
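The compare-and-overlap step described above can be sketched in a few lines of code. This is a toy illustration only, not the software Venter's team ran: the function names, the `min_len` threshold, and the greedy merge strategy are all assumptions chosen for clarity, and the sketch ignores sequencing errors, repeated regions and reverse-complement reads.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Drop exact duplicates, then fragments fully contained in another one.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]

    while len(reads) > 1:
        best_n, best_a, best_b = 0, None, None
        # Compare every fragment with every other fragment.
        for a, b in permutations(reads, 2):
            n = overlap(a, b, min_len)
            if n > best_n:
                best_n, best_a, best_b = n, a, b
        if best_n == 0:
            break  # no remaining overlaps: leave separate pieces
        reads.remove(best_a)
        reads.remove(best_b)
        # Lay one "brick" over the other, keeping the shared part once.
        reads.append(best_a + best_b[best_n:])
    return reads


if __name__ == "__main__":
    fragments = ["GCAATGCC", "ATGGCGTG", "GCGTGCAAT"]
    print(greedy_assemble(fragments))
```

The all-against-all comparison is what makes the approach so expensive: the number of pairs grows with the square of the number of fragments, which hints at why a whole-genome assembly demanded so much computing power.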
But sequencing the genome was a trivial computational exercise compared with the modeling of protein-protein interactions that is being attempted today. To begin with, you have to compare 20 amino acids, instead of four DNA bases. And because proteins can take so many more shapes than a strand of DNA, mapping the shape of their every combination is vastly more complex. Today’s computers are barely able to deal with a few of these variables. In spite of the achievements that Moore’s Law has wrought, life-sciences data is exceeding the scope and power of all current computer capabilities and storage.
In other words, in this new era—the transition from digital code to digital-plus-life code—the capacity to generate data exceeds our capacity to store and process it. In fact, life code is accumulating at a rate 50 percent faster than Moore’s Law; it at least doubles every 12 months. Without extraordinary advances in data storage, transmission and analysis, within the next five years we may simply be unable to keep up.
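A rough back-of-the-envelope calculation shows how quickly that gap opens up. The sketch below assumes the commonly cited 18-month doubling period for Moore's Law (the text does not state one; a doubling every 12 months is 50 percent faster than a doubling every 18) and treats the 12-month doubling of life code as exact. The function name and constants are illustrative.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """How many times larger a quantity gets after `years` of steady doubling."""
    return 2 ** (years * 12 / doubling_months)


MOORE_DOUBLING_MONTHS = 18      # assumption: popular statement of Moore's Law
LIFE_CODE_DOUBLING_MONTHS = 12  # from the text: "doubles every 12 months"

if __name__ == "__main__":
    # Doublings per year: 1.0 versus about 0.67, a ratio of 1.5.
    print("rate ratio:", MOORE_DOUBLING_MONTHS / LIFE_CODE_DOUBLING_MONTHS)
    for years in (1, 5, 10):
        data = growth_factor(years, LIFE_CODE_DOUBLING_MONTHS)
        compute = growth_factor(years, MOORE_DOUBLING_MONTHS)
        print(f"{years:>2} years: data x{data:,.0f}, "
              f"computing x{compute:,.1f}, gap x{data / compute:,.1f}")
```

Under these assumptions the data grows 32-fold in five years while computing power grows roughly tenfold, so the shortfall roughly triples over that span.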
Then again, there’s good reason to expect that we’ll achieve the necessary technology breakthroughs. Because there is one other, absolutely fundamental change going on in the world of Big Data. When you marry life code and digital code, the emerging applications differ from the merely digital in one revolutionary way: This software builds its own hardware. No matter how you create or program a computer, you will not come downstairs the next morning to find a thousand new computers. Life code is different. In 2008, three scientists—Venter, Hamilton Smith and John Glass—and their colleagues took a basic gene sequence from a computer, programmed robots to pick the four chemicals that make up DNA from jars, and assembled the world’s largest organic molecule. They then developed techniques to insert this new molecule into a cell. Bottom line, they programmed a cell to become a different species. Some called it the world’s first synthetic life-form. It is really the first fully programmable life-form. And it reproduces.
Programmable cell platforms are like computer chips. They could eventually be designed to help create or do anything, if you figure out the right code for what you wish to make. I’m a cofounder and investor in a Venter spinout company, Synthetic Genomics, that’s attempting to program algae to generate gasoline (with Exxon), extract gas from coal (with BP), rapid-prototype vaccines (with Novartis), and breed faster-growing plants (with Plenus). Life programming may also solve the problem of how to store gargantuan data sets. All digital data can be coded into life-forms, and all life-forms can be coded as digital data. In theory, this means you could eventually store, and copy, all the words and images from every issue of the New York Times in the gene code of a few bacteria.
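To make the two-way translation concrete, here is a minimal sketch of how digital data could map onto the four letters of DNA and back. The mapping (two bits per base, in the order A, C, G, T) and the function names are assumptions for illustration; practical DNA storage schemes would also need error correction and would likely avoid long runs of a single base.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, high bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a base sequence produced by bytes_to_dna back into bytes."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on bad letters
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    headline = "All the news that's fit to print".encode("utf-8")
    strand = bytes_to_dna(headline)
    print(strand)
    print(dna_to_bytes(strand).decode("utf-8"))
```

At two bits per base, every byte of text or image data costs four bases, and the round trip recovers the original data exactly.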
I was blown away by the Big Data parade at TED 2011. But a new era of digital life code promises to dwarf today’s most glorious data achievements.