The Drama Over Project Encode, And Why Big Science And Small Science Are Different

In a novel form of peer review, a biologist has given a colorfully fiery critique of a genome research consortium. Here's why.

If every new abstract read like Dan Graur’s latest contribution, people wouldn’t need any TLC reality shows–they could get all the drama they’d want from research papers. Graur’s new paper, a takedown of a much-ballyhooed genomics project, contains some of the most fiery language ever to appear in the staid, typically decorous world of scientific literature.

On the phone, Graur is just as frank: “Their data analysis is obscene,” he said. “It was horrible. This is not science.”

Here’s the story: The Encyclopedia of DNA Elements (ENCODE) project was a five-year effort involving hundreds of people who sought to unravel the functions of so-called non-coding, or “junk,” sections of the human genome. When it was published last September, scientists who led the project claimed it would upend decades of assumptions about how the genome works, causing textbooks to be rewritten. Most of the genome is biologically active, they said–it’s functional. But many evolutionary biologists were peeved by this characterization and the loose definition of the word function.

“We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten,” Graur and his colleagues write in the journal Genome Biology and Evolution.

Graur, a professor of molecular evolutionary bioinformatics at the University of Houston, is the lead author of a seething response to the project, a global consortium of genetics and bioinformatics researchers whose claims provoked plenty of frustration among evolutionary biologists.

One of the key complaints: that Encode authors are computational scientists, not biological scientists. “They’re computer jocks,” as Graur put it. “Big science, like the Human Genome Project, should publish data. Small science should do the analysis.”

In evolutionary biology, “function” is a loaded word–an organ, a piece of DNA or a cell can perform a function that’s selected for, and a function that’s causal. Selected functions are things that confer an evolutionary advantage, while causal functions don’t, to put it very simply. In the paper, Graur uses an example of a human heart: Its evolutionary function is pumping blood through your body. A causal function is its capacity for making noise. Incidentally, that’s useful to your doctor or personal trainer, but isn’t the heart’s primary function.

If you think of the human genome like a textbook, you can think of Encode as the footnotes, intended to provide insight into what all the nucleotides are doing. It annotates all of the 3.2 billion combined A, C, G and T nucleotides that make up genes and their regulatory sections. In doing this, Encode papers defined function in a loose way, to include all the things that DNA does. The research says the vast majority of our DNA participates in at least one “biochemical event” in at least one cell type, and considers this a function. But that definition is liberal at best, and defining function wasn’t even the project’s goal, said Mike White, a systems biologist at Washington University in St. Louis who has criticized Encode’s hype but (unlike Graur) has praised its value to science. Rather, the goal was to comprehensively measure the biochemical features of the genome, and let scientists have a go with those measurements.

“Those features are going to help other scientists actually discover functional regions,” he said.

Genome Gradient

Biochemical functions include a wide range of activities, like DNA sequences that are transcribed into RNA; regions that are bound up by regulatory proteins, which might switch genes on or off; regions that are not wrapped up tightly in chromatin, which packages DNA in a cell; and so on. (For a very detailed description of these biochemical activities, read this thorough analysis by chemist and blogger Ashutosh Jogalekar at Scientific American.) The point is, while it’s true that these are “functions” in the sense that they’re doing something, the thing they are doing is not necessarily meaningful.

Here is how Graur explained it on the phone: “Have you ever stepped on a piece of chewing gum? It binds to the sole of your shoe. But this is not the function of chewing gum, to bind to the shoe on a hot day.”

White said these activities are useful to measure because they can be associated with functions–though not necessarily. Establishing function is difficult and requires a lot more work, he said.

In his own lab, White is studying a specific regulatory protein that binds to DNA in about 10,000 places on the genome, and helps switch a gene on or off. He is trying to determine whether that binding event has to do with the gene activation, and how the proteins find their way along the genome as they’re floating around in a cell. Each of the 10,000 binding events might be functional, there might be non-specific “noisy” DNA binding as the protein takes a shotgun approach, or maybe something else is going on.

“For that question, the Encode data is useful. I have a list of regions of the genome that are bound by regulatory proteins, and I can test them and thereby gain some insight into, what is it about certain DNA features that enable them to activate genes, and don’t lead others to activate genes?” he said. “Those are the kinds of discoveries that will come out of the Encode data.”

Other biologists are also glad to use the data, though they still express frustration with how it was presented. Mick Watson, director of ARK-Genomics at the University of Edinburgh’s Roslin Institute, wrote in a blog post that he disagrees with Encode’s definitions.

“However, I do appreciate that science, like many other disciplines, requires, and benefits from, people with opposing views. Your view of functionality certainly opposes mine; however, at the very least, what you have achieved is to stimulate debate on the topic, which is of benefit to everyone,” he wrote, adding that Graur’s paper sets a bad example for young scientists.

Graur has several other problems with the research, not the least of which is its data analysis. He lamented that many of the analysts and researchers in Encode are computer scientists, not biologists. He said he felt like he had to speak up. Students, postdocs and other young researchers have since thanked him for publishing his paper, he said.

“Many people object to the tone, but actually the tone was the point. I’m a professor, I am tenured. … Sometimes you need an old card like me to do that,” he said. “Science is about presenting hypotheses and refuting them. A lot of the people who are dealing with data and analyzing data have forgotten about that.”

White agreed that some contributors lack a background in evolutionary biology, and this may have contributed to the hype–scientists were overstating their findings. It also may have contributed to the backlash and ongoing resentment, he said.

“People resent that, when you have people coming in from a different field and they start making sweeping statements about your own field, and they don’t know anything about it and the sweeping statements are wrong,” he said.

“I’m a little surprised that a paper this angry got through without changes–it was a little over the top in terms of its outright inflammatory statements–but on the other hand, I understand the anger. A lot of us were really angry,” he said. “Now we have to see, is the data going to be useful? Are they going to start publishing real studies with this? We’ll see.”