New map reveals human genome's most mysterious bits

Last week, a team of researchers from more than two dozen research institutions announced a breakthrough in the 30-year effort to create a high-quality sequence of the human genome. Although the first draft from the Human Genome Project was produced 20 years ago, nearly 8 percent of human DNA remained mysterious. Now, almost every part of the genome—all but the Y chromosome—has been decoded.

The newly mapped regions will give geneticists a window into stretches of the genome once described as “junk DNA.” Those regions are now understood to be fundamental to evolution, embryo growth, and the ways cells replicate and die.

“We’ve discovered things are a lot more diverse than we could have ever appreciated,” says Rachel O’Neill, a comparative biologist at the University of Connecticut and co-author on the sequence. The previous results were “like studying culture and music and language for the planet of Earth, and ignoring all of Africa.”

All of the outstanding sections remained mysterious because they’re composed of complex repeating sequences of DNA. A stretch of this genetic material could consist of a thousand-letter-long sequence repeated thousands of times. “They evolved by repetition,” Benedict Paten, a computational biologist at the University of California, Santa Cruz and co-author on the new sequence, told Popular Science in an interview earlier this year.

The extreme repetition made sequencing especially troublesome. Although genetic sequencing is much faster and cheaper than it was when the human genome was first reconstructed two decades ago, the most common technology now involves reading short fragments of the genome. Those fragments are assembled into the full picture by matching where the DNA sequences overlap. Piecing together repetitive sections is a bit like working on a jigsaw puzzle of a herd of zebras. The researchers had to develop tools for reading extremely long strands of DNA and code new algorithms to complete the final picture.
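To see why repeats defeat short reads, consider a toy sketch in Python. It is not the consortium's method, and every name and sequence in it is made up for illustration. It assumes idealized, error-free reads taken from every position of two invented "genomes" that differ only in how many copies of a four-letter repeat they contain. Real assemblers work with overlap graphs and noisy data, but the core ambiguity is the same.

```python
def reads(genome, length):
    """Return every substring of the given length: an idealized, error-free read set."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}

# Hypothetical toy sequences: identical flanks, different repeat counts.
unit = "ACGT"
genome_a = "TTG" + unit * 5 + "CAA"   # 5 copies of the repeat
genome_b = "TTG" + unit * 8 + "CAA"   # 8 copies of the repeat

# Reads shorter than the repeat array look the same for both genomes,
# so no assembler could tell how many copies there are.
short_same = reads(genome_a, 6) == reads(genome_b, 6)

# A read long enough to span the whole array plus both flanks
# pins down the copy count, so the two read sets now differ.
span = len(genome_a)
long_same = reads(genome_a, span) == reads(genome_b, span)

print("short reads identical:", short_same)
print("long reads identical:", long_same)
```

With six-letter reads the two read sets match exactly, like zebra-stripe puzzle pieces that fit anywhere. Only a read that reaches from one flank to the other reveals the true number of copies, which is why the article's researchers needed tools for reading very long strands.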

Historically, geneticists have described large stretches of the genome, including the newly sequenced regions, as “junk.” The vast majority of genetics research has focused on genes, the comparatively tiny stretches of DNA that are transcribed into RNA, and then translated into proteins, biology’s molecular workhorses. Junk DNA, which makes up 98 percent of the entire genome, doesn’t get translated into functional proteins. “If you’re only interested in that gene to protein pathway, everything else is junk,” says O’Neill.

But over the past two decades, biologists have realized that the information contained in that “junk” is foundational to life as we know it—akin to learning that the shelves of a library are also covered in writing. “One man’s junk is another man’s treasure,” says O’Neill. “I’m on the treasure end.”

The human genome is assembled into 46 chromosomes, X-shaped “libraries” of knotted-up DNA. The new sequence tackled three parts of those chromosomes: telomeres, the chromosomes’ “end caps” that prevent DNA from wearing away; centromeres, dense knots of DNA in the middle of each chromosome that are critical for DNA replication; and DNA in the “arms” of chromosomes that is used to build protein factories called ribosomes.

New research produced a high-quality sequence of DNA in the telomere, centromere, and repetitive patches in several chromosome “arms.” *Zaleskyphoto/Deposit Photos*

Although this DNA doesn’t produce proteins, it can make RNA. And while RNA is usually thought of as a pure information carrier, it can also be an active participant in the cell, latching on to other molecules and facilitating chemical reactions. (This is why some evolutionary biologists believe that the earliest organisms were made entirely of RNA, which would have contained both blueprint and tools for replication.)

In the centromeres, the RNA helps wrangle chromosomes as they replicate. Without the RNA, the entire genome falls apart. DNA in the “end cap” telomeres produces repetitive strings of RNA, which work to hold the ends together and appear to play a role in the cellular aging process by patching together telomeres as they wear away.

DNA that guides the construction of the ribosome, called rDNA, could have a similarly wide-reaching role. High-school biology textbooks generally describe ribosomes as dumb machines that read RNA and spit proteins out. But, explains Maria Barna, a geneticist at Stanford University who was not involved in the new sequence, different ribosomes appear to have slightly different functions.

The key is again RNA. Four “species” of ribosomal RNA, coded by rDNA, are woven into the structure of the factory. Different stretches of rDNA produce slightly different subspecies. “What’s emerging right now from the telomere-to-telomere data is that there’s tremendous rDNA diversity,” she says. “Almost 25 percent of rDNA can be variable.”

Barna says that not all of those variants will actually make it into ribosomes. But the diversity could play a role in everything from giving neurons ultra-precise ribosomes for building specialized proteins to allowing a tumor to grow. “We now have a first-glance catalog of the possibility of these variants that could be applied to normal cellular differentiation, as well as to disease states,” says Barna.

The structure of the repeating sequences probably matters. The repetitions can evolve extremely quickly, jumping from chromosome to chromosome, or moving whole genes around. And even closely related organisms can have extremely divergent centromeres and telomeres—which suggests that they play a role in the emergence of new species.

“It’s a paradox,” says O’Neill. “One of the most conserved functions that we share from beasts to humans is also one of the most divergent parts of the chromosome.” The open question is how such foundational parts of all biology can also be so flexible.