The final missing piece of the human genome has been decoded

Mysteries hidden in the human Y chromosome are now coming to light.
It took roughly 100 years to fill in all the missing details of the Y chromosome. Darryl Leja, National Human Genome Research Institute, NIH

Despite its macho connotations, the Y chromosome is among the tiniest of the 46 chromosomes in the human genome. It makes up only 2 percent of a human cell’s total DNA. But because of its seemingly endless repeating bases, the Y chromosome is one of the most difficult to genetically sequence. Scientists initially believed it was nothing more than a genetic wasteland, only good for making sperm.

Yet that’s not the case at all. As genetic technology has grown more advanced, so has our understanding of the Y chromosome’s importance. Its loss in older men, for example, is associated with an increased risk of cancer and other chronic diseases. Its genes somehow play a part in multiple biological processes. But, for decades, more than half of the Y chromosome remained unsequenced, and its role in human health remained a mystery.

That age of mystery is ending. For the first time, geneticists have assembled a complete sequence of the Y chromosome. The international Telomere-to-Telomere (T2T) Consortium added data for more than 30 million new base pairs and identified 41 new protein-coding genes. Two studies published today in Nature break down those findings, explaining how this chromosome affects our reproduction, evolution, and even the gut microbiome.


“The complete sequence of the Y chromosome has opened up a lot of doors for the scientific community,” says Chris Lau, a professor of medicine at the University of California, San Francisco who studies the human Y chromosome but was not involved in these current studies. “We anticipate some surprises could be forthcoming, just like the time in the past we thought it was full of junk materials.”

A picture a century in the making

It took more than 100 years for biologists to construct a complete assembly of the Y chromosome’s structure, after its discovery in 1905. The first human genome was completed in April 2003, but it left behind some unknown gaps, including swathes of the Y chromosome. 

The chromosome’s repetition made it a challenge to reconstruct. It has more than a million base pairs lined up in long repeated sequences, says Karen Miga, the associate director at the University of California, Santa Cruz Genomics Institute and co-lead of the T2T Consortium. These are known as palindromes, because they read the same forward and backward.
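For readers who like to see the idea in code, here is a minimal Python sketch of what "palindrome" means for DNA. In a double helix, reading "backward" happens on the opposite strand, so a DNA palindrome is a sequence that equals its own reverse complement. The function names and the toy sequences are illustrative assumptions, not anything taken from the T2T studies, and the real Y chromosome palindromes are vastly longer than these examples.

```python
# Map each base to its pairing partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


# Toy sequences chosen for illustration only.
print(is_dna_palindrome("GAATTC"))   # True
print(is_dna_palindrome("GATTACA"))  # False
```

Because both strands of a palindrome carry the same letters, a short sequencing read taken from one arm looks just like a read from the other arm, which hints at why these stretches were so hard to place.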

The Y chromosome is among the tiniest of these 46 paired structures. National Human Genome Research Institute

All chromosomes have some repeats in their genes, but the Y chromosome has an unusually high number. Assembling these was a laborious and expensive process. “Researchers have had a hard time studying this in the past because we just didn’t have the right tools to reconstruct these really complex repeats,” Miga says.

New advances in long-read sequencing technology and computational assembly methods made it easier to put each repetitive sequence in order. For example, the team could now identify exactly where an inversion occurs—where breaks in the DNA cause a segment to reinsert itself in reverse order—and use that technique to spot other inversions. 
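An inversion can be pictured with a short Python sketch. This is a toy model under stated assumptions, not the consortium's method: real assemblies are compared with specialized alignment software, while the hypothetical helpers below simply flip a segment of a made-up sequence and then report the window where the result differs from the original.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(seq: str, start: int, end: int) -> str:
    """Flip seq[start:end] so it sits in reverse order on the strand.

    On a single strand, a flipped segment reads as its reverse complement.
    """
    flipped = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end:]


def differing_span(reference: str, sample: str):
    """Return (start, end) of the smallest window where the two differ.

    Assumes equal-length sequences; returns None if they are identical.
    The window can be narrower than the true inversion when bases at the
    edges happen to match.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    diffs = [i for i, (a, b) in enumerate(zip(reference, sample)) if a != b]
    if not diffs:
        return None
    return diffs[0], diffs[-1] + 1


reference = "AAAAGGCTAAAA"                # made-up sequence
sample = invert_segment(reference, 4, 8)  # flip the middle four bases
print(sample)                             # AAAAAGCCAAAA
print(differing_span(reference, sample))  # (4, 8)
```

Flipping the same segment a second time restores the original sequence, which is a quick way to check that the helper behaves like an inversion.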

Filling in millions of blanks

The new techniques added more than 30 million base pairs that were missing from the Human Genome Project’s reference sequence, for a grand total of 62,460,029 base pairs in the Y chromosome. The Y chromosome appears to have a unique organization of DNA sequences that, strangely, is not seen in other chromosomes, Miga says. She believes a ton of new biology is required to understand the evolutionary reason behind this organization and how parts of the chromosome correspond to human function.


The research team has already made some headway in reshaping science. These newly discovered sequences corrected several mistakes and assumptions found in the human genome reference sequence. They’ve also provided new insight into the ways the Y chromosome shapes human life.

“This is an extremely important finding in the human genome field,” Lau says.

Fertility and proteins

The Y chromosome contains many genes that regulate the production of sperm. Some of these newfound repetitive genomic regions, according to Miga, play a part in that process, too. “Understanding differences that could exist between humans could really inform things like infertility and how that process is inherited across time,” she says.

Sequencing the Y chromosome also revealed 41 new protein-coding genes, 38 of which were extra copies of genes in a family called TSPY, thought to be involved in sperm production. It’s possible they are also responsible for the development of male sex characteristics, but more research is needed to determine their precise roles.

Variation in human evolution

Commercial ancestry sites use Y chromosomes to trace paternal lineages. The new DNA sequences can further help researchers understand how humans evolved over time. In the second study, geneticists examined the Y chromosomes from 43 genetically diverse men. They found significant amounts of genetic variation between individuals. 

In some parts of the chromosome, the nucleotides that make up the DNA were very similar across the men. But half of the gene-rich regions in the Y chromosomes carried large inversions, which arose at a higher rate than in most other parts of the genome. These differences in genetic variation could have evolved to serve some important biological function, though what that could be is unknown.

Correcting bacterial confusion

When analyzing genetic samples, researchers often use databases to screen for sequences belonging to human DNA. If the sequences aren’t found anywhere in the current model of the human genome, scientists are likely to conclude the material belongs to bacteria. The new studies show some Y chromosome sequences, not yet entered in human databases, were mislabeled as bacteria.
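The screening logic described above can be sketched in a few lines of Python. This is a simplified illustration under loud assumptions: the sequences are invented, the `kmers` and `classify` helpers are hypothetical, and real pipelines rely on dedicated alignment tools rather than exact matching of short fragments. The point is only to show how a gap in the reference can turn genuine human DNA into an "unassigned" read.

```python
def kmers(seq: str, k: int) -> set:
    """Return every overlapping fragment of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read: str, reference_kmers: set, k: int = 5) -> str:
    """Label a read 'human' only if all of its k-mers are in the reference."""
    read_kmers = kmers(read, k)
    if read_kmers and read_kmers <= reference_kmers:
        return "human"
    return "unassigned"


# Invented sequences: an incomplete reference, and a region it lacks.
old_reference = "ACGTTGCAAGGCTTAC"
missing_region = "GGATCCGATTTAGC"
read = "ATCCGATTT"  # a fragment that comes from the missing region

old_index = kmers(old_reference, 5)
new_index = old_index | kmers(missing_region, 5)

print(classify(read, old_index))  # unassigned
print(classify(read, new_index))  # human
```

In this toy setup, the same read switches labels once the missing region is added to the index, with no change to the sample itself.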

Not junk after all

Geneticists will continue to mine discoveries from this treasure trove of data. Further analyses of the Y chromosome are likely to clarify the relevance of this chromosome in human health and disease.

This information “will benefit research in human evolution and migration, forensic science, and many translational applications in diagnostic and prognostic development in human diseases,” Lau says, “particularly the scientific reason for the mosaic loss of the Y chromosome in disease and cancer among others.”