What’s in a genome? The quest to decipher human difference
Racial categories are crude maps imposed on human biological variation. How do scientists square them with genetics?
This article was originally featured in Undark.
Tina Lasisi was 19 years old, sitting in a lecture hall at the University of Cambridge, when she found the scientific question that would occupy more than a decade of her life.
The instructor that day had presented a classic discovery from the study of human evolution: People with many ancestors who lived near the equator, where there’s more ultraviolet radiation, tend to have darker skin than those whose ancestors lived near the poles.
For Lasisi, the biracial daughter of a Bulgarian mother and a Nigerian father, the detail felt like a revelation. Suddenly skin color wasn’t about race, exactly; it was about UV light, and about how much of it her ancestors had encountered as they moved through the world. But then, almost immediately, came another question: “What about my hair?”
Lasisi had stumbled onto a puzzle — a set of puzzles, really. Human scalp hair varies by color, thickness, and structure; unlike the hair of nearly any wild mammal, it often curls. But scientists have little idea how this panoply of hair came to be.
Over the years, as she pursued a Ph.D. in biological anthropology, Lasisi amassed a collection of human hair. (Today, she estimates, she has more than 200 samples, some kept in small tubes in a walk-in lab refrigerator, others embedded in moldable plastic.) She devised rigorous methods for measuring the curvature and shape of each hair fiber. Now a postdoctoral researcher at the University of Southern California, Lasisi also studies the genetics of hair. Which genes affect the kind of hair a person has? Why is there variation at all?
That work also puts Lasisi in uncomfortable company. For centuries, some scientists have studied variation in order to sort people into groups and enforce racist hierarchies. The study of hair has played a role in that effort: In the early 1900s, for example, the German anthropologist Eugen Fischer used swatches of synthetic hair to classify mixed-race people in colonial Africa. (The German colonial regime murdered tens of thousands of people in what is now Namibia; Fischer went on to become a Nazi.)
In response to that racist legacy, many anthropologists and geneticists have stressed that racial categories are the creations of human beings, not facts of nature. In the now-classic phrasing, race is a social construct, not biological.
But Lasisi and many other researchers say that formulation, while well-intentioned, falls short. “There’s this void that’s been left by the just genuine desire to undo a racist legacy by saying, ‘Race isn’t real. Race isn’t biology. Race is a social construct,’” she said. “All of those things have been baked into so many syllabi over the last couple of decades. There are entire generations of students I meet now who will parrot these things, but they will not know what it means.” Nor, she added, does it help people make sense of the biological variation that does exist among humans.
Consider hair. Looked at from one angle, it defies racial categories: tightly coiled curls can signal recent ancestry in Africa, or in Papua New Guinea; a blonde could trace their light hair to northern Europe, or the South Pacific. From another angle, though, hair can seem like proof that there’s something biological about race. Tell an American about a random person’s racial or ethnic identity, and the listener can make an educated guess, at least, about the color or texture of that person’s hair.
It may be more helpful to put it like this: Race is one crude, fraught way of describing biological variation among humans. “The way I come at it is, the variation exists,” said Jada Benn Torres, a genetic anthropologist at Vanderbilt University. “It’s the cultural meanings that we attach to that variation that will create the different racial categories.” Those categories, she added, are fluid. A person who identifies as White in the United States of 2022 may not have been considered White in 1922. Someone who’s called Black in the U.S. today may not be understood as Black in Brazil. In that sense, race is a social construct.
That has not stopped people from trying to describe and understand the variation. In the past few decades, vast new troves of genetic data have made it possible to examine the small differences among human beings in minute detail, and to develop many approaches to categorizing people. Some of the categories scientists cook up look nothing like U.S. racial categories. Other times, researchers analyze that data in a way that produces clusters that look, roughly, like race. And it’s possible to take existing social racial categories, however incoherent, and look for differences among them.
Today, geneticists are gazing into a cloud of trillions of data points, searching for patterns. How they ask questions (what, exactly, they are looking for) affects what, exactly, comes out.
A person’s DNA can tell many different stories, all at once. Most immediately, the collection of all of your genetic material — your genome — is spliced together from the DNA of your biological parents: one half from each, sprinkled with a few mutations that are unique to you. Zoom back a couple generations, and that genome looks like more of a patchwork, cobbled together from the DNA of eight great-grandparents. Go back still further, and you have thousands of ancestors, most of whom have passed on nothing to you at all, and it becomes possible to visualize each snippet of the genome as having undergone its own tangled journey, passing through bodies and across continents before landing in your genetic code. At that scale, DNA can offer clues about where a person’s ancestors lived centuries ago. And it contains faint records of an ancient world: the migrations of people across the planet; the mixing and dividing of communities; long-ago pandemics and famines.
Human history is a story of churn. Homo sapiens emerged somewhere in Africa, probably around 300,000 years ago. Eventually, some groups of people wandered off that continent. Before long — relatively speaking — humans occupied places as far-flung as Alaska and Tasmania. They settled in the thin air of the Tibetan Plateau, pushed deep into the Americas, and, more recently, launched wooden boats into the open ocean to populate Polynesian isles. Long before the tumult of the post-1492 colonial empires, migrations swept across continents. Some people whose ancestors had left Africa millennia before eventually came back: Recent genetic research suggests flows of migration from Eurasia into Africa, and from the Indonesian archipelago to the island of Madagascar.
Traces of those migrations are recorded in human DNA, specifically in the millions of spots on the genome where the genetic code can vary from person to person. At a particular spot, one person may have a different entry in the genetic ledger than someone else: the DNA base cytosine, for instance, instead of adenine, roughly the equivalent of swapping out one letter in a word. Often, these tiny differences have no apparent effect. Others may contribute, in a small way, to obvious differences: for example, why one person has black hair, and another brown.
As human history unspooled, that genetic potpourri underwent changes. Some of that was simply random. Take, for example, an imaginary population of people who live beside a mountain range. Half of them have version A of a particular gene, and half carry version B. One day, a small group of people, nearly all of them happening to carry version B, decide to leave home and cross the mountain range. They establish a whole new society there, and the far side of the mountains becomes chock-full of people with version B — just by sheer luck.
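Population geneticists call this kind of luck a founder effect. The simulation below is a minimal sketch under deliberately simple assumptions that are not from the article: the home population is very large and split evenly between versions A and B, founders are picked at random, and the group sizes are hypothetical. It compares how far small and large founding groups stray from the home population’s 50 percent.

```python
import random
import statistics


def founder_b_frequency(home_b_freq: float, n_founders: int, rng: random.Random) -> float:
    """Share of a randomly chosen founding group that carries version B.

    Simplifying assumption: the home population is so large that each founder
    independently carries B with probability home_b_freq.
    """
    carriers = sum(rng.random() < home_b_freq for _ in range(n_founders))
    return carriers / n_founders


def typical_deviation(n_founders: int, trials: int = 1000, seed: int = 1) -> float:
    """Average gap between the founders' B frequency and the home population's 50 percent."""
    rng = random.Random(seed)
    return statistics.mean(
        abs(founder_b_frequency(0.5, n_founders, rng) - 0.5) for _ in range(trials)
    )


def share_mostly_b(n_founders: int, threshold: float = 0.8, trials: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated founding groups in which at least `threshold` of members carry B."""
    rng = random.Random(seed)
    hits = sum(founder_b_frequency(0.5, n_founders, rng) >= threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):  # hypothetical founding-group sizes
        print(
            f"{n:>5} founders: average gap from 50% = {typical_deviation(n):.3f}, "
            f"groups that are 80%+ B = {share_mostly_b(n):.1%}"
        )
```

For ten founders, the exact binomial chance that at least eight carry version B is 56/1024, a little over 5 percent; for a thousand founders it is vanishingly small. The simulated shares should land close to those figures.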
Natural selection drove changes, too. Some people had genetic variants that allowed them to flourish in particular circumstances, and they passed those genes to their kids. For example, populations living at high altitude — in the Himalayas, the Andes, and the East African highlands — picked up changes that helped them thrive with less oxygen. Near the Arctic, humans experienced selection for paler skin, seemingly because it makes it easier to produce vitamin D in a place with weak sunlight.
Disasters also left their mark on the genome. Recently, a team of researchers extracted DNA from the interred bones of medieval Europeans and found that certain gene variants linked to immune system development became more common in Europe after the devastation of the Black Death. People who carried those versions of the genes, it seems, had been likelier to survive that pandemic.
As these kinds of changes accumulate over long stretches of time, isolated populations of plants, animals, and other organisms sometimes evolve into distinct groups, or even different species. In the 19th and 20th centuries, some scientists argued that this process had produced racial groups, too — that the human species consisted of long-isolated populations, with large genetic differences among them. Experts now say that’s simply not the case. “There’s actually very little differentiation between human populations,” said Joseph Graves, Jr., an evolutionary biologist at North Carolina Agricultural and Technical State University. “And what differentiation there is, is continuous and not discrete.”
For one thing, human beings emerged in Africa pretty recently, at least by the evolutionary clock. There just hasn’t been much time for pronounced genetic differences to emerge. Plus, rather than sitting in isolated populations and slowly evolving into unique types, people kept traveling and mingling. “In most places, you have massive movements of people, intermixing, changing over time,” said Agustín Fuentes, a biological anthropologist at Princeton University. “There isn’t this one deep thread of, like, ‘Everyone from Europe has been the same, doing this for 10,000 years, everyone from Africa has been doing this for 10,000 years.’”
The pattern “is differentiation followed by contact,” said University of Chicago population geneticist John Novembre, describing a constant dance of division and new fusions. By one estimate, all people alive today share at least one common ancestor who lived within the past 3,500 years.
Still, people do not choose their partners randomly. They’re likelier to have kids with people who live close to them, or who are in their extended families.
As a result, there is a link between genes and geography. The people on the far side of the mountain range, to go back to that earlier example, mostly have version B of the gene, and the people on the near side have an even mix of A and B. If a person picked at random carries version B, it’s likelier they come from the far side of the mountain. Repeat that procedure for a bunch of locations on the genome, and it becomes possible to make a rough guess about where in the world someone’s recent ancestors lived.
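That procedure amounts to comparing likelihoods, the basic logic behind so-called assignment tests. Here is a toy sketch of it. The frequencies of version B at five loci are invented for illustration, and the code assumes Hardy-Weinberg proportions, independent loci, and equal prior odds for the two sides of the mountain; real ancestry analyses must treat those simplifications far more carefully.

```python
import math

# Hypothetical frequencies of version B at five spots on the genome,
# for the two sides of the imaginary mountain range.
FREQ_B = {
    "near side": [0.50, 0.40, 0.55, 0.30, 0.45],
    "far side": [0.90, 0.70, 0.80, 0.60, 0.75],
}


def log_likelihood(genotype: list[int], freqs: list[float]) -> float:
    """Log-probability of a genotype; genotype[i] counts copies of B (0, 1 or 2) at locus i.

    Assumes Hardy-Weinberg proportions and independent loci.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1.0 - p
        prob = (q * q, 2 * p * q, p * p)[copies]  # AA, AB, BB
        total += math.log(prob)
    return total


def origin_probabilities(genotype: list[int]) -> dict[str, float]:
    """Probability of each side, assuming both sides are equally likely beforehand."""
    logs = {side: log_likelihood(genotype, f) for side, f in FREQ_B.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {side: math.exp(ll - top) for side, ll in logs.items()}
    norm = sum(weights.values())
    return {side: w / norm for side, w in weights.items()}


if __name__ == "__main__":
    person = [2, 1, 2, 1, 1]  # hypothetical genotype
    print(origin_probabilities(person))
```

Each additional locus nudges the odds a little; no single spot on the genome settles the question, which is why such guesses are only ever probabilistic.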
Starting in the early 2000s, researchers began releasing powerful new computational tools to suss out those patterns.
Arguably the most influential such tool was cooked up by a trio of young geneticists, on a single September day in 1998, at a workshop in Cambridge, England. The computer program took genetic data and looked for clusters, meaning groups of people whose genes are slightly more similar to each other than they are to people outside the group. It was called STRUCTURE.
Human beings share more than 99.9 percent of their DNA with one another. But researchers have long obsessed over the myriad points on the genome where people do vary. At the time STRUCTURE was invented, geneticists had begun amassing vast new libraries of genetic data, drawing on samples collected from around the world. That information, they hoped, would offer clues about human history, and about small genetic differences that might be relevant for medical care. But, the sociologist Dorothy Roberts has argued, innovations in genetic research could sometimes seem like a fresh expression of the impulse to slice human beings up into old categories — “a new racial science.”
Perhaps unintentionally, STRUCTURE seemed poised to feed that impulse. In the very first published application of the tool to a human population, the researchers used it to draw bright lines between DNA taken from a group of White northern Europeans and a group of Black central and south Africans. Soon after, the geneticist Noah Rosenberg and several colleagues applied STRUCTURE to genetic data from 1,056 samples, taken from people around the world. When Rosenberg and his colleagues told the program to sort people into five clusters, it produced groups centered in Africa, Eurasia, East Asia, Oceania, and the Americas.
To some observers, the clusters looked a lot like, well, race. The study “concluded that people belong to five principal groups corresponding to the major geographical regions of the world,” wrote science journalist Nicholas Wade in The New York Times, adding that “these regions broadly correspond with popular notions of race.” (Wade expanded on the idea in a 2014 book that biologists widely described as inaccurate, intellectually incoherent, and racist.)
Many researchers — including Rosenberg, now at Stanford University — caution that it’s a mistake to equate the categories that STRUCTURE generates with racial groupings. Tweaking the number of clusters the computer model spits out will lead to different results. So can adjusting the specific samples that go into the model. The collection that Rosenberg’s team used contains DNA that’s specifically curated from far-flung Indigenous groups, many living in remote areas. If the researchers had selected 1,056 other subjects — say, people living in major cities, or biracial French citizens, or simply a random assortment of individuals — the clusters could have looked different.
“There’s no such thing as an absolute truth when it comes to groupings,” said Genevieve Wojcik, a geneticist at Johns Hopkins University’s Bloomberg School of Public Health. “It’s all about the data that you have in hand, who you choose to include, the methods that you use, and the assumptions you make with those methods.”
“It’s not this worldwide fundamental truth,” she added, “that people sort of cluster.”
Indeed, as clustering tools have become widespread in the field, results have varied. Researchers use a mix of scientific judgment and statistical analysis to decide how many clusters to examine. One study of human genetic diversity published in 2009, drawing on a different set of samples, identified 14 clusters, six of which corresponded to populations with recent roots in Africa. (The continent contains much more genetic diversity than any other region of the world because human beings have lived there for so long.) More recently, a group of researchers applied similar tools to more than 30,000 samples from people living in the New York City area, yielding 17 rough-edged clusters, with plenty of gradation in between.
Some researchers have urged caution when using these tools. A 2018 paper offered scientists “a tutorial on how not to overinterpret” data from STRUCTURE and a related algorithm. In August, geneticist Eran Elhaik — a professor at Lund University, in Sweden, with a reputation as a provocateur — published a paper describing another common tool for visualizing clusters as “the Rorschach of population genetics.” The tool, he added, is “almost entirely open to manipulation and consequent interpretations.”
Experts also suggest that the image of distinct clusters can overstate the differences between groups. Most human genetic variation is found within groups, not between them: even among people who can trace their recent ancestry to the same part of the world, individuals vary a great deal, and only a small share of overall variation distinguishes one group from another.
In certain respects, Rosenberg recently argued in a paper, quoting an old statement from the journal Nature, it might be accurate to say that “two random individuals from any one group are almost as different as any two individuals from the entire world.”
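One common way to put numbers on that point uses expected heterozygosity: the chance that two gene copies drawn at random differ. Dividing the average diversity inside each group by the diversity of everyone pooled together gives the share of variation found within groups; the remainder is the between-group share, a version of the statistic known as F_ST. The sketch uses hypothetical frequencies for a single variant and weights groups equally, both simplifying assumptions.

```python
def heterozygosity(p: float) -> float:
    """Chance that two randomly drawn gene copies differ, for a locus with two versions."""
    return 2 * p * (1 - p)


def within_group_share(group_freqs: list[float]) -> float:
    """Fraction of total diversity at one locus that is found within groups.

    group_freqs holds each group's frequency of version B; groups are weighted equally.
    """
    h_within = sum(heterozygosity(p) for p in group_freqs) / len(group_freqs)
    pooled_p = sum(group_freqs) / len(group_freqs)
    h_total = heterozygosity(pooled_p)
    if h_total == 0:
        raise ValueError("locus shows no variation in the pooled population")
    return h_within / h_total


if __name__ == "__main__":
    # Hypothetical groups with modestly different frequencies of version B.
    print(f"{within_group_share([0.5, 0.6]):.1%} of diversity is within groups")
    # A more lopsided, hypothetical version of the mountain example.
    print(f"{within_group_share([0.5, 0.9]):.1%} of diversity is within groups")
```

The two example calls work out to about 99 percent and 81 percent. Even the lopsided case, with one group at 90 percent B, leaves most of the diversity at that spot inside the groups.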
Racial categories are rough maps imposed on the tangle of human variation — “lines in the sand at the beach,” according to Wojcik. But those categories exist in the world because people make them up and enforce them. And, as a result, it becomes possible to search for genetic differences among socially constructed groups.
Indeed, it’s possible to search for differences among any socially constructed cluster of people, and perhaps find something, if only by chance. Deborah Bolnick, an anthropological geneticist at the University of Connecticut, described an imaginary set of people, some of whom have purple-dyed hair and some of whom do not. “If we searched hard enough, we might be able to find one point in the DNA that everybody who has purple hair has, and everybody who just hasn’t dyed their hair purple doesn’t have,” she said. “So that would be a genetic difference between these two groups. Is that meaningful?”
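Bolnick’s thought experiment is, at bottom, a point about searching many places at once. In the toy simulation below, every genotype is a coin flip with no connection to hair dye, yet scanning thousands of loci in a small sample still turns up a few that perfectly separate the two groups. The group sizes, number of loci, and carrier probability are arbitrary assumptions, and the function name is hypothetical.

```python
import random


def perfectly_separating_loci(
    n_purple: int = 5,
    n_other: int = 5,
    n_loci: int = 10_000,
    carrier_prob: float = 0.5,
    seed: int = 7,
) -> int:
    """Count loci at which every purple-haired person carries a variant and nobody else does.

    Genotypes are random and independent of hair dye (a hypothetical setup),
    so every locus this finds is a coincidence.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_loci):
        purple = [rng.random() < carrier_prob for _ in range(n_purple)]
        other = [rng.random() < carrier_prob for _ in range(n_other)]
        if all(purple) and not any(other):
            hits += 1
    return hits


if __name__ == "__main__":
    expected = 10_000 * 0.5 ** 10  # chance per locus times number of loci
    print(f"expected by chance: {expected:.1f}, found: {perfectly_separating_loci()}")
    print(f"with 20 people per group: {perfectly_separating_loci(n_purple=20, n_other=20)}")
```

With five people per group, each locus has a 1 in 1,024 chance of splitting them perfectly, so a scan of 10,000 loci should turn up about ten such coincidences. With 20 people per group, the chance per locus drops to roughly one in a trillion, which is one reason sample size matters.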
Some racial groups can correspond, if only roughly, with the geographic distribution of people’s recent ancestors across the world. And, because there’s a link between genes and geography, racial categories can capture some of the small genetic differences that may have accumulated between groups of people who lived thousands of miles apart.
Actually exploring those differences is fraught — in part because the questions are often far from innocent. For decades, a small but vocal movement has argued that there are important biological differences between racial groups, especially related to intelligence and aggressive behavior. Those conclusions tend to be based on thin evidence or outright pseudoscience, and to recapitulate longstanding racist stereotypes.
At the same time, some scientists argue, there are well-intentioned ways to explore those differences. “There are genetic differences between socially categorized groups,” said Akinyemi Oni-Orisan, a genetics researcher at the University of California, San Francisco, and the lead author on a 2021 paper calling for the continued use of racial categories in some medical genetics research. Acknowledging those small genetic differences, he suggested, is not incompatible with a commitment to racial justice. “I am a geneticist. I know there are genetic differences between populations based on ancestry,” he said. “I don’t think I’m a race essentialist.”
Where do those differences crop up? One approach is to explore those points on the genome, called loci, that seem to have been subject to recent natural selection, and that may differ from place to place. That signal shows up in perhaps predictable spots: specifically, “loci that affect things like skin pigmentation, immune system biology, pathogen resistance, metabolisms of, particularly, new dietary sources,” said Novembre, the Chicago geneticist. In other words, as people encountered new pathogens and new foods, they adapted, and those adaptations can now show up as slight average genetic differences among socially defined groups.
Even if genetic differences relevant to traits like intelligence were to exist, pinning them down would be much, much harder. Under the best of circumstances, the relationship between genetics and the environment (between nature and nurture) is complex. Seemingly simple traits, like hair color and height, can be influenced by hundreds or thousands of different points on the genome. The problem becomes immeasurably harder with something as slippery and hard-to-define as intelligence.
And while there’s a long, troubled history of scientists trying to link race with concepts like intelligence, many experts say there’s no particular reason to assume that such differences exist. Some also question the motives of those who pursue such research. “The more interesting question to me is, why are we asking that question in the first place?” said Benn Torres, the Vanderbilt anthropologist. “If you’re interested in intelligence or capacity for intelligence, why are we asking that question? To what end?”
Today, some researchers are debating whether big categories are useful at all. One goal is to find ways of thinking about human variation that don’t rely on such categories. “Whenever you’re defining a population, you’re always forcing a discretization,” said Mashaal Sohail, a population geneticist at the Center for Genomic Sciences at the National Autonomous University of Mexico. In other words, categories always oversimplify.
As an example, Sohail brought up a common kind of genetic study that, for technical reasons, begins by looking at each subject’s racial identity and then dividing subjects into groups on that basis: one pool of people with European ancestry, one pool of those with South Asian ancestry, and so forth. Once the samples are sorted into groups, the geneticists do their study. (That sorting is rough; sometimes, one geneticist said, researchers just jettison samples that come from biracial people, unsure where to put them.)
Sohail and other researchers are excited about ways of doing those studies that don’t require breaking up the original dataset. Instead, the researchers look at how related everyone is to everyone else in their sample — how much of the genetic code they share. After all, these big ancestry groupings are just rough measures of genetic similarity among people. Relatedness does that too, Sohail said, only without artificially sorting people into groups. The approach, said Wojcik, can sometimes allow researchers to use bigger datasets, helping them get better results.
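A widely used way to capture “how related everyone is to everyone else” is a genetic relationship matrix built from standardized genotypes, the approach popularized by tools such as GCTA. Whether the studies Sohail describes use exactly this formula is an assumption here; the sketch only illustrates the general idea on a tiny, made-up dataset, with each genotype counted as 0, 1, or 2 copies of a variant.

```python
import math


def genetic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Pairwise genetic similarity from a people-by-loci matrix of variant counts (0, 1, 2).

    Each locus is centered on its sample frequency and scaled, then products are
    averaged over loci. Loci where everyone is identical carry no information and are skipped.
    """
    n_people = len(genotypes)
    n_loci = len(genotypes[0])
    standardized: list[list[float]] = [[] for _ in range(n_people)]
    for j in range(n_loci):
        p = sum(row[j] for row in genotypes) / (2 * n_people)  # sample variant frequency
        if p in (0.0, 1.0):
            continue  # monomorphic locus
        sd = math.sqrt(2 * p * (1 - p))
        for i, row in enumerate(genotypes):
            standardized[i].append((row[j] - 2 * p) / sd)
    m = len(standardized[0])
    if m == 0:
        raise ValueError("no variable loci in the sample")
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[k])) / m for k in range(n_people)]
        for i in range(n_people)
    ]


if __name__ == "__main__":
    people = [  # hypothetical genotypes at five loci
        [0, 1, 2, 0, 1],
        [0, 1, 2, 0, 1],  # same genotype as the first person
        [2, 1, 0, 2, 1],
    ]
    for row in genetic_relationship_matrix(people):
        print([round(x, 2) for x in row])
```

In the output, the first two people, who share a genotype, score as the most similar pair, and no one had to be assigned to a group first. With only three people the numbers are noisy; real analyses use thousands of people and many thousands of loci.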
Lasisi, the scalp hair researcher, sees those as promising developments. “We’re now getting to the point where people are pioneering these statistical methods, these data visualization methods, these ways of mathematically understanding relatedness in this way,” she said. “It’s like, I’m not going to try and give it a shape. I’m just going to embrace it in its multivariate, expansive complexity.”
UPDATE: In describing a type of hair that can be suggestive of ancestry in more than one geographic region, including Papua New Guinea and Africa, an earlier version of this article used a term that some readers may have found inappropriate. The text has been modified in deference to these sensibilities.
LONG DIVISION is an ongoing journalistic project by Undark Magazine, published by the Knight Science Journalism Program at MIT, that examines the fraught legacy of race science.