Consumer DNA tests can’t tell you much, but they sure can get your relatives arrested
It's time to think about who has your data.
The suspected Golden State Killer’s apprehension last week caught a lot of people off guard. Prior to the big reveal that detectives were able to trace the alleged serial rapist and killer through his DNA, many hadn’t seriously considered the ramifications of handing over their genetic material to private companies. But now that data from a cousin of the (again, suspected, but not convicted) Golden State Killer has supposedly helped crack the long-unsolved case, people have had to start asking themselves just how much information they and others have willingly given to companies in the form of DNA samples (not to mention to Facebook). And how much should they have given?
History is peppered with examples of DNA tracing gone wrong. Take the phantom female serial killer whose genetic material turned up at dozens of crime scenes across Europe. The matches were so consistent, police were sure it had to be a prolific murderess. In fact, the DNA was so perfectly matched because it belonged to a woman who worked at the cotton swab factory where police got their supplies (the swabs weren’t certified for DNA collection—oops!).
One Houston man spent four years in prison after DNA evidence seemed to tie him to a rape. It was only after his mother saw a report on errors in crime lab analysis that an attorney reassessed the data and realized that the technician had made fundamental errors in analyzing the samples.
This is not to say that the alleged killer in question is innocent; he’ll get a trial and it will be up to the jury to evaluate the evidence at hand. It’s quite possible that online genome sharing has just helped bring a vicious rapist and killer to justice. But the news is a good reminder that DNA analysis is an unwieldy—and increasingly popular, cheap, and available—tool.
But for many, the news is most surprising because it suggests a consumer DNA test is capable of revealing the truth about your lineage. It’s possible—even likely—to get varying results on ethnicity and heritage across different DNA testing kits. So how is this different?
First, a primer on how DNA testing works
Part of the confusion here stems from not understanding what geneticists are really looking at when they analyze your DNA. Most of the methods rely on something called SNPs (more on those in a second), but the difference between finding your ancestry and determining your direct relatives lies in what you do with those SNPs. Not every company tracks both immediate relatives and distant ethnic lineages, but places like Ancestry do—they track your potential distant cousins and serve up a summary of where your ancient ancestors might have come from.
“When you send us your sample, the first step is to extract the DNA from the cheek cells so we can genotype it. DNA contains about three billion letters, but we’re looking at 700,000 specific positions, which are ones that we know vary between humans,” says Julie Granka, a population geneticist at Ancestry.
Those specific positions are called SNPs, which stands for single nucleotide polymorphisms. When your DNA gets copied as your body makes new cells, the machinery often makes some mistakes. Most of the major errors get caught. If the code for a critical protein is messed up, that cell often just doesn’t survive, plus you have some spell-checking proteins that fix mistakes. But it’s easy to end up with errors in a single position.
Your DNA is made up of building blocks called nucleotides, which scientists refer to as A’s, T’s, C’s, and G’s (for adenine, thymine, cytosine, and guanine). These pair up with each other: wherever there’s an A on one DNA strand, the matching strand across the ladder should have a T, and likewise for C’s and G’s. Every once in a while, the protein copying the strand accidentally inserts the wrong nucleotide in an area of DNA where it doesn’t really matter. Maybe it doesn’t code for anything at all (most of your DNA doesn’t!), or maybe it’s a small enough error that it doesn’t change the functionality of the cell. That’s what an SNP is: a single change at a single position.
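For readers who think in code, here is a minimal Python sketch of those two ideas: the pairing rule across the ladder, and a single-letter change at a single position. The sequences and the function names are made up for illustration; real genomes are billions of letters long and real genotyping doesn’t work by comparing short strings.

```python
# Pairing rule: A sits across from T, and C sits across from G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the bases that sit across the ladder from each base in `strand`."""
    return "".join(PAIRS[base] for base in strand)


def single_letter_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) wherever the two sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical toy sequences, not real genetic data.
reference = "GATTACA"
sample = "GATTCCA"  # an A was swapped for a C at position 4 (counting from 0)

print(complement(reference))                         # CTAATGT
print(single_letter_differences(reference, sample))  # [(4, 'A', 'C')]
```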
Some SNPs cause disease, but for genetic testing purposes we only look at benign bits.
Because these SNPs don’t have an impact on the cell’s functionality, they don’t get fixed. They get passed down through generations. So if one person in England a thousand years ago got an SNP where an A was swapped for a C at position 3455, many of that person’s descendants will have exactly the same SNP. More importantly, because there are billions of possible positions where one might occur, any given SNP is a distinctive marker. If you have an A-to-C SNP at position 3455, there’s a very high chance your ancestors are from England. If you don’t, that doesn’t mean that you’re not originally from England; it just means you don’t have that particular marker.
Since you’re likely to share a lot of SNPs with your close relatives, companies like Ancestry can use the same SNP data to figure out who might be your cousin (as long as that person is in their database). So to some degree, much of genetic testing is just based on what percentage of your DNA—as estimated by SNPs—you share with any other person. But it’s a little more complicated than that.
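As a rough illustration of “what percentage you share,” the sketch below counts how many tested positions carry the same letter in two people. Everything here is an assumption made for the example: the positions, the letters, and the `shared_fraction` helper are invented, and real services look at far more positions and at long matching stretches rather than a simple tally.

```python
def shared_fraction(person_a: dict[int, str], person_b: dict[int, str]) -> float:
    """Fraction of positions tested in both people where the letters match."""
    common_positions = person_a.keys() & person_b.keys()
    if not common_positions:
        return 0.0
    matches = sum(1 for pos in common_positions if person_a[pos] == person_b[pos])
    return matches / len(common_positions)


# Hypothetical genotypes: position -> letter observed at that position.
you = {3455: "C", 9120: "G", 15002: "T", 20871: "A"}
possible_cousin = {3455: "C", 9120: "A", 15002: "T", 20871: "G"}

print(shared_fraction(you, possible_cousin))  # 0.5
```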
Ancestry and familial testing are fundamentally different, but also oddly similar
Companies like 23andMe and Ancestry have databases full of SNPs that they’ve traced back to certain parts of the world. When they run your sample, they’re comparing your set of SNPs to their database, then using the matches to determine which areas your ancestors are probably from.
But this process isn’t perfect. Granka explains that your ancestry estimate is based on a statistical model. It can tell you where you’re probably from, but that’s it. Ancestry, just like every other company in the field, has built up a database of reference populations. Your results are a direct reflection of which reference populations a company uses, which is why you’ll get differing answers from different companies.
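To see why the choice of reference populations matters so much, consider this deliberately oversimplified Python sketch. It is not how Ancestry or any other company actually computes results (they use statistical models, as Granka says); the marker labels, the two reference panels, and the `best_matching_region` helper are all hypothetical. The point is only that the same sample can land in a different region when the reference data changes.

```python
def best_matching_region(sample: set[str], panels: dict[str, set[str]]) -> str:
    """Pick the region whose reference markers overlap most with the sample."""
    return max(panels, key=lambda region: len(sample & panels[region]))


# Hypothetical markers written as "position:letter".
sample = {"3455:C", "9120:G", "15002:T"}

# Two imaginary companies that assigned the same markers to regions differently.
company_one = {"England": {"3455:C", "9120:G"}, "Eastern Europe": {"15002:T"}}
company_two = {"England": {"3455:C"}, "Eastern Europe": {"9120:G", "15002:T"}}

print(best_matching_region(sample, company_one))  # England
print(best_matching_region(sample, company_two))  # Eastern Europe
```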
This means that your ancestry data is only ever an estimate. That’s why companies attach a confidence interval to your results. They may say you’re 48 percent Eastern European, but that the true figure could be anywhere from 30 to 80 percent. Most people focus on the 48 percent and forget that the results aren’t certain. But this bears repeating: all ancestry results are based on a model, and that model can be wrong. Companies are using cutting-edge scientific methodologies to determine which reference populations came from which areas of the world, but the results are always going to be based on assumptions and estimates.
Genealogical testing, on the other hand, is more straightforward. “We’re looking for strands of yours that match to other people in the database,” Granka says. “Strands that are identical come from a common ancestor.” So, if person A and person B have around 12.5 percent of their DNA in common, with those stretches absolutely identical, we could say that A and B share a grandparent. If persons C and D share 50 percent of their DNA, those people are either a parent and a child or two full siblings.
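The arithmetic behind those percentages can be sketched in a few lines of Python. The 50 and 12.5 percent figures come from the examples above; the 25 percent row is the standard average for relatives one step further out than a parent or sibling, added here for completeness. The table and the `closest_relationship` helper are illustrative assumptions, and real shared amounts vary around these averages, so a lookup like this is a starting guess rather than a verdict.

```python
# Average share of DNA expected for a few relationships (as fractions).
EXPECTED_SHARE = {
    "parent and child, or full siblings": 0.50,
    "grandparent, half sibling, aunt or uncle": 0.25,
    "first cousins (shared grandparents)": 0.125,
}


def closest_relationship(shared: float) -> str:
    """Return the relationship whose expected share is nearest to `shared`."""
    return min(EXPECTED_SHARE, key=lambda label: abs(EXPECTED_SHARE[label] - shared))


print(closest_relationship(0.50))  # parent and child, or full siblings
print(closest_relationship(0.13))  # first cousins (shared grandparents)
```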
See the difference? Ancestry testing is based on a potentially faulty model, whereas genealogy is about pure math.
It’s not about how much information you give to companies
It’s how much information we all give. The alleged Golden State Killer wasn’t caught because he handed over his DNA. A cousin uploaded his own DNA to a free, amateur site that allows users to find relatives by using a full genomic sequence—which he presumably got from some kind of paid service. Some companies, like 23andMe and Ancestry, allow customers to download their raw data, which they can then upload to other websites.
In part, it’s because of companies like GEDmatch—the site that the cousin used—that we’re so quickly encountering DNA-related privacy issues. Companies like Ancestry and 23andMe have historically refused requests from law enforcement, perhaps because they know consumers wouldn’t feel as comfortable handing over their DNA to a company that might give that data to the police. That’s not to say they’ll be able to stand up to a court order, though.
More importantly, it’s not even really about what you personally hand over. If enough people provide their DNA, companies or law enforcement could theoretically begin building profiles for everyone else, much like Facebook can have files on people who aren’t on Facebook. When you hand over your genetic data, you hand it over forever. No one can tell you whether that’s a good idea for you personally, but you should carefully consider the potential ramifications—especially if it’s all for an estimate.
If detectives and lawyers and, yes, crime lab technicians are already making mistakes with what little DNA evidence we have, think how many more they could make with entire databases. Even the Golden State Killer case nearly produced a misidentification: in 2017, an Oregon court forced a 73-year-old man to provide a DNA sample after detectives claimed they had evidence that he was the serial killer. Mistakes happen. DNA is not infallible, and we’d do well to remember that. Yes, giving your full genome to a free website could help you find long-lost relatives. It could even help put away a dangerous criminal. But it could also get one of your cousins called in for a crime they didn’t commit.
Note: An earlier version of this article mistakenly called DNA base pairs “amino acids.” They are not, of course, and the author would like to thank Twitter for keeping her honest and accurate.