The EPA is jeopardizing scientific research and privacy in the name of ‘transparency’

“The science that we use is going to be transparent, it’s going to be reproducible,” Scott Pruitt, Administrator of the Environmental Protection Agency, told the audience assembled at the agency on Tuesday.

He was describing a newly proposed rule for the EPA, which would limit the kinds of research the agency could take into consideration when making policy decisions by requiring all data be “publicly available.” Those two keywords—transparent and reproducible—came up again and again in the following days, a kind of verbal talisman for Pruitt and supporters of the proposed rule.

On their face, transparency and reproducibility are hard to argue against; they are fundamental principles of the scientific method. But in practice, calls for greater transparency have almost always been part of industry efforts to cast doubt on unfavorable research. In particular, the mandate for public data access that Pruitt proposes would bar most if not all public health studies that rely on participants sharing sensitive health data, which is protected under confidentiality agreements. These massive, long-running epidemiological studies form the basis of the science of public health. Without them, policy makers will have little to no ability to link environmental exposure to human health risks.

“It’s a tactic that has to do with impugning science by saying there’s a lack of transparency or certainty,” says Joel Kaufman, a public health researcher at the University of Washington who studies the link between air pollution and heart disease. Big Tobacco invoked the same arguments for transparency and access to discredit cancer findings, and oil companies casting uncertainty on the science of climate change do the same.

“These are phony issues that weaponize ‘transparency’ to facilitate political interference in science-based decision making, rather than genuinely address either,” wrote the Union of Concerned Scientists, a nonprofit science advocacy group, in an open letter to Pruitt. “The result will be policies and practices that will ignore significant risks to the health of every American.”

Six Cities

“This goes back 40 years, since we started that study,” says Douglas Dockery, an epidemiologist at Harvard’s School of Public Health. Starting in the late ’70s, Dockery led a team of researchers who collected health data from residents in six U.S. cities. For 16 years, the scientists tracked more than 8,000 individuals, recording their lifestyle and environmental risks and collecting death certificates when participants died.

In 1993, they published the study in the New England Journal of Medicine, showing a 26 percent increase in mortality rate in the most-polluted city compared to the least-polluted city. “Mortality was most strongly associated with air pollution with fine particulates,” the authors wrote, in one of the first definitive findings linking fine particulate matter (PM2.5, which includes things like soot from coal-burning power plants) to serious health risks for humans.

The study was peer reviewed, but its raw data (including birthdates, social security numbers, and other sensitive information from participants) was confidential. “We had both ethical and legal restrictions,” Dockery said. The researchers had signed agreements with each participant ensuring their privacy, as well as various confidentiality agreements with the states that released death certificates to them.

When the EPA reviewed its air standards, after the publication of the Six Cities study, Dockery was called to testify about his work. “I went up to Capitol Hill at that time,” he remembers, 25 years later, “and they had people dressed up in lab coats saying, ‘give us the data.’”

Privacy vs. transparency

“The best studies follow individuals over time, so that you can control all the factors except for the ones you’re measuring,” former EPA administrator Gina McCarthy told the Washington Post. “But it means following people’s personal history, their medical history. And nobody would want somebody to expose all of their private information.”

Even in a world of ever-dwindling privacy, health data remains a rare exception. When epidemiologists collect data about mortality rates, they also collect information about individuals’ lifestyles: how much they exercise, whether they drink or smoke, what their diets are like, and whether they suffer from other diseases or occupational hazards that might affect their health. The result is an intimate portrait of an individual that might contain details not known to their closest friends and family.

As Joel Kaufman explained, “the devil’s in the details about what they decide is ‘data.’” While there are ways to redact and anonymize personal health data, there are limits to how much an individual’s identity can be protected, especially in small communities where basic identifiers like age and gender may be enough to recognize a neighbor or a family member. And advocates for “transparency” can always say that is not good enough, and demand the original forms and questionnaires participants filled out, Kaufman says.
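To see why stripping names may not be enough, consider a minimal sketch in Python. The records below are invented for illustration (they are not from any study), and the function names are hypothetical. The sketch simply counts how many participants share the same combination of age, gender, and town; anyone whose combination appears only once could, in principle, be picked out by a neighbor.

```python
from collections import Counter

# Hypothetical, made-up records: (age, gender, town). No real data.
records = [
    (34, "F", "Smallville"),
    (34, "F", "Smallville"),
    (71, "M", "Smallville"),
    (52, "F", "Smallville"),
    (52, "F", "Smallville"),
    (52, "F", "Smallville"),
]


def smallest_group(rows):
    """Return the size of the smallest group sharing identical identifiers."""
    return min(Counter(rows).values())


def unique_rows(rows):
    """Return the identifier combinations that appear exactly once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n == 1]


print(smallest_group(records))  # 1
print(unique_rows(records))     # [(71, 'M', 'Smallville')]
```

In this toy example, the one 71-year-old man in town is identifiable from "anonymized" data alone. Privacy researchers sometimes describe the smallest group size as the "k" in k-anonymity; a value of 1 means at least one person stands alone.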

Pruitt has said that the proposed rule is necessary to give the public more information to consider during public comment periods on proposed policies, as though private citizens are clamoring to comb through multi-decadal data sets to double-check the work of scientists. But there are ways to verify data without making it accessible to the public at large: for example, by scientific peer review, independent verification, or confirmation by other studies. In other words, scientists from disparate institutions and research groups should (and do) check one another’s work.

The Six Cities study was ultimately confirmed by all three of these methods. An independent organization called the Health Effects Institute got access to the original data (subject to the same confidentiality agreements as the original researchers) and attempted to reproduce the findings—which they did. And in the time since the original study came out, many other researchers in many other places have come up with the same connection between fine particulate air pollution and cardiopulmonary disease and death.

“These were important studies at the time, 25 years ago,” Dockery says now of his work on the Six Cities project, as well as the accompanying American Cancer Society study. “But they’ve been replicated and validated and shown to be true scores of times by many other studies, and much better studies. To go back and suggest that somehow you could remove the study from the record is kind of irrelevant now: there’s massive evidence that these studies have shown to be true.”

Sowing uncertainty

This week, Steven Milloy, a Trump transition team member and advisor to the Heartland Institute, a conservative policy think tank, told The Atlantic that the Six Cities study is “the biggest science fraud that has gone on in this country’s history” before claiming that “China, for the last few years, has had these huge episodes of PM2.5. No one’s died.” (Millions of people have died because of China’s polluted air.)

Milloy’s comments may be a particularly blatant example, but they’re part of a long-running attempt to cast doubt on the integrity of science that turns up results unfavorable to industry.

In 1999, Richard Shelby, a Republican senator from Alabama, introduced a two-line amendment to an omnibus spending bill, stipulating that any federally funded science be subject to Freedom of Information Act requests. The amendment was eventually interpreted to include protections for individual privacy and intellectual property, but at the time science advocates viewed it as an asymmetrical attack that left industry-funded science free from scrutiny.

In recent years, Texas representative Lamar Smith championed the argument that data should be publicly available, through the introduction of the Secret Science Reform Act in 2015 (which became 2017’s HONEST Act). Neither bill was ever able to pass both chambers of Congress, and Smith has announced he will not seek reelection in 2018. According to emails acquired by the Union of Concerned Scientists, Smith met with Pruitt in January and made a “pitch” that the agency implement the basic principle of his HONEST Act in the EPA.

“The basis of science is that if you do something, and come to a conclusion, I ought to be able to look at how it was done and be able to come up with the same conclusion,” said Idaho representative Mike Simpson at Pruitt’s appearance before the House Appropriations Committee on Thursday. “Right now, nobody knows how the EPA comes up with a lot of the information that they come up with.”

But reproducibility, the second buzzword linked to the proposed rule, isn’t really a problem in epidemiology, Kaufman says. Any discussion of a “replication crisis” in public health science is likely linked to confusion sown by industry scientists who perform their own studies and analyses of data. Kaufman fears this is one possible outcome of the transparency rule: If industry gets hold of original datasets, they may be able to slice the statistics in a way that looks more favorable to them.

“If you analyze data over and over again, you can often come up with different answers,” says Kaufman. “Torture the data until it confesses, as we sometimes say.”

He points out that this has been done with the science of climate change. “If you were to say, OK, we’re only going to look at the last six months” of global temperature data, he says, “you might be able to come up with an answer that says the earth’s not getting warmer.” But that would be only part of the picture—a scientifically valid analysis would require you to study the complete range of the data, using tools and approaches in line with the original purposes of data collection. If you start fudging those principles, Kaufman says, “You can often arrange the data to get an answer that you want to find.”
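Kaufman’s point about short windows can be illustrated with a small Python sketch. The series here is synthetic, not real temperature data: it is assumed to be a slow upward trend plus a repeating twelve-month cycle, and the helper name `ols_slope` is made up for this example. Fitting a straight line to the whole series gives a rising trend, while fitting the same line to only the last six months gives a falling one.

```python
import math


def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Synthetic series, NOT real measurements: 20 years of monthly values made of
# a slow upward trend (0.002 per month) plus a repeating 12-month cycle.
series = [0.002 * t + 0.5 * math.sin(2 * math.pi * t / 12) for t in range(240)]

full_slope = ols_slope(series)         # trend over the complete record
window_slope = ols_slope(series[-6:])  # trend over the last six months only

print(full_slope > 0)    # True: the full record is rising
print(window_slope < 0)  # True: the cherry-picked window appears to fall
```

Nothing about the underlying data changes between the two fits; only the choice of window does. That is the sense in which a selective analysis can produce "an answer that you want to find."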

Risky business

“This was clearly an attempt to limit the science that’s available to policy making, and especially an attack on epidemiology,” says Dockery. In the United States, unlike Europe and other marketplaces, industry doesn’t bear the burden of proving a chemical or pollutant is safe for humans—substances can only be proven to be harmful after the public has been exposed to them. That means massive, real-time studies like Six Cities are fundamental for understanding the risks of any given regulation or policy.

“Basically, we’re using the American public as guinea pigs for testing these new products that are out there,” Dockery says. “It’s not that you’re hurting the scientist,” he explained of the proposed rule. “This is not preventing research. What it’s doing is preventing science from contributing to discussions about policy.”

If those studies become inadmissible when setting policy at the EPA, the health consequences could be grave. “We’ve made enormous strides in the U.S. in improving air quality, and that has spelled an enormous improvement in health,” says Joel Kaufman. But those improvements have been driven by the scientific evidence that there are health consequences. “If you don’t permit to enter into evidence the literature that show there are risks of air pollution, you risk turning back the clock.”

“The purpose of this rule is not to increase transparency,” Kaufman said. “It’s to make it harder to regulate.”

One thing everyone can agree on is summed up in a statement from Joseph Bast, director of the Heartland Institute: This proposed rule “may be the most consequential decision made by EPA since the election of Donald Trump.”