What is EPA Open Data, and why would it shut down?

We don’t know what we’ve got till it’s (almost) gone

Closed data is like trying to see through a peephole. Credit: Pexels

On Monday, millions of Americans woke up to a surprise: data that had been set free would once again be locked away. According to a pop-up that appeared on the EPA Open Data website, the site's days were numbered: the message stated that it would cease to exist as of April 28, 2017. It turns out this isn't quite true, but the site isn't exactly safe, either.

The pop-up that greeted visitors, stating that the EPA Open Data website would shutter come April 28. Credit: PopSci / EPA

Writes Bernadette Hyland, the scientist who first broke the news, "The US EPA Open Data Service provides human and machine readable information on over 4M US EPA regulated facilities, from dry cleaners to nuclear power plants."

Hyland is the CEO of a company that helps maintain EPA Open Data. She was reportedly informed by the EPA that they “need to be ready to turn-off the EPA Open Data web service by noon on April 28, 2017—the last day of the continuing resolution. If Congress does not pass a budget, we will be facing a government shutdown and won’t be able to give technical direction to continue any work.”

In other words, the EPA Open Data site is one of many government entities—including the ones that control public parks and your tax refund—that would go dark if the government were to shut down.

On Twitter, many people found the threat of an Open Data shutdown concerning because of the increased government criticism that the EPA has received since January.

The EPA then tweeted that the rumors were false, and that the data website wasn't going anywhere.

Eventually the pop-up was replaced with this correction:

[Image: the corrected pop-up. Credit: EPA]

Why all the fuss? Why would anyone care if the EPA Open Data website went dark?

In 2013, the White House Office of Science and Technology Policy (OSTP), which, as its name suggests, rests within the executive branch, told the federal agencies that do research (the EPA, the CDC, the National Institutes of Health, and others) that they had to find some way to ensure that their research, paid for with American tax dollars, was made public.

The concern was twofold. The first was that American-funded research would be locked behind journal paywalls that, in recent years, have become pricey. Publication in a peer-reviewed journal, where a panel of experts in your field reviews a study for scientific rigor and determines whether it is worthy of inclusion in the canon of scientific literature, remains the gold standard for scientific research. However, once a study is published in a journal, accessing that information can become difficult, since many of the top publications are private entities that charge for access.

Accessing a single journal article usually costs around $30, though institutions like universities and libraries usually buy a subscription that gives them access to a suite of journals and their historical archives. The way those subscriptions are bundled is not cheap, however. In fact, by 2012, even Harvard University, which at the time had an endowment of over $30 billion, said that journal subscriptions had become too expensive. Harvard began encouraging its academics to publish in open access journals instead. Open access journals, the most famous of which is probably PLOS, don't charge people to access articles, though there's still some debate over whether the quality of these journals is generally as good.

The second concern was that paying to access government data wasn’t just expensive: it also meant that people were essentially paying twice to access the research. They paid first through their tax dollars, and second through a fee paid to a journal by themselves or their institution. In the case of public libraries and state universities, this meant that the government was essentially paying a fee to provide access to its own research.

The EPA Open Data website, which is the largest and most robust in terms of the data it provides, was one way of getting around that. The agency’s scientists would continue to publish in peer-reviewed journals, but they'd also post their findings on the EPA website. Paying to access EPA data wouldn’t be compulsory.

But the EPA Open Data website took the idea and ran with it, perhaps, in part, because the structure of environmental regulations depends somewhat on the public's awareness that there is a problem (you can see their full plan here). If you go to the EPA Open Data Facilities by Zip Code website, you can see which EPA facilities of interest are in your area: how environmentally friendly that dry cleaner down the road is, how much your local electric plant pollutes the neighborhood, and so on.

A lot of this information is available elsewhere, but the Open Data website provides it in a format that is incredibly easy for humans and machines to access. Want to build an app with Open Source data? You can do that. Want to map the most polluting industries in your neighborhood? You can do that, too.

The shutdown of the EPA Open Data website would threaten that access. While this morning's pop-up seems to have been a false alarm, we could lose access to the site, at least temporarily, in the event of a government shutdown. At a time when members of the House are criticizing the EPA for being too private, it would be unfortunate, to say the least, for Americans to lose such a great tool for transparency.