Can The NSA's Machines Recognize A Terrorist?

Machine learning algorithms used by the U.S. National Security Agency to identify potential terrorists in Pakistan may be ineffective, because we just don’t have enough data to teach a machine the signs of a terrorist, claims an investigation by Ars Technica UK.

The NSA project, disastrously named Skynet, uses cellular network traffic in Pakistan to identify and monitor potential threats, according to leaked documents published by The Intercept. Like many big-data machine learning systems, it takes millions of values as input and tries to match them against certain patterns. The Intercept revealed the program in 2015, but the Ars investigation digs into how ineffective it could really be.

This is much like the machine learning used by tech companies today to govern most of what we see online. Facebook uses machine learning to rank your news feed, and Google has started to use it in search.

But these techniques work reliably only if the machine is first trained on many, many examples of what the correct pattern looks like. In this case, that pattern could include locations, behavior like frequently swapping cell phone hardware, and only receiving calls, never placing them. Patrick Ball, director of research at the Human Rights Data Analysis Group, told Ars Technica that the data used is too vague to produce any reliable outcome.

“First, there are very few ‘known terrorists’ to use to train and test the model,” Ball said. “If they are using the same records to train the model as they are using to test the model, their assessment of the fit is completely bullsh*t.”
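Ball’s objection is easy to demonstrate. The sketch below uses a 1-nearest-neighbour classifier (a stand-in chosen for simplicity; nothing here reflects how Skynet’s model actually works) on made-up data whose labels are pure coin flips, so there is no real pattern to find. Scored against the same records it learned from, it looks flawless, because every record’s nearest neighbour is itself. Scored with each record held out of its own prediction, it drops to roughly chance.

```python
import math
import random

def nearest_label(query, points, labels, skip=None):
    # 1-nearest-neighbour: return the label of the closest stored point,
    # optionally skipping one index so a record can be held out
    best = min(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    return labels[best]

random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
labels = [random.randint(0, 1) for _ in range(200)]  # pure noise: no real pattern

# Tested on the same records used to "train": each point finds itself at distance 0
same_records = sum(
    nearest_label(p, points, labels) == y for p, y in zip(points, labels)
) / len(points)

# Each record excluded from its own prediction
held_out = sum(
    nearest_label(p, points, labels, skip=i) == y
    for i, (p, y) in enumerate(zip(points, labels))
) / len(points)

print(same_records, held_out)
```

The first number is always a perfect 1.0 regardless of the data; only the second says anything about whether the model has learned a pattern.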

The Skynet project uses data from just seven known terrorists.

Ball says that the Skynet project tests its model using data from just seven known terrorists, plus a random sample of 100,000 mobile phone users. The NSA feeds the algorithm six of the seven known terrorist patterns, then all of the normal patterns, and then tasks it with finding the seventh terrorist pattern hidden somewhere in the noise. According to the NSA presentation, these calculations use 80 variables about each cell phone user, and the NSA has records on 55 million users. Pakistan has more than 180 million citizens, which makes the data incomplete at best.
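The testing procedure described above resembles leave-one-out cross-validation over the known positives. Here is a minimal sketch under loudly stated assumptions: the data are random numbers, the “model” simply scores each user by distance to the average of the known positives (a stand-in, not Skynet’s actual algorithm), and 1,000 ordinary users stand in for the 100,000 in the slides to keep it quick. For each of the seven positives, the other six define the pattern, and the held-out one is ranked against every ordinary user.

```python
import math
import random

def centroid(vectors):
    # Mean of each feature across the given vectors
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def holdout_ranks(positives, negatives):
    """Leave-one-out: build the pattern from all positives but one, then see
    where the held-out positive ranks among every negative (1 = most suspicious)."""
    ranks = []
    for i, held_out in enumerate(positives):
        training = positives[:i] + positives[i + 1:]
        c = centroid(training)
        # Score = distance to the centroid of the known positives; smaller is more suspicious
        target_score = math.dist(held_out, c)
        rank = 1 + sum(math.dist(n, c) < target_score for n in negatives)
        ranks.append(rank)
    return ranks

random.seed(1)
# Hypothetical data: 7 "known" positives and 1,000 ordinary users, 80 features each
positives = [[random.gauss(0.5, 1) for _ in range(80)] for _ in range(7)]
negatives = [[random.gauss(0, 1) for _ in range(80)] for _ in range(1_000)]
print(holdout_ranks(positives, negatives))
```

With only seven positives there are only seven such tests, and each result swings on which single record was held out, which is Ball’s point about having very few known terrorists to train and test on.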

“Incomplete at best” is also a great way to describe the outputs. The NSA can get a 0.18 percent false-alarm rate, but only if it’s willing to miss half of all potential matches. One slide literally says, “statistical algorithms are able to find the couriers at very low false alarm rates, if we’re allowed to miss half of them.” With 55 million records searched, about 99,000 hits would be false positives.
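The false-positive figure follows directly from the slide’s numbers, applying the 0.18 percent false-alarm rate to all 55 million records as the estimate above does. The precision line adds a deliberately hypothetical assumption, that the model also correctly flags 1,000 real couriers (the slides give no such count). Even then, only about one flagged person in a hundred would be a genuine match.

```python
from fractions import Fraction

def expected_false_positives(records: int, false_alarm_pct: Fraction) -> Fraction:
    # Simplification used in the article: apply the false-alarm rate to every record
    return records * false_alarm_pct / 100

def precision(true_hits: int, false_hits: Fraction) -> float:
    # Share of all flagged people who are genuine matches
    return float(true_hits / (true_hits + false_hits))

false_hits = expected_false_positives(55_000_000, Fraction(18, 100))  # 0.18 percent
print(false_hits)                    # 99000
print(precision(1_000, false_hits))  # 0.01 (1,000 true hits is a hypothetical figure)
```

Exact fractions are used so the arithmetic matches the article’s round number without floating-point drift.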

But all this information rests on slides that might date from 2011 or 2012. We also have no idea how the program might have been refined or thrown out since, or whether it is still being used today the way it may have been in 2011, with little oversight. The slides might be false. (That’s probably not the case, but it’s possible.) The NSA could also have far more than 55 million records now.

And it should also be noted that we have no idea what the NSA is actually doing with this data. It could be funnelled into reports that inform drone strikes, although it would seem the government isn’t treating every positive match as a threat: alarming as the toll of 3,994 people killed by U.S. drone strikes in Pakistan since 2004 is, it is a small fraction of the roughly 99,000 false positives the model would produce.

Giving algorithms this much power isn’t a big deal when they’re tagging Facebook photos or deciding who sees an advertisement, but such a wide margin of error is deadly when lives are on the line.

“It’s bad science, that’s for damn sure,” Ball said.
