Can Technology Save the Military From a Data Deluge?

The U.S. military is drowning in data. This is how to save it from the flood


The U.S. military has a data problem. If knowing is half the battle, then it’s the half the Pentagon should never lose, at least in theory. But in practice, the military’s data problem is significant, vexing, and, given the current pace of acceleration, technologically intimidating. Just two years ago, there were roughly a dozen NATO aircraft flying surveillance missions over Afghanistan at any given time. Now, there are more than 50 Predator and Reaper drones in the air at once, and all of them are pushing fat streams of data to the ground around the clock.

Meanwhile, more drones are joining the fights in Afghanistan and Pakistan, in the Horn of Africa, and over Iraq, Yemen, and elsewhere. Intelligence, surveillance, and reconnaissance (ISR) sensors are proliferating on conventional manned aircraft. The data streams are outpacing the DoD’s capacity to organize and store them, and they generate so much noise, so much unusable intel, that analysts can’t separate the relevant information from the useless. And all this is happening in an environment where anything slower than real time can cost lives.

“We’re swimming in sensors, and we need to be careful we don’t drown in the data,” says Dave Deptula, CEO and managing director for defense technology problem-solver MAV6. This isn’t the first time Deptula has said this, and it won’t be the last. In his previous post as the first deputy chief of staff for intelligence, surveillance and reconnaissance, Lt. Gen. Deptula was in charge of planning and implementing the entire U.S. Air Force’s ISR strategy, and he saw the data flood topping its levees firsthand.

“The unavoidable truths are that data velocities are accelerating and the current way we handle data is really overwhelmed by this tsunami,” Deptula says. “So we’re going to have to begin exploring different ways to meet the growing challenges of hyper-scale workloads.”

Simply adding rack space (and that’s been much of the strategy for dealing with the problem to date) isn’t going to tame the flood. The DoD needs not just new storage methods but entirely new concepts of operation that blend novel storage architectures, all kinds of digital semantics, and, critically, a healthy dose of artificial intelligence.

Someday soon, computer programs will view, tag, organize, and store hundreds or even thousands of video streams simultaneously, deciding what sensor data is relevant to the fight at hand, what needs immediate attention, and what should be filed away. Natural language interfaces will let analysts search their databases instantly with plain-English queries, like something straight out of Star Trek. Drones themselves will even become computerized intelligence analysts, combing through their own data streams in real time to highlight only the choicest bits of intel. These technologies are already in the works, and this is how technology will save the military from its technology.


The idea of speaking to our computers in natural language is pervasive in science fiction, and with the advent of the iPhone 4S and its Siri software assistant, reality has caught up to the ideal. In fact, Siri’s lineage runs back to CALO, a DARPA-funded assistant project at SRI International, so this kind of natural language interface was arguably meant for the warfighter all along. Soon it will allow intelligence analysts to speak to their machines (and, perhaps more importantly, to other machines in the intelligence community) with confidence that their computers understand.

“Say for example I’m the intelligence analyst working in a very well-defined battle space,” says former Marine Corps intelligence officer Tony Barrett. “I’m in the Marine Corps and next door to me is an Army unit using completely different systems to catalog and store their data. But they’ve got information relevant to my fight, and I’ve got information relevant to their fight. The question becomes: how do we make that data relevant so that it becomes apparent to me that they’ve got data that’s important to me?”

Barrett is now the senior business development manager at Florida-based software developer Modus Operandi, a company that’s helping the DoD efficiently catalog its data overflow. But Barrett, like Deptula, knows the answer to his question goes beyond storage solutions. Modus Operandi and companies like it are scrambling to create natural language processing and textual analytics that allow machines and people to share a common language.
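One way to picture a “common language” between services is a shared vocabulary that maps each system’s field names onto the same concepts, so a single query reaches both catalogs. The sketch below is a toy illustration under stated assumptions, not Modus Operandi’s software: the two systems, their field names, the SHARED_VOCAB table, and the sample records are all hypothetical.

```python
from typing import Dict, List

# Hypothetical vocabularies: each invented system's field names mapped onto
# shared concept names. Real systems would rely on far richer ontologies.
SHARED_VOCAB: Dict[str, Dict[str, str]] = {
    "marine_system": {"tgt_name": "person", "grid": "location", "dtg": "time"},
    "army_system": {"subject": "person", "loc_ref": "location", "observed_at": "time"},
}


def normalize(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename a record's fields to shared concept names; unmapped fields are dropped."""
    mapping = SHARED_VOCAB[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def find_person(records: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Run one query over records that originally used different schemas."""
    return [r for r in records if r.get("person") == name]


# Invented sample records, each in its own service's format.
marine_records = [{"tgt_name": "Abu Bakar", "grid": "AA-12", "dtg": "2011-10-01T06:00"}]
army_records = [
    {"subject": "Abu Bakar", "loc_ref": "AA-13", "observed_at": "2011-10-02T14:00", "unit": "B Co"}
]

pooled = [normalize(r, "marine_system") for r in marine_records]
pooled += [normalize(r, "army_system") for r in army_records]
matches = find_person(pooled, "Abu Bakar")  # finds the report from each service
```

The design choice is to translate at the edges: neither service changes its own catalog, and the shared concept names are the only thing a query needs to know.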

If they can teach machines to talk like humans and in turn understand natural human speech, they can attach whole new levels of meaning to information stored in databases across military services and the intelligence community. And calling up relevant data on such a system would be as easy as calling up the closest Starbucks via Siri. Analysts would simply ask for it.

“The software makes a cognitive leap from your statement–it’s translated it and figured out your vocabulary and the meaning behind it–and then returned you a relevant result,” Barrett says. “If you type in a search in plain language–‘Tell me where Abu Bakar was last reported to be seen’–the language and ontologies and grammar of that request are translated and provided meaning. The same thing happens on the other end and we return it back to the user in a usable, relevant form.”

In other words, analysts are using their computers to have a conversation, rather than to type in keywords associated with metadata tags. With a common natural language, associations between data become more apparent, search becomes faster, walls between different systems and databases dissolve, and the entire enterprise becomes more efficient.
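To make the idea concrete, here is a deliberately tiny sketch of turning Barrett’s example question into a structured lookup. It is an illustration, not how Modus Operandi or any production system works: a single hard-coded regular expression (the hypothetical LAST_SEEN pattern) stands in for real parsing and ontologies, and the sighting records are invented.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical sighting record: (person, location, report time).
Sighting = Tuple[str, str, datetime]

# One hard-coded question pattern stands in for real language understanding.
LAST_SEEN = re.compile(r"where (?P<name>.+?) was last (?:reported|seen)", re.IGNORECASE)


def answer(question: str, sightings: List[Sighting]) -> Optional[Sighting]:
    """Translate a plain-language 'last seen' question into a structured lookup."""
    match = LAST_SEEN.search(question)
    if match is None:
        return None  # the toy grammar does not cover this phrasing
    name = match.group("name").strip()
    hits = [s for s in sightings if s[0].lower() == name.lower()]
    # "Last reported" becomes: the matching sighting with the latest timestamp.
    return max(hits, key=lambda s: s[2], default=None)


reports: List[Sighting] = [
    ("Abu Bakar", "market", datetime(2011, 10, 1, 6, 0)),
    ("Abu Bakar", "checkpoint", datetime(2011, 10, 2, 14, 0)),
]
latest = answer("Tell me where Abu Bakar was last reported to be seen", reports)
```

The analyst never types a field name or a sort order; the question’s wording (“last reported”) is what selects the newest record.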


Teaching computers and humans to interface better with one another enhances the storage and retrieval of data, but it doesn’t address the key problem: the amount of raw data that needs to be analyzed in the first place. Much of that data is video from drones or other aircraft, and analysts spend hours watching footage, cataloging video data, and hoping for a relevant bit of intel. If the military wants to cut down on wasted man-hours and bring intelligence analysis closer to real time, it will have to teach computers how to see.

That’s easier said than done. What humans do instantly (see, identify, and evaluate an object) is extremely challenging for software. The human brain can instantly associate an enormous amount of prior knowledge with an object. Computers, on the other hand, see only pixels: a matrix of varying intensity values. And while object recognition algorithms are improving rapidly, computer vision is still rudimentary next to the human eye.

But while all this video is a major part of the data problem, it’s also helpful from a computer vision standpoint, says Dr. Anthony Hoogs, director of computer vision at software developer Kitware. Kitware is currently leading Phase II of DARPA’s Video Image Retrieval and Analysis Tool (VIRAT) program and is developing the very kinds of video analysis tools that could help solve the Pentagon’s data problem.

“Video helps quite a bit,” Hoogs told PopSci in an interview earlier this year. “In video we have an important cue, which is motion. It turns out that motion is relatively easy to detect, and you don’t have to know what the object is. If something is moving, the intensity values at that location where the object is or where it was will change, and that’s relatively easy to pick up.”
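The cue Hoogs describes is commonly exploited with frame differencing: compare each pixel’s intensity with the previous frame and flag the pixels that changed. The sketch below shows that technique on tiny grayscale frames stored as plain lists; the change threshold of 25 intensity levels is an arbitrary assumption, and nothing here reflects VIRAT’s or Kitware’s actual code.

```python
from typing import List

Frame = List[List[int]]  # grayscale intensities, 0-255


def motion_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Flag pixels whose intensity changed by more than `threshold` between frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_fraction(prev: Frame, curr: Frame, threshold: int = 25) -> float:
    """Fraction of pixels flagged as changed; 0.0 means a static scene."""
    mask = motion_mask(prev, curr, threshold)
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)  # True counts as 1
    return changed / total if total else 0.0


# A flat 4x4 background, then the same scene with a bright 2x2 object in one corner.
background = [[50] * 4 for _ in range(4)]
moved = [row[:] for row in background]
for r in range(2):
    for c in range(2):
        moved[r][c] = 200
```

Note that the code never asks what the bright object is; as Hoogs says, a change in intensity at a location is enough to notice that something moved.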

Hoogs can’t speak directly about VIRAT or the current state of DARPA’s pursuit of computer vision, but a look at the VIRAT program itself points the way forward. VIRAT is focused on video footage from UAVs like Reaper and Predator drones, and it is anchored on the ability to recognize activities, like people leaving or entering buildings or vehicles moving from one place to another. In other words, VIRAT aims to comb through video quickly and automatically, looking for specific motions and tagging them when it finds them, saving analysts countless hours.
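Recognizing an activity such as “vehicle leaves compound” is far harder than detecting motion, but the tagging step that saves analysts time can be sketched simply: turn a per-frame motion score into time-stamped segments an analyst can jump to. The function name, the threshold, and the minimum-run length below are assumptions for illustration; this is not VIRAT’s method.

```python
from typing import List, Optional, Tuple


def tag_motion_events(
    scores: List[float], fps: float, threshold: float = 0.01, min_frames: int = 3
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) for runs of sustained motion.

    A run must last at least `min_frames` frames, which filters out one-frame
    flicker such as sensor noise. Assumes `threshold` is non-negative.
    """
    events: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            if i - start >= min_frames:
                events.append((start / fps, i / fps))
            start = None
    return events


# Per-frame motion fractions (e.g. from frame differencing) at 2 frames per second.
scores = [0, 0, 0.2, 0.3, 0.25, 0, 0.5, 0, 0, 0.4, 0.4, 0.4, 0.4]
events = tag_motion_events(scores, fps=2)
```

Here the single-frame spike is discarded, and two segments survive as tags; an analyst reviews a few seconds of flagged footage instead of the whole recording.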

Of course, computer vision capabilities are accelerating alongside the volume of video data. “We’re seeing an exponential increase in the number of vision applications,” Hoogs tells us. The better the machines get at parsing digital data, the more effective they will be at cataloging and attaching relevance to the cascading digital video pouring into the intelligence community. Soon drones will be recognizing and tracking specific faces and vehicles, alerting analysts when particular subjects are on the move or are spotted in crowds. But all that data still has to go somewhere.


“Video processing was one of those things that was really enabled by computing power,” Hoogs says. “Video [computer vision] didn’t really take off until the mid-’90s because digital video was rare. Having digital video and the ability to process it somewhat efficiently–this was enabled by bigger computers and bigger computer disks.”

But bigger computers and bigger storage architectures are a problem for UAVs like the Reaper or Predator, which have limited payload capacity. The military now has more digital video than it can handle, and as sensors get lighter and more robust, drones are streaming even more data at even higher resolutions to stations on the ground. The downlinks between the point of collection and the end users on the ground have become a bottleneck.

One solution, as Deptula sees it, is to keep the 95 percent of the data that isn’t useful in the sky and downlink only the most relevant information. That means processing data on the fly. An aircraft that can carry a supercomputer can process its own data in real time, tagging, organizing, and storing it as it goes. Multiple analysts on the ground could then query the on-board database simultaneously, downlinking only the pre-processed data they need rather than waiting for terabytes of raw data to come down and be processed on the ground.
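The “keep most of it in the sky” idea can be sketched as a selection problem: given processed products with relevance scores and a fixed downlink budget, send the most relevant items that fit and leave the rest on board. The Product type, the 0-to-1 relevance scale, and the sample sizes below are hypothetical, and the greedy rule is one simple choice among many; it is not how MAV6 or the Air Force actually prioritizes data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Product:
    """A hypothetical processed item held on board (a tagged clip, still, or report)."""

    name: str
    relevance: float  # score from on-board analysis, assumed to lie in 0.0-1.0
    size_bytes: int


def select_for_downlink(products: List[Product], budget_bytes: int) -> List[Product]:
    """Greedily pick the most relevant products that fit in the downlink budget.

    Everything not selected stays in the on-board store for later retrieval.
    """
    chosen: List[Product] = []
    used = 0
    for p in sorted(products, key=lambda item: item.relevance, reverse=True):
        if used + p.size_bytes <= budget_bytes:
            chosen.append(p)
            used += p.size_bytes
    return chosen


onboard = [
    Product("raw_pass_1", 0.10, 900),
    Product("vehicle_track", 0.90, 50),
    Product("crowd_still", 0.60, 30),
    Product("empty_road", 0.05, 400),
]
sent = select_for_downlink(onboard, budget_bytes=100)
```

A greedy pass is fast and easy to audit but is not guaranteed to be optimal; an exact answer would treat this as a knapsack problem, which may not be worth the extra computation when relevance scores are themselves estimates.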

That’s exactly the capability Deptula and MAV6 aim to provide the Air Force. MAV6’s Blue Devil airship is being developed under an $86.2 million USAF contract as a potential solution to the data glut currently bogging down intel analysts in Afghanistan. Though not a speed demon, Blue Devil will be capable of long-duration ISR missions (manned or unmanned), pack a robust range of sensors, and be able to interface with other ISR capabilities like Reaper drones.

But the key innovation is its on-board supercomputer–packing the equivalent of 2,000 single-core servers and 500 terabytes of non-volatile memory–that will be able to process and catalog intel as it is collected. The ability to process that data on the fly and extract and downlink only the most pertinent intel will drastically improve the speed at which useful data reaches decision-makers on the ground and vastly reduce the noise that contributes to the overall data glut.
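A rough calculation shows why processing in the air matters. The sketch below assumes a 100 Mbit/s downlink, a round number chosen for illustration and not a figure for Blue Devil or any particular data link, and it uses decimal terabytes (10^12 bytes).

```python
SECONDS_PER_DAY = 86_400


def downlink_days(data_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to send `data_bytes` over a link of `link_bits_per_s`."""
    return data_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


FULL_STORE = 500e12  # 500 TB of on-board storage, from the text
LINK_RATE = 100e6    # ASSUMED 100 Mbit/s downlink, for illustration only

all_of_it = downlink_days(FULL_STORE, LINK_RATE)            # roughly 463 days
five_percent = downlink_days(0.05 * FULL_STORE, LINK_RATE)  # roughly 23 days
```

The absolute numbers depend entirely on the assumed link rate; the ratio does not. Sending only the most relevant 5 percent cuts transfer time twentyfold on any link.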


Obviously, the supercomputer-on-an-airship model doesn’t scale to the individual Predator drone, which doesn’t have the space or capacity to carry its own onboard processing suite. But as computer language and computer vision become more robust, storage architectures shrink, and ISR platforms change to meet the demands of the future–and all of these things are already happening–it’s not difficult to envision a day in the near future when the military finally climbs back on top of its data problem. The issue then is finding the right balance between machine reliability and human decision-making so that the armed services and intelligence community can get the most out of both.

“I always talk about keeping the human in the loop,” Modus Operandi’s Barrett says. “There are always judgment calls that have to be made. But if I look across popular culture and I look at the continuing development of artificial intelligence–I watch a computer win on Jeopardy–I start to think of tremendous possibilities that exist.”

The artificial intelligence piece is integral to making the future of ISR a reality. Future drones will have the ability to gather more sensor data than ever before, process that data in near real-time, and make determinations about what information is relevant to the fight at hand so that it can be immediately downlinked and brought to human attention. The rest will be tagged with metadata and carefully filed away so human analysts can call it up with simple language queries later.
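The workflow described here, where urgent items go to humans now and everything else is tagged and filed, can be sketched as a triage step feeding a tag index. In this minimal sketch the relevance scores and tags are assumed to come from upstream analysis, and ALERT_THRESHOLD, triage, and search are hypothetical names; a real system would answer natural language queries rather than bare tag lookups.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical relevance cutoff for "bring to human attention now".
ALERT_THRESHOLD = 0.8

Clip = Tuple[str, float, Set[str]]  # (clip_id, relevance, tags)


def triage(clips: List[Clip]) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Split clips into an immediate-downlink list and a tag index for later search.

    Clips at or above the threshold are queued for downlink; every clip is
    indexed by its tags so analysts can find it later.
    """
    urgent: List[str] = []
    index: Dict[str, Set[str]] = defaultdict(set)
    for clip_id, relevance, tags in clips:
        if relevance >= ALERT_THRESHOLD:
            urgent.append(clip_id)
        for tag in tags:
            index[tag].add(clip_id)
    return urgent, index


def search(index: Dict[str, Set[str]], *tags: str) -> Set[str]:
    """Return the clips carrying every requested tag (an AND query)."""
    if not tags:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tags))


clips: List[Clip] = [
    ("c1", 0.95, {"vehicle", "convoy"}),
    ("c2", 0.30, {"vehicle", "night"}),
    ("c3", 0.10, {"building"}),
]
urgent, index = triage(clips)
```

Only c1 interrupts an analyst; c2 and c3 stay quiet until someone asks for, say, vehicles seen at night.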

In other words, we’ll be relying heavily on machines to do a good deal of the legwork, as well as to make some low-level judgment calls. In situations where lives hang in the balance, such reliance on technology may be troubling, but it’s more or less the only way forward in a data-deluged battle space.

“Between the birth of the world and 2003, there were five exabytes of information that were created,” Deptula says. “We now create five exabytes every two days, and that’s accelerating. So this large data problem is significant, and we’re not going to solve it by continuing to do data management the way we have been up to this point in time.”
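Taking Deptula’s figure at face value and assuming decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes), the rate works out to roughly 29 terabytes every second:

```latex
\frac{5\ \text{EB}}{2\ \text{days}}
  = \frac{2.5 \times 10^{18}\ \text{B}}{86{,}400\ \text{s}}
  \approx 2.9 \times 10^{13}\ \text{B/s}
  \approx 29\ \text{TB/s}
```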

From that perspective, not only can technology save the U.S. military from its technology, but it’s probably the only hope.