Our Web Videos Reveal More Than We Realize, and Perhaps More Than We Want

As we upload more and more videos to the Internet—one hour of new video every second to YouTube alone—experts are finding new ways to mine them. A team led by Igor Curcio of Nokia’s Research Center, for example, has developed an algorithm that stitches concertgoers’ cellphone footage into a single, synchronized multi-angle film. The concept is relatively simple: the audio track serves as a guide to sync up the footage, and the software chooses the best shots. Curcio has no real business model yet—photography is prohibited at most concerts—but giving people the ability to identify and coherently connect common elements in multiple videos is nonetheless a step toward something significant.
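The audio-sync idea can be illustrated with a rough sketch. This is not Curcio's algorithm, whose details are not described here. It is a minimal cross-correlation example, assuming each clip's audio has already been decoded into a plain list of samples. The function name `estimate_offset`, the `max_lag` parameter and the synthetic "concert" signal are all hypothetical, invented for illustration.

```python
import random


def estimate_offset(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A result of `lag` means a sound at index i in `reference` shows up at
    index i + lag in `other`. Illustrative brute-force cross-correlation,
    not a production method.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only compare the samples where the two shifted clips overlap.
        start = max(0, -lag)
        stop = min(len(reference), len(other) - lag)
        score = sum(reference[i] * other[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for the sound in the hall (assumption: random noise).
rng = random.Random(0)
concert = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Two "phones" that started recording at different moments.
phone_a = concert[30:130]
phone_b = concert[10:110]

offset = estimate_offset(phone_a, phone_b, max_lag=40)
print("phone_b is shifted by", offset, "samples relative to phone_a")
```

Once each clip's offset against a common reference is known, the clips can be placed on one shared timeline. Real recordings add noise, differing sample rates and clock drift, all of which this sketch ignores.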

For instance, the drones that patrol the U.S.-Mexican border and the security cameras in cities already record more footage than human observers can possibly examine. If an agency could rely on a computer to track individuals, groups and events on its own, agents could use intelligence far more, well, intelligently.

That new capability will drive the demand for even more raw data. The Intelligence Advanced Research Projects Activity (IARPA), overseen by the U.S. director of national intelligence, has launched two projects that may help analysts use civilian video from YouTube, Vimeo and other sources. Investigators in the Finder program are studying ways to determine where and when a video was taken based solely on the image itself. That’s hard enough. But researchers in IARPA’s Aladdin program are working on an even more challenging task: how to search for “specific events of interest.” If they succeed, analysts could feed in a name, a simple text description or a few sample videos of what they seek (say, “five people wearing backpacks next to a pickup truck”) and get back any number of clips that match the query.

Beyond categories lies the greater hurdle of finding not just an event or a group of objects but a single object: a missing child, a misplaced purse, a suicide bomber in a crowd. “For some classes of objects, like faces, people—to some extent vehicles—the capabilities are mature,” says Harpreet Sawhney, the technical director of vision and learning systems at SRI International, which conducts research for various U.S. government agencies. “But spotting them in an arbitrary video, shot from any of countless angles—that’s still a hard problem.”

IARPA’s systems could be the first step toward spotting a would-be bomber as he moves through the background of a wedding toast, a birthday dinner or a tailgate cookout. But when the government uses the videos we post online as an intelligence resource, it could finally destroy what little privacy remains in an already overly connected world. That is the choice we all face. By keeping footage private, we restrict its significance to only what our own eyes see. But by making it public, we invite someone to spot something we didn’t realize we were filming.