How to Teach a Robot to Improvise

Drones are learning the difference between a car and a tree, and how to make their next moves

Self-piloted drones have become sophisticated enough to land on moving aircraft carriers, but put a single unexpected tree in the way, and they will crash. Now a five-university group that includes specialists in biology, computer vision and robotics is trying to teach drones to dodge obstacles on the fly. Working with $7.5 million from the Office of Naval Research, the scientists aim to build an autonomous, fixed-wing surveillance drone that can navigate through an unfamiliar city or forest at 35 miles an hour.

The group’s inspiration is the pigeon. Hardy, plentiful and receptive to training, the birds are easy to study. In flight, they estimate the distance between themselves and objects ahead by quickly processing blurry, low-resolution images, just as a drone will need to do. And, crucially, they have a tendency to make decisions at the last moment—within five feet of an obstacle.

The first step is to teach robots to differentiate between obstacles and empty space. Engineers have already figured out how to train point-and-shoot cameras to spot faces in a photo: In a process called supervised learning, a technician feeds millions of images into a computer and tells it to output a “1” when the image contains a human face and a “0” when it does not. But supervised learning on that scale would be an impossibly labor-intensive way to train a drone, because a human would have to label not just faces but every possible object the robot might encounter. Instead, Yann LeCun, a professor of computer and neural science at New York University who leads the drone’s vision team, is developing software that lets the drone draw conclusions about what it’s seeing with much less human coaching. By mimicking the hyperefficient parallel processing that the brain’s visual cortex uses to classify objects, the software extracts features from each raw video frame much more quickly. As a result, the drone’s human instructors need to show it only a few hundred to a few thousand examples of each category of object (“car,” “tree,” “grass”) before it can begin to classify those objects on its own.
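To make the labeling burden concrete, here is a minimal sketch of supervised learning in its simplest form: a logistic-regression classifier that learns to output 1 (“face”) or 0 (“no face”) from human-labeled examples. It is an illustration of the general technique, not the face detectors or LeCun’s software. Real detectors learn from pixels; here each “image” is assumed to be already reduced to a hypothetical two-number feature vector, the data are synthetic, and the names `train_logistic` and `predict` are our own.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function: maps any score to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(examples, epochs=100, lr=0.1):
    """Supervised learning: nudge the weights so the output moves toward each human-supplied label."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, label in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - label  # gradient of the log-loss with respect to the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Output 1 ("face") or 0 ("no face")."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Synthetic stand-in data: every example needs a label, which is the costly human step.
rng = random.Random(0)
examples = [([rng.gauss(1, 0.3), rng.gauss(1, 0.3)], 1) for _ in range(100)]    # labeled "face"
examples += [([rng.gauss(-1, 0.3), rng.gauss(-1, 0.3)], 0) for _ in range(100)]  # labeled "no face"
rng.shuffle(examples)

w, b = train_logistic(examples)
accuracy = sum(predict(w, b, x) == y for x, y in examples) / len(examples)
print(f"training accuracy: {accuracy:.2f}")
```

Every entry in `examples` carries a label a person had to supply; extending this to every object a drone might meet multiplies that work by the number of categories, which is the cost the paragraph above describes.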

Once the researchers have taught the drone to see, they will need to teach it to make decisions. That involves grappling with the inherent ambiguity of visual data—with deciding whether that pattern of pixels ahead is a tree branch or a shadow. Drew Bagnell and Martial Hebert, roboticists at Carnegie Mellon University, are developing algorithms that will help the robot deal with visual ambiguity the way humans do: by making educated guesses. “They can say, ‘I’m 99 percent sure there’s a tree between 12 meters and 13 meters away,’ and make a decision anyway,” Bagnell says.
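Bagnell’s example pairs a probability with a distance range and still commits to an action. One common way to formalize that, sketched here as an assumption rather than the Carnegie Mellon team’s algorithm, is to compare expected costs: holding course risks a large crash cost with probability p, while swerving carries a small, certain cost. The cost values, the 0.1-second deadline (roughly the time to cover the pigeons’ five-foot margin at 35 miles an hour), and the names `Belief` and `decide` are all hypothetical.

```python
from dataclasses import dataclass

MPH_TO_MPS = 0.44704  # 1 mile per hour in meters per second

@dataclass
class Belief:
    """Hypothetical summary of what the vision system thinks lies ahead."""
    p_obstacle: float  # chance the pattern is solid (a branch) rather than harmless (a shadow)
    near_m: float      # closest the obstacle could be, e.g. 12 m for "between 12 and 13 meters"

def decide(belief: Belief, speed_mps: float, crash_cost: float = 1000.0,
           swerve_cost: float = 1.0, deadline_s: float = 0.1) -> str:
    """Choose the cheaper action in expectation; wait for more frames only while time allows."""
    cost_hold = belief.p_obstacle * crash_cost  # pay crash_cost only if the obstacle is real
    if cost_hold <= swerve_cost:
        return "hold course"
    # Plan against the near edge of the distance range: the worst case for time remaining.
    time_left_s = belief.near_m / speed_mps
    if time_left_s > deadline_s:
        return "keep watching"  # later frames may settle branch vs. shadow
    return "swerve"  # out of time: act on the best guess, even without certainty

speed = 35 * MPH_TO_MPS
print(decide(Belief(p_obstacle=0.99, near_m=12.0), speed))    # keep watching
print(decide(Belief(p_obstacle=0.99, near_m=1.5), speed))     # swerve
print(decide(Belief(p_obstacle=0.0005, near_m=1.5), speed))   # hold course
```

With these made-up numbers, a 99-percent-likely tree 12 meters out is not yet worth a turn, while the same belief 1.5 meters out forces a swerve; a very unlikely obstacle is ignored because a needless swerve costs more than the expected risk.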

It will take a lot of computing power to make those decisions. The drone will have to process 30 images per second while contemplating its next move. LeCun says that a processor that can run his algorithms at a trillion operations per second would do the job, but the challenge is to build all that power into a computer light and efficient enough to fly. The best candidate is NeuFlow, a low-power processor the size of a DVD case that LeCun developed with Eugenio Culurciello of Purdue University. LeCun is confident he’ll be able to speed it up to a trillion operations per second by the group’s 2015 deadline.
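The figures above support a quick back-of-the-envelope check. Assuming a processor running at exactly one trillion operations per second and 30 frames per second, each frame gets about 33 billion operations. Borrowing the pigeons’ five-foot decision margin at the drone’s 35-mile-an-hour target speed (our own pairing, not the researchers’) leaves under a tenth of a second, or about three frames. The helper names are hypothetical.

```python
MPH_TO_MPS = 0.44704  # exact: 1 mile = 1609.344 m, 1 hour = 3600 s
FEET_TO_M = 0.3048    # exact

def ops_per_frame(ops_per_second: float, frames_per_second: float) -> float:
    """Compute budget available for each video frame."""
    return ops_per_second / frames_per_second

def time_and_frames(distance_ft: float, speed_mph: float, frames_per_second: float):
    """Seconds to cover a distance, and how many frames arrive in that time."""
    seconds = (distance_ft * FEET_TO_M) / (speed_mph * MPH_TO_MPS)
    return seconds, seconds * frames_per_second

budget = ops_per_frame(1e12, 30)
t, n = time_and_frames(5, 35, 30)
print(f"{budget:.2e} operations per frame")  # 3.33e+10 operations per frame
print(f"{t * 1000:.0f} ms, {n:.1f} frames")  # 97 ms, 2.9 frames
```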

Once they’ve built a robot that can learn, see and make decisions fast enough to avoid obstacles, they still have to teach it to fly. Russ Tedrake, an MIT roboticist, is already using motion-capture cameras and a full-scale prototype of the final drone to model the maneuvers it will need to perform. If the team succeeds, the result will be a robot that can descend into a forest and lose today’s drones in the trees.


As the drone flies, its onboard camera will feed video to software that applies a series of filters to each frame. The first filters pick up patterns among small groups of pixels that indicate simple features, like edges. Next, another series of filters looks for larger patterns, building upward from individual pixels to objects to complex visual scenes. Within hundredths of a second, the software builds a low-resolution map of the scene ahead. Finally, it compares the objects in view to ones it has “seen” before, classifying them as soon as it has enough information to make an educated guess.
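The filter pipeline described above is the general shape of a convolutional network. Below is a toy, pure-Python sketch of that shape, not LeCun’s software: the filters are hand-set here (in a real network they are learned), the 8x8 “tree-like” and “car-like” images are cartoon bars, and matching against the nearest stored prototype with a confidence margin is our stand-in for “compare to ones it has seen before.” All names are hypothetical.

```python
import math

# Hypothetical first-stage filters: 3x3 kernels that respond to simple edges.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
# Hypothetical second-stage filter: sums neighboring edge responses into a larger pattern.
COMBINE = [[1, 1], [1, 1]]

def convolve2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN convolution layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Keep the strongest response in each size x size block (a lower-resolution map)."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def features(image):
    """Per edge type: edges -> pooled low-res map -> combined larger pattern -> one number."""
    feats = []
    for kernel in (VERTICAL_EDGE, HORIZONTAL_EDGE):
        stage1 = max_pool(relu(convolve2d(image, kernel)))
        stage2 = relu(convolve2d(stage1, COMBINE))
        feats.append(sum(map(sum, stage2)) / (len(stage2) * len(stage2[0])))
    return feats

def classify(feats, prototypes, min_margin=1.0):
    """Nearest stored prototype, or None if the two best matches are too close to call."""
    ranked = sorted((math.dist(feats, proto), label) for label, proto in prototypes.items())
    (best_d, best_label), (second_d, _) = ranked[0], ranked[1]
    return best_label if second_d - best_d >= min_margin else None

def stripe(vertical, value=1.0, size=8):
    """Toy 8x8 image: a bright two-pixel-wide bar through the middle."""
    return [[value if (j if vertical else i) in (3, 4) else 0.0
             for j in range(size)] for i in range(size)]

# Objects the system has "seen" before (hypothetical cartoon shapes).
prototypes = {
    "tree-like": features(stripe(vertical=True)),   # upright trunk
    "car-like": features(stripe(vertical=False)),   # wide, flat body
}

print(classify(features(stripe(vertical=True, value=0.8)), prototypes))  # tree-like
print(classify(features([[0.0] * 8 for _ in range(8)]), prototypes))     # None
```

A dimmer vertical bar still lands nearest the “tree-like” prototype, while a blank frame is equally far from both and returns `None`, mirroring the idea of classifying only once there is enough information for an educated guess.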

Andrew Rosenblum wrote in the April issue about trucks that fight jet-fuel fires. He lives in Oakland, California.