The Information Wrangler

He develops new ways to get the most information with the least effort
Photo by John B. Carnett; digital imaging by Neil Duerden

We may earn revenue from the products available on this page and participate in affiliate programs. Learn more ›

Carlos Guestrin wants to stop the spread of waterborne disease, design chairs that adjust to your posture, and cure Internet-induced information overload. This might seem a bit overambitious, but Guestrin has developed a single algorithm that can tackle them all.

Four years ago, Guestrin worked on an Intel project that involved placing tiny wireless sensors in a redwood forest. The researchers planned to use the sensors to learn about the forest’s microclimates. “I asked them how they were going to decide where to put the sensors,” he recalls. “They said they were more or less going to use intuition.”

Guestrin knew he could do better, and began to devise a way to optimize both the location and the number of sensors to gather the highest-quality data. The resulting algorithm churned through all possible locations for the first sensor, ranked each location in terms of its information-gathering potential, picked the best one, and then recalibrated all the other locations based on that choice. The second-best sensor was the one that added the most new information, and the algorithm continued re-ranking until the cost of installing one more exceeded the value of the data it would gather.

All Guestrin’s projects involve information flowing through a network, whether it’s temperature data in a forest, forces applied to a chair as a person sits down, or even news of a politician’s affair sprinting through the blogosphere. In each case, the algorithm’s goal is to get the best data through as few sensors, or with as little effort, as possible. A blog, for example, is a potential “sensor” of a story. Guestrin’s algorithm defined the best blogs as those that break big news stories early but don’t force users to read through too many posts to find them. As with the forest, it picked one, re-ranked, chose the next one, and so on, narrowing a list of 45,000 possible blogs down to the most efficient 100. (When Guestrin ran the program in 2006, Instapundit topped the list.

There are other sensor-placement algorithms, but Guestrin’s is uniquely fast and accurate. It won a simulation-based, EPA-sponsored contest that asked researchers to determine the best spots to place water-pollution sensors in a massive network of pipes. And Guestrin has proven theoretically that it is near-perfect. “This is the beautiful part,” he says. “No matter how smart you are at designing an algorithm, you can’t get much better than this.”