They came, they saw, they took pictures. And thanks to them — about 150,000 Flickr users — a team of computer scientists built Rome in a day.
Using nearly half a million Flickr photos of Rome, Venice, and the Croatian coastal city of Dubrovnik, a team of computer scientists at the University of Washington’s Graphics and Imaging Laboratory assembled digital models of the three cities in 3-D.
Their work builds on the algorithms used in Microsoft’s Photosynth, which were invented at the same lab, but it’s like Photosynth on steroids.
“The key difference is that Photosynth was aimed at doing a single monument or landmark, which meant that it was scaled to a couple hundred or a thousand photographs, after which it became too slow,” said Sameer Agarwal, an assistant professor at UW who worked on the project. “We can now process truly huge data sets — the big breakthrough here was being able to match the images fast.”
A series of videos on the project Web site lets visitors fly through landmarks like St. Peter’s Basilica, the Colosseum and Venice’s San Marco Square. For much smaller Dubrovnik, you can see the whole city, including mountains in the distance.
Each video includes clusters of small diamond shapes, each of which represents a photographer and his or her vantage point.
The team built a new algorithm that proceeds in two steps: first, it matches the photos by what they have in common, puzzle-style; then it determines the scene and each photographer’s pose, meaning the camera’s position and orientation. They also designed new software that can more quickly solve the type of large math problems that arise in 3-D reconstruction.
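The matching step can be pictured with a toy example. The Python sketch below is not the team's code. It assumes each photo has already been reduced to a set of made-up numeric feature IDs, and it uses a simple lookup table so that only photos sharing a feature are ever compared. The function name `candidate_pairs`, the `min_shared` threshold and the photo names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(photo_features, min_shared=2):
    """Return photo pairs that share at least `min_shared` feature IDs.

    `photo_features` maps a photo name to a set of feature IDs.
    Both the names and the IDs are illustrative assumptions.
    """
    # Inverted index: feature ID -> photos in which that feature appears.
    index = defaultdict(set)
    for photo, features in photo_features.items():
        for feature in features:
            index[feature].add(photo)

    # Count shared features, visiting only photos that co-occur
    # under at least one feature instead of every possible pair.
    shared = defaultdict(int)
    for photos in index.values():
        for a, b in combinations(sorted(photos), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical photos with invented feature IDs.
photos = {
    "trevi_1": {1, 2, 3, 4},
    "trevi_2": {2, 3, 4, 5},
    "colosseum_1": {10, 11, 12},
    "colosseum_2": {11, 12, 13},
    "street": {4, 20},
}

pairs = candidate_pairs(photos)
print(pairs)
```

In this toy run the two Trevi photos and the two Colosseum photos come out as matches, while the street photo, which shares only one feature with anything else, is left out. The second step, working out the scene and the camera poses, is a much larger numerical problem and is not sketched here.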
It took 500 computer processors 13 hours to match 150,000 photos for Rome’s landmarks, and eight more hours to construct a 3-D image of them. Venice involved 250,000 images, which took 27 hours to match and 38 hours to reconstruct. By contrast, using the algorithms on which Photosynth is based, it would have taken 500 processors at least a year to match 250,000 photos.
Dubrovnik had fewer photos, so matching took only about five hours, but the reconstruction ate up almost 18 hours.
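A little arithmetic puts the Venice figures in perspective. The short Python sketch below assumes that a brute-force approach would compare every photo with every other photo, and it takes "a year" to mean 365 days of round-the-clock computing on the same 500 processors. Both are assumptions made for illustration, not details confirmed by the team.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming nonstop computing


def all_pairs(n):
    """Number of distinct pairs among n photos: n * (n - 1) / 2."""
    return n * (n - 1) // 2


venice_photos = 250_000   # from the figures quoted above
venice_match_hours = 27   # matching time quoted above

pairs = all_pairs(venice_photos)
speedup = HOURS_PER_YEAR / venice_match_hours

print(f"{pairs:,} possible photo pairs")            # 31,249,875,000
print(f"at least {speedup:.0f}x faster matching")   # 324x
```

Under those assumptions, checking every pair would mean more than 31 billion comparisons, and cutting the matching time from a year or more to 27 hours is a speedup of at least roughly 300-fold.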
More photos would normally mean more processing time, but Rome’s photos had so much in common that they were easier to group into individual landmarks. The team found clumps of photos that went together, yielding fine detail of the front of the Trevi Fountain, for instance. The Colosseum alone had 2,000 images. For Dubrovnik, however, the team had just 4,600 photos covering the entire “old city” portion, which comprises several narrow streets and tall buildings.
“For Rome, since most of what we got were landmarks, the geometry is quite simple. Even though the building geometry is quite complicated, the overall is quite simple,” Agarwal said. “For Dubrovnik, it’s not just a matter of having twice as many images; the (3-D) geometry is more complex.”
Steve Seitz, another member of the team, said the next goal is to stitch together a million photos, ultimately creating a photo-realistic 3-D tour of an entire city.
“This is one of the main intellectual challenges here. We want to see how much of the city can be reconstructed from people’s tourist photos,” he said.
Agarwal said the technology could be used for everything from video games, to next-generation GPS, to preservation for the sake of posterity.
Venice is slowly sinking into the lagoon that surrounds it, for instance, and a 3-D tour could digitally preserve the city for future generations. Earthquake-prone cities could be catalogued, both for history and for municipal planning efforts.
“If you have a digital representation of something, then you can study it. Maps only offer you a limited view,” Agarwal said. “There are a number of very different kinds of uses for something like this. And there’s just the pure science aspect of it, which is advancing how you can do large-scale 3-D construction.”