
Using nearly half a million Flickr photos of Rome, Venice, and the Croatian coastal city of Dubrovnik, a team of computer scientists at the University of Washington's Graphics and Imaging Laboratory assembled digital models of the three cities in 3-D.
Their work builds on the algorithms used in Microsoft's Photosynth, which were invented at the same lab, but it's like Photosynth on steroids.
"The key difference is that Photosynth was aimed at doing a single monument or landmark, which meant that it was scaled to a couple hundred or a thousand photographs, after which it became too slow," said Sameer Agarwal, an assistant professor at UW who worked on the project. "We can now process truly huge data sets -- the big breakthrough here was being able to match the images fast."
A series of videos on the project Web site lets visitors fly through landmarks like St. Peter's Basilica, the Colosseum and Venice's San Marco Square. For much smaller Dubrovnik, you can see the whole city, including mountains in the distance.
Each video includes clusters of small diamond shapes, each of which marks a photographer's vantage point.
The team built a new algorithm that proceeds in two steps -- first, by matching the photos by what they had in common, puzzle-style, and then by determining the scene and each photographer's pose. They also designed new software that can more quickly solve the type of large math problems that exist in 3-D reconstruction.
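The first of those two steps, grouping photos by what they have in common, can be illustrated with a toy sketch. This is not the team's algorithm: it assumes each photo has already been reduced to a set of made-up feature IDs, and it compares every pair of photos by brute force, which is exactly the kind of approach that becomes too slow at city scale. The names `match_photos`, `group_photos` and `min_shared` are hypothetical.

```python
from itertools import combinations


def match_photos(features, min_shared=2):
    """Return (photo_a, photo_b, shared_count) for every pair of photos
    that has at least `min_shared` feature IDs in common."""
    matches = []
    for (name_a, feats_a), (name_b, feats_b) in combinations(sorted(features.items()), 2):
        shared = feats_a & feats_b  # features visible in both photos
        if len(shared) >= min_shared:
            matches.append((name_a, name_b, len(shared)))
    return matches


def group_photos(features, min_shared=2):
    """Cluster photos into groups connected by matches (simple union-find)."""
    parent = {name: name for name in features}

    def find(x):
        # Follow parent links to the group's representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in match_photos(features, min_shared):
        parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: each photo is a set of invented feature IDs.
photos = {
    "trevi_1": {"t1", "t2", "t3"},
    "trevi_2": {"t2", "t3", "t4"},
    "colosseum_1": {"c1", "c2", "c3"},
    "colosseum_2": {"c2", "c3", "t1"},
}

print(group_photos(photos))
```

With this made-up data the photos fall into two groups, one per landmark. The second step, recovering the scene and each photographer's pose, is a much harder numerical problem and is not sketched here.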
It took 500 computer processors 13 hours to match 150,000 photos for Rome's landmarks, and eight more hours to construct a 3-D image of them. Venice involved 250,000 images, which took 27 hours to match and 38 hours to reconstruct. By contrast, using the algorithms on which Photosynth is based, it would have taken 500 processors at least a year to match 250,000 photos.
Dubrovnik had fewer photos, so matching only took about five hours, but the reconstruction ate up almost 18 hours.
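To put the Venice figures in perspective, here is a rough back-of-the-envelope calculation. It takes "at least a year" as 365 days and counts how many photo pairs exist if every photo were compared against every other one. The article does not say the older algorithms compared all pairs, so the pair count is only an illustration of why matching is expensive, and the helper `pair_count` is a hypothetical name.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, reading "a year" literally


def pair_count(n):
    """Number of distinct pairs among n photos: n * (n - 1) / 2."""
    return n * (n - 1) // 2


venice_photos = 250_000
venice_match_hours = 27  # matching time reported in the article

# Pairs an exhaustive comparison would have to consider (assumption: all pairs).
print(pair_count(venice_photos))  # 31249875000

# Lower bound on the speedup: a year or more versus 27 hours on 500 processors.
speedup = HOURS_PER_YEAR / venice_match_hours
print(round(speedup))  # 324
```

On those assumptions the new matching is at least about 324 times faster, and the roughly 31 billion possible pairs suggest why matching had to get faster before anything else could scale.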
It stands to reason that more photos would take more time, but there were so many similarities among Rome's photos that it was simpler to put them all together into individual landmarks. The team found clumps of photos that went together, yielding fine detail of the front of the Trevi Fountain, for instance. The Colosseum had 2,000 images. For Dubrovnik, however, the team had just 4,600 photos corresponding to the entire "old city" portion, which comprises several narrow streets and tall buildings.
"For Rome, since most of what we got were landmarks, the geometry is quite simple. Even though the building geometry is quite complicated, the overall is quite simple," Agarwal said. "For Dubrovnik, it's not just a matter of having twice as many images -- the (3-D) geometry is more complex."
Steve Seitz, another member of the team, said the next goal is to stitch together a million photos, ultimately creating a photo-realistic 3-D tour of an entire city.
"This is one of the main intellectual challenges here. We want to see how much of the city can be reconstructed from people's tourist photos," he said.
Agarwal said the technology could be used for everything from video games, to next-generation GPS, to preservation for the sake of posterity.
Venice is slowly sinking into the lagoon that surrounds it, for instance, and a 3-D tour could digitally preserve the city for future generations. Earthquake-prone cities could be catalogued, both for history and for municipal planning efforts.
"If you have a digital representation of something, then you can study it. Maps only offer you a limited view," Agarwal said. "There are a number of very different kinds of uses for something like this. And there's just the pure science aspect of it, which is advancing how you can do large scale 3-D construction."
This is indeed awesome. Really, genuinely fabulous stuff.
A pedantic quibble, perhaps, but the headline of your piece kind of gets it wrong. It's not "150,000 Flickr photos" - as the article says, it's the photos sourced from more than 150,000 Flickr users. That's different.
Still - awesomely splendid work and kudos to Prof Agarwal and team.
Wow, that is truly amazing.
Tiff
www.real-privacy.net.tc
Dude, lotsa time gone to a seemingly good use of the internet, Flickr, and time left over.
All in all, time well wasted.
-DaSonicMan
Does that mean you could theoretically use the photos taken in google maps (the streetview) to create a 3D rendering of the entire U.S.?
Not to be glib or anything, but seems like a whole lot of work for something that can be done much easier? How about a video camera tour of the city?
To: acrefeld
You did intend to be glib, that's another topic though.
The reason this is awesome is that your video doesn't actually make anything 3D. It is simply a series of 2D photos.
This is true rendering. Taking the outside 2D pictures of a building and creating a 3D model of it is an astounding achievement of mathematics and computer science.
I don't think the breakthrough here is that they took 2D pics and converted them to a 3D model. That has been done before. The cool thing is they took them from Flickr content and generated, for free, a 3D environment of the whole city.
As the article states: "The key difference is that Photosynth was aimed at doing a single monument or landmark, which meant that it was scaled to a couple hundred or a thousand photographs, after which it became too slow," said Sameer Agarwal, an assistant professor at UW who worked on the project. "We can now process truly huge data sets -- the big breakthrough here was being able to match the images fast."
@acerfeld
You don't sound like you've ever programmed before. This is not the type of stuff that is simple. Think about it: they're processing thousands of photos on hundreds of processors, and it's still taking hours to build the 3D geometry. That's no easy task. They need to figure out a person's orientation, then figure out what's in the picture, and finally construct the 3D object. Just because you may be able to figure out what a building might look like from a photo does not mean that a computer is able to do it. Remember, at the lowest level, a computer just sees a 1 or a 0. Try making something that can build a whole city with just a yes-or-no game =-)