Building numbers come in a huge variety of shapes, colors and sizes, making them difficult to model for machine vision. Netzer et al

Extracting numbers and letters from digital images is still a tough task, even for the best computer programmers. But it would be handy to pull business names, graffiti or an address from pictures that are already stored online. Aiming to make its Street View service more accurate, Google would like to extract your house number from its own Street View photo cache.

Say what you will about Street View (and Helicopter View and Amazon View and so on): beyond the novelty factor, the images are full of potentially useful data. Reading the street numbers on the side of a house or business could make navigation programs more accurate, and help motorists or pedestrians find the right door by providing a preview on the Internet or a mobile device. But while handwriting-recognition algorithms are pretty advanced, software is still limited in its ability to extract text from photographs of real scenes. Blurring, uneven lighting and other distortions all get in the way.

To improve matters, researchers at Google and Stanford devised a new feature-learning algorithm for a set of street numbers captured from a Street View database. They used 600,000 images from various countries and extracted the house numbers using a basic visual algorithm. Then the team used Amazon’s Mechanical Turk system to verify the arrangement of the numbers. The result was two sets of images: One with house number images as they appeared, and one with house numbers all resized to the same resolution.

Initially, traditional handcrafted visual learning algorithms didn’t do very well at extracting the numbers. Next, the Google-Stanford team tried feature learning algorithms, which learn recognition patterns from the image data itself rather than relying on hand-designed features. The feature learning methods worked much better than the handcrafted ones: One of the algorithms (a K-means-based feature learning system) achieved 90 percent accuracy, compared to 98 percent for a human.
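
To give a rough sense of what a K-means-based feature learner does, here is a minimal Python sketch. It is not the researchers' code: the patch size, the number of clusters, the "triangle" encoding step and the random stand-in images are all assumptions made for illustration. The idea is to cut small patches out of the images, cluster them with K-means, and then describe each patch by how close it sits to each cluster center.

```python
import numpy as np

def extract_patches(images, patch_size, n_patches, rng):
    """Sample random square patches from a stack of grayscale images."""
    n, h, w = images.shape
    idx = rng.integers(0, n, size=n_patches)
    ys = rng.integers(0, h - patch_size + 1, size=n_patches)
    xs = rng.integers(0, w - patch_size + 1, size=n_patches)
    patches = np.stack([
        images[i, y:y + patch_size, x:x + patch_size].ravel()
        for i, y, x in zip(idx, ys, xs)
    ])
    return patches.astype(np.float64)

def normalize(patches, eps=1e-8):
    """Give each patch zero mean and unit variance (brightness/contrast)."""
    mean = patches.mean(axis=1, keepdims=True)
    std = patches.std(axis=1, keepdims=True)
    return (patches - mean) / (std + eps)

def kmeans(data, k, n_iter, rng):
    """Plain Lloyd's K-means; returns the k learned centroids."""
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every patch to every centroid.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def encode(patches, centroids):
    """Assumed 'triangle' encoding: max(0, mean distance - distance)."""
    dists = np.sqrt(
        ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    )
    return np.maximum(0.0, dists.mean(axis=1, keepdims=True) - dists)

# Random 32x32 arrays stand in for real house-number crops.
rng = np.random.default_rng(0)
images = rng.random((50, 32, 32))
patches = normalize(extract_patches(images, patch_size=8, n_patches=500, rng=rng))
centroids = kmeans(patches, k=16, n_iter=10, rng=rng)
features = encode(patches, centroids)
```

In a full system, features like these would be pooled over each image and handed to a classifier that decides which digit is shown; that stage is left out here.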

The system still needs improvement, but it could be useful for extracting number data from billions of images, the researchers say. Ultimately, this could make Street View a lot more accurate. Without a house-number-based view, an address in Street View is a default panorama, which might not actually be the address you want. Type in “30 Rockefeller Plaza,” for instance, and the first thing you see is a chocolatier next to the 30 Rock observation deck. You have to click and drag to see the NBC building.

“With the house number-based view angle, the user will be led to the desired address immediately, without any further interaction needed,” the paper authors explain.

[I Programmer via Slashdot]