In its ongoing effort to index the world, Google adds searchability to scanned documents

Illuminating the Dark Web (Photo: furryscaly, CC licensed)

When a government agency, medical office, or another institution scans a document and uploads it to a Web site, the resulting images are not searchable -- they contain pictures of text, not the text itself. Such unindexed content makes up the so-called "Dark Web" -- the sinister-sounding name is just a reference to how difficult it is to search.

In late October, Google started addressing the problem. Using optical character recognition, the search engine now converts scanned images to text and includes that text in its search results. The process is not a straightforward one: should "O" be read as the letter or the number? Is the text in English or another language? Still, as the search engine crawls the Web at regular intervals, it is gradually deciphering this vast storehouse of information.
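
To make the "O" question concrete, here is a minimal sketch of one way software might guess between the letter and the number, using the other characters in the same word as context. This is an illustration only, not Google's method; the function names `fix_token` and `fix_text` and the majority-vote rule are assumptions made for this example.

```python
def fix_token(token: str) -> str:
    """Guess whether ambiguous 'O'/'0' characters in one word are letters or digits.

    Hypothetical heuristic: look at the other characters in the word and
    side with the majority. Not taken from any real OCR engine.
    """
    others = [c for c in token if c not in "Oo0"]
    digits = sum(c.isdigit() for c in others)
    letters = sum(c.isalpha() for c in others)

    if digits > letters:
        # Mostly digits around it: treat every O as a zero.
        return token.replace("O", "0").replace("o", "0")
    if letters > digits:
        # Mostly letters around it: treat every zero as an O,
        # matching the case used by most of the other letters.
        upper = sum(c.isupper() for c in others)
        replacement = "O" if upper > letters - upper else "o"
        return token.replace("0", replacement)
    # No usable context: leave the word unchanged.
    return token


def fix_text(text: str) -> str:
    """Apply the per-word guess to whitespace-separated text."""
    return " ".join(fix_token(t) for t in text.split())


if __name__ == "__main__":
    print(fix_text("scanned in 2O08 by G0OGLE"))
```

A word with no surrounding context, such as "O0" on its own, is left untouched, which hints at why the article calls the process less than straightforward.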

In April, Google started including HTML form text in search results, and has been including PDF text as well. It's an ongoing effort to make sure all data on the Internet is searchable, not just the most common text.


2 Comments

What is purple imagine big dark sky? That is it very interesting dark purple sky whole is imagine. Wow!

While I find what Google is doing to be very useful, I still can't help but be creeped out by their abilities and their relentless drive to harvest every piece of information available online.

I just hope their company motto lives on and is held up.

