Amazing Databases: The Wayback Machine

A permanent archive of the Web--some 200 million sites' worth



The purpose of the Wayback Machine is to copy and store the Internet. Since the San Francisco–based nonprofit Internet Archive created the database 15 years ago, automated browsing programs called crawlers have captured 180 billion Web page snapshots from more than 200 million sites.

Now, at four petabytes, with another 35 to 40 terabytes added every month, the Wayback Machine is the largest accessible Web archive in existence. Plug in the URL of, say, a shuttered blog, and you'll get a timeline of crawl dates, most of which link to a functional version of the website as it appeared on that day. The Wayback Machine is free, so anyone who is curious can use the data for historical research or to study the evolution of the Web. Researchers at the Library of Congress, for example, used the Wayback Machine to assemble a gallery of websites as they appeared on September 11, 2001, and over the following three months.
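For readers who would rather script the lookup than type a URL into the search box, here is a minimal sketch of how such a request might be put together. It assumes two things the article does not describe: the Internet Archive's public availability endpoint at archive.org/wayback/available, and the web.archive.org/web/<timestamp>/<url> pattern for snapshot links. The function names are made up for illustration, and the code only builds the links; it does not contact the archive.

```python
from urllib.parse import urlencode

# Assumed endpoints (not mentioned in the article): the Internet Archive's
# public availability API and the web.archive.org snapshot URL pattern.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SNAPSHOT_PREFIX = "https://web.archive.org/web"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build a query asking for the capture closest to `timestamp` (YYYYMMDD).

    Hypothetical helper: with no timestamp, only the site URL is sent.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def snapshot_url(url: str, timestamp: str) -> str:
    """Build a direct link to the capture nearest `timestamp` (YYYYMMDD)."""
    return f"{SNAPSHOT_PREFIX}/{timestamp}/{url}"


if __name__ == "__main__":
    # Example: look for captures of a site around September 11, 2001.
    print(availability_query("example.com", "20010911"))
    print(snapshot_url("http://example.com/", "20010911"))
```

Opening the second link in a browser should, if the assumptions above hold, land on the capture nearest the requested date, which is the same thing the timeline on the site does when you click a crawl date.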