A debrief on Internet Archive's clash with book publishers

Internet Archive, a nonprofit digital library and a massive repository of online artifacts, has been collecting mementos of the ever-expanding World Wide Web for over two decades, allowing users to revisit sites that have since been changed or deleted. But like the web, it has evolved since its genesis, and in the aughts it began to offer a selection of ebooks that any internet user can check out after creating a free account.

That latter feature has gotten the organization into some trouble. In 2020, four corporate publishers sued Internet Archive for copyright infringement. One side says that what Internet Archive does is preservation; the other says that it’s piracy, since it freely distributes books as image files without compensating the authors.

Last week, the ongoing case entered a new chapter as the nonprofit organization filed a motion for summary judgment, asking a federal judge to put a stop to the lawsuit. It argued that its Controlled Digital Lending (CDL) program “is a lawful fair use that preserves traditional library lending in the digital world” since “each book loaned via CDL has already been bought and paid for.” On Friday, Creative Commons issued a statement supporting Internet Archive’s motion.

Local public libraries usually partner with platforms like OverDrive, Libby, Hoopla, and Cloud Library to provide digital copies of books that they can loan out. But these library ebooks are part of a surprisingly complex and lucrative financial structure (Daniel A. Gross’ piece in The New Yorker dives deep into the business behind library ebooks). Additionally, users must log in to these services with an existing library card number.

Internet Archive works a little differently. Anyone can create a free account and start browsing materials like books, movies, software, music, websites and more.

The site dates back to 1996, when Internet Archive was first established as a way to maintain “a historical record of the World Wide Web.” Its mission is to “provide Universal Access to All Knowledge,” including to researchers, historians, scholars, people with print disabilities like low vision and dyslexia, and the general public. One of its popular tools, the Wayback Machine, offers nostalgic glimpses of what the publicly available web looked like before sites went offline or were rebuilt. That is no easy task, as content on the internet today balloons at an exponential rate.

In 2006, Internet Archive started a program for digitizing books, both those under copyright and those in the public domain. It works with a range of global partners, including other libraries, to scan materials onto its site (Cornell University made a handy guide on what works fall under copyright vs. the public domain). For copyrighted books, Internet Archive owns the physical books from which it created the digital copies, and it limits their circulation by allowing only one person to borrow a title at a time.

Book publishers, namely Hachette Book Group, HarperCollins, John Wiley & Sons, and Penguin Random House, were not keen on this practice, and they have been seeking financial damages for 127 copyrighted books that were shared. Vox estimated that if the publishers win, Internet Archive would have to pay $19 million, which is about “one year of operating revenue.”

In the most recent filings, the publishers accused Internet Archive of amassing “a collection of more than three million unauthorized in-copyright ebooks – including more than 33,000 of the Publishers’ commercially available titles – without obtaining licenses to do so or paying the rightsholders a cent for exploiting their works. Anybody in the world with an internet connection can instantaneously access these stolen works via IA’s interrelated archive.org and openlibrary.org websites.”

In its defense, Internet Archive, which is being represented by the Electronic Frontier Foundation, says that “libraries have been practicing CDL in one form or another for more than a decade,” and that Internet Archive lends its digitized books on an “owned-to-loaned basis, backstopped by strong technical protections to enforce lending limits.”

“CDL makes it easier for patrons who live far from a brick-and-mortar library, or who have print disabilities, to access books. It supports research, scholarship, and cultural participation in myriad ways,” EFF and Internet Archive wrote in a memorandum.

Additionally, Internet Archive founder Brewster Kahle told Vox in 2020 that “when nonprofit libraries have been sued in the past for helping their patrons access their collections, courts have ruled that they were engaging in fair use, as in the HathiTrust case.” A similar ruling was issued in a lawsuit against Google Books.

Interested in checking out Internet Archive further? Head over to its site to peruse its catalog of content.