2023 marked the rise of generative AI, and 2024 could well be the year its makers reckon with the fallout of the industry-wide arms race. Currently, OpenAI is aggressively pushing back against recent lawsuits claiming that its products, including ChatGPT, are illegally trained on copyrighted texts. What’s more, the company is making some bold legal claims as to why its programs should have access to other people’s work.
In a blog post published on January 8, OpenAI accused The New York Times of “not telling the full story” in the media company’s major copyright lawsuit filed late last month. Instead, OpenAI argues its scraping of online works falls within the purview of “fair use.” The company additionally claims that it currently collaborates with various news organizations (excluding, among others, The Times) on dataset partnerships, and dismisses any “regurgitation” of outside copyrighted material as a “rare bug” it is working to eliminate. This is attributed to “memorization” issues that can be more common when content appears multiple times within training data, such as when it can be found on “lots of different public websites.”
“The principle that training AI models is permitted as a fair use is supported by a wide range of [people and organizations],” OpenAI representatives wrote in Monday’s post, linking out to recently submitted comments from several academics, startups, and content creators to the US Copyright Office.
In a letter of support filed by Duolingo, for example, the language learning software company wrote that it believes that “Output generated by an AI trained on copyrighted materials should not automatically be considered infringing—just as a work by a human author would not be considered infringing merely because the human author had learned how to write through reading copyrighted works.” (On Monday, Duolingo confirmed to Bloomberg it has laid off approximately 10 percent of its contractors, citing its increased reliance on AI.)
On December 27, The New York Times sued both OpenAI and Microsoft—which currently utilizes the former’s GPT in products like Bing—for copyright infringement. Court documents filed by The Times claim OpenAI trained its generative technology on millions of the publication’s articles without permission or compensation. Products like ChatGPT are now allegedly used in lieu of their source material, to the detriment of the media company. More readers opting for AI news summaries presumably means fewer readers subscribing to source outlets, argues The Times.
Meanwhile, OpenAI is lobbying government regulators over its access to copyrighted material. According to The Telegraph on January 7, a recent letter submitted by OpenAI to the UK House of Lords Communications and Digital Committee argues that access to copyrighted materials is vital to the company’s success and product relevancy.
“Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI wrote in the letter, while also contending that limiting training data to public domain work “might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.” The letter states that this is part of OpenAI’s “mission to ensure that artificial general intelligence benefits all of humanity.”
Meanwhile, some critics have swiftly mocked OpenAI’s claim that its program’s existence requires the use of others’ copyrighted work. On the social media platform Bluesky, historian and author Kevin M. Kruse likened OpenAI’s strategy to selling illegally obtained items in a pawn shop.
“Rough Translation: We won’t get fabulously rich if you don’t let us steal, so please don’t make stealing a crime!” AI expert Gary Marcus posted to X on Monday.