Google stole data from millions of people to train AI, lawsuit says

The class action filing is going after Google for scraping 'virtually the entirety of our digital footprint.'
A new lawsuit alleges Google essentially illegally used the entire internet to train its AI programs. Deposit Photos


Google has been hit with yet another major class action lawsuit. This time, attorneys at Clarkson Law Firm representing eight unnamed plaintiffs, including two minors, allege that the company illegally used data from millions of internet users to train its artificial intelligence systems. Per the California federal court filing on Tuesday, the lawsuit contends that Google (alongside parent company Alphabet, Inc. and its AI subsidiary DeepMind) scraped “virtually the entirety of our digital footprint,” including personal and professional data, photos, and copyrighted works, while building AI products such as Bard.

“As part of its theft of personal data, Google illegally accessed restricted, subscription based websites to take the content of millions without permission,” the lawsuit states. According to the lawsuit, plaintiffs (identified by their initials only) posted to social media platforms like Twitter, Facebook, and TikTok. They also used Google services such as search, streaming services like Spotify and YouTube, and dating services like OkCupid. The suit alleges that Google, without the plaintiffs’ consent, trained its AI using their “skills and expertise, as reflected in [their] online contributions.” Additionally, Google’s AI systems allegedly produced verbatim quotations from a book by an author plaintiff.


Speaking with CNN on Tuesday, an attorney representing the plaintiffs contended that “Google needs to understand that ‘publicly available’ has never meant free to use for any purpose.”

In a statement provided to PopSci, the firm’s managing partner Ryan Clarkson wrote, “Google does not own the internet, it does not own our creative works, it does not own our expressions of our personhood, pictures of our families and children, or anything else simply because we share it online.”

Like similar lawsuits filed in recent weeks against OpenAI and Meta, the latest class action complaint accuses Google of violating the Digital Millennium Copyright Act (DMCA) alongside direct and vicarious copyright infringement. The newest filing, however, also seeks to hold the companies liable for invasion of privacy and “larceny/receipt of stolen property.”

According to the filing’s attorneys, Google “stole the contents of the internet—everything individuals posted, information about the individuals, personal data, medical information, and other information—all used to create their Products to generate massive profits.” While doing so, the company did not obtain the public’s consent to scrape this data for its AI products, the lawsuit states.


The months following the debut of industry-altering AI programs such as OpenAI’s ChatGPT, Meta’s LLaMA, and Google Bard have reignited debates surrounding digital data ownership and privacy rights, as well as the implications such technologies could have for individuals’ livelihoods and careers. One unnamed plaintiff in the latest lawsuit, for example, believes companies such as Google scraped their “skills and expertise” to train the very products that could soon result in their “professional obsolescence.”

Although the plaintiffs remain unnamed, they include a “New York Times bestselling author,” an “actor and professor,” and a six-year-old minor. In addition to unspecified damages and financial compensation, the lawsuit seeks a temporary halt on commercial development as well as access to Google’s suite of AI systems. Earlier this month, Google confirmed it had updated its privacy policy to reflect that it uses publicly available information to train and build AI products including Bard, Cloud AI, and Google Translate.

In a statement to PopSci, Halimah DeLaine Prado, Google General Counsel, wrote, “We’ve been clear for years that we use data from public sources—like information published to the open web and public datasets—to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles. American law supports using public information to create new beneficial uses, and we look forward to refuting these baseless claims.”

Update July 12, 2023, 1:04 PM: A statement from Google General Counsel has been added.

 
