AI plagiarism detectors single out non-native English speakers

Amid the rapid adoption of generative AI programs, many educators have voiced concerns about students misusing the systems to ghostwrite their assignments. It didn’t take long for multiple digital “AI detection” tools to arrive on the scene, many of which claimed to accurately distinguish original human writing from text authored by large language models (LLMs) such as OpenAI’s ChatGPT. But a new study indicates that such solutions may only create more headaches for both teachers and students. The authors found that these AI detection tools are inaccurate and severely biased against non-native English speakers.

A Stanford University team led by senior author James Zou, an assistant professor of Biomedical Data Science, as well as Computer Science and Electrical Engineering, recently amassed 91 essays written by non-native English speakers for the popular Test of English as a Foreign Language (TOEFL) assessment. They then fed the essays into seven GPT detector programs. According to the team’s results, over half of the writing samples were misclassified as AI-authored, while detection for native speakers’ samples remained nearly perfect.

“This raises a pivotal question: if AI-generated content can easily evade detection while human text is frequently misclassified, how effective are these detectors truly?” asks Zou’s team in a paper published on Monday in the journal Patterns.

The main issue stems from what’s known as “text perplexity,” which measures how surprising or unpredictable a written work’s word choices are. AI programs like ChatGPT tend to produce “low perplexity” text that mimics more generalized human speech patterns. Of course, this poses a potential problem for anyone who happens to use more standardized, common sentence structures and word choices. “If you use common English words, the detectors will give a low perplexity score, meaning my essay is likely to be flagged as AI-generated,” said Zou in a statement. “If you use complex and fancier words, then it’s more likely to be classified as ‘human written’ by the algorithms.”
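
To make the idea concrete, here is a minimal sketch of a perplexity score in Python. It is not how the detectors in the study work: real tools typically score text with a large language model, while this toy version assumes a simple word-frequency (unigram) model with add-one smoothing. The reference sentence, both sample sentences, and the function name `unigram_perplexity` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_perplexity(text, reference_counts, vocab_size):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Lower values mean the words are more predictable given the reference text.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    total = sum(reference_counts.values())
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing gives unseen words a small, non-zero probability.
        p = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / len(tokens))


# Toy "training" text standing in for what a model has seen (hypothetical).
reference = "the cat sat on the mat and the dog sat on the rug".split()
counts = Counter(reference)
# Simplification: one extra vocabulary slot shared by all unseen words.
vocab_size = len(counts) + 1

common = "the cat sat on the mat"
unusual = "the feline perched atop the tapestry"

common_score = unigram_perplexity(common, counts, vocab_size)
unusual_score = unigram_perplexity(unusual, counts, vocab_size)

print(f"common wording:  {common_score:.1f}")
print(f"unusual wording: {unusual_score:.1f}")
```

In this sketch the sentence built from frequent words scores lower than the one built from rarer words. A detector that treats low perplexity as a sign of machine authorship would therefore lean toward flagging the plainer sentence, which is the pattern Zou describes.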

Zou’s team then went a step further, feeding those same 91 essays into ChatGPT and asking the LLM to punch up the writing. Those more “sophisticated” edits were then run back through the seven detection programs, only to have many of them reclassified as written by humans.

So, while AI-generated written content often isn’t great, neither apparently are the currently available tools to identify it. “The detectors are just too unreliable at this time, and the stakes are too high for the students, to put our faith in these technologies without rigorous evaluation and significant refinements,” Zou recently argued. Regardless of his statement’s perplexity rating, it’s a sentiment that’s hard to refute.
