How Computer Analysis Uncovered J. K. Rowling’s Secret Novel
Or, how your four-grams may be undermining your anonymous erotica-writing career
The Cuckoo’s Calling, a detective novel by first-time author Robert Galbraith, just got solved in a big way. This weekend, the U.K.’s The Times reported The Cuckoo’s Calling was actually written by J. K. Rowling, the creator of Harry Potter. Rowling even admitted to writing the novel after The Times asked her directly.
Among the evidence The Times presented to Rowling were analyses from two university professors who had written computer programs to uncover who authored disputed texts. After all, every writer has her habits. One obvious one is the use of regional words—a car “boot” versus a “trunk,” for example—but others are much more subtle and unconscious. It’s totally creepy, but cool, that a computer program is able to pick them out.
The Times originally asked the professors to check out The Cuckoo’s Calling after receiving an anonymous tip that Rowling might be the book’s true author. The Times reporter, Alexi Mostrous, didn’t initially let the professors know why he wanted them to compare The Cuckoo’s Calling to several other novels.
So what habits give authors away? One of the analyzers, Patrick Juola of Duquesne University in Pittsburgh, has written a detailed blog post about how his program works. The full post is a great read, but here are the highlights.
Basically, Juola got a digital copy of The Cuckoo’s Calling, plus digital copies of novels by Rowling and three well-known authors of mystery novels. He then ran a series of analyses that told him which of the authors the habits in The Cuckoo’s Calling matched best. Each analysis looked at a different “habit” in the books:
- Juola looked at the distribution of word lengths in each book. That is, he got a bunch of numbers like, “X percent of the words in this book are exactly Y letters long.”
- Juola looked at the 100 most common words in each book.
- He looked at pairs of words that often appeared together.
- He looked at groups of four adjacent characters. Any four characters in a row count, including letters, spaces and punctuation marks. Now, I don’t know of any writers who ever think about character strings in their writing, but, Juola said, other studies have shown that four-character strings, called four-grams, are strong indicators of authorship.
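
For the curious, the four measurements in the list above are simple to compute. The Python sketch below is only an illustration, not Juola’s actual program: the function names, the crude regular-expression tokenizer, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter


def words(text):
    # Assumed tokenizer: lowercase runs of letters, with an optional
    # apostrophe part (so "don't" stays one token).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_length_distribution(text):
    # Fraction of words that are exactly N characters long, for each N.
    # Length counts every character in the token, apostrophes included.
    tokens = words(text)
    counts = Counter(len(t) for t in tokens)
    return {length: n / len(tokens) for length, n in counts.items()}


def most_common_words(text, k=100):
    # The k most frequent words, most frequent first.
    return [w for w, _ in Counter(words(text)).most_common(k)]


def word_pairs(text):
    # Counts of adjacent word pairs (this simple version ignores
    # sentence boundaries).
    tokens = words(text)
    return Counter(zip(tokens, tokens[1:]))


def char_four_grams(text):
    # Counts of every run of four adjacent characters, including
    # spaces and punctuation.
    return Counter(text[i:i + 4] for i in range(len(text) - 3))


sample = "the cat sat on the mat. the cat ran."
print(word_length_distribution(sample))
print(most_common_words(sample, k=2))
print(word_pairs(sample).most_common(1))
print(char_four_grams(sample).most_common(3))
```

Each function turns a book into a profile of counts or percentages. On its own a profile says nothing about authorship; it only becomes useful once it is compared against profiles built the same way from books by known authors.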
Juola’s overall analysis isn’t able to prove authorship, he said. Some of the individual tests found authors other than Rowling were the best match. Nevertheless, Rowling came up the most consistently. Juola called his work “suggestive” or “indicative” that Rowling wrote The Cuckoo’s Calling. The smoking gun came from Rowling’s confession, which Juola’s analysis surely helped convince her to give.
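
One way to picture “came up the most consistently” is as a vote: each test names the candidate whose profile sits closest to the mystery book, and the votes are then tallied. The sketch below is a guess at the general idea, not a description of Juola’s scoring. The article doesn’t say how matches were measured, so the cosine-similarity comparison, the function names, and the toy profiles for `author_a` and `author_b` are all invented for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    # a and b map features (words, four-grams, ...) to counts or frequencies.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(unknown, candidates):
    # Name of the candidate whose profile is most similar to the unknown one.
    return max(candidates, key=lambda name: cosine_similarity(unknown, candidates[name]))


def tally(tests):
    # tests: one (unknown_profile, {author: profile}) pair per analysis.
    return Counter(best_match(unknown, candidates) for unknown, candidates in tests)


# Toy, made-up profiles: three "tests", two hypothetical candidates.
tests = [
    ({"a": 3, "b": 1}, {"author_a": {"a": 3, "b": 1}, "author_b": {"a": 1, "b": 3}}),
    ({"x": 2, "y": 1}, {"author_a": {"x": 1, "y": 2}, "author_b": {"x": 2, "y": 1}}),
    ({"p": 1}, {"author_a": {"p": 5}, "author_b": {"q": 5}}),
]
votes = tally(tests)
print(votes.most_common())
```

In this toy run one test favors `author_b`, but `author_a` wins the other two. That mirrors the situation described above: no single test settles the question, and the overall pattern is suggestive rather than proof.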
The distinction matters because linguists use tools like Juola’s and others’ to determine who actually wrote everything from historical texts by long-dead authors to contested documents in modern court cases. In those cases, it can be a lot harder to get a ready, reliable confession.