A sample analysis using IBM’s Tone Analyzer. IBM Watson Developer Cloud

When I woke up this morning, I never thought there would be a way to stack my letter-writing skills up against one of the greatest authors of all time. Yet, as is the way of technology, I was proven wrong. IBM's web innovation hub, the Watson Developer Cloud, came out with a demo of its new tool: the Tone Analyzer, made to detect "emotional tones, social propensities, and writing styles in written communication." IBM wants to make us better writers, and I can get down with that. So I took it for a spin.

As a stereotypical (read: neurotic) writer, my first thought was to see how my casual writing stacked up against that of a literary heavy-hitter, someone like Mark Twain. So I found a random letter Twain wrote in 1876 and put it into the machine. I did the same with my side of an email I sent to one of my editors at PopSci last week.

The results were surprisingly close (and I know my writing skills are objectively nowhere near Twain's). The Tone Analyzer breaks writing down into three tones: emotional, social, and writing. Within those categories, Twain scored 4 percent emotional, 88 percent social and 7 percent writing. I went 2 percent emotional, 90 percent social and 6 percent writing. (The unaccounted percentage points seem to be proper nouns and other words the computer couldn't understand.) It also picked out other notes in our writing, like words it deemed cheerful. Twain had 4 cheerful words (dear, pleasure, thanks and pleasant), and I had only 1 (hope).

Analysis of Mark Twain’s 1876 letter, compared to my email from last week. Screenshot

So what does this say? Twain had more tone by 1 percentage point? He was four times cheerier than I am? No, the similarity in our results actually reveals the underlying problem with this software: there's no context from word to word. While I was testing the program, the words "I'm angry" gave the exact same "anger" readout as "not angry," because the computer isn't reading any more than one word at a time. This makes the software great at finding individual words that could trigger an unwanted reaction, but blind to the greater meaning of those words. If automated, or used blindly, this could cause trouble.
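The failure mode above is easy to reproduce with a toy scorer. This is a minimal sketch of word-by-word tone detection, not IBM's actual method: the lexicon is invented for illustration, and the point is only that a one-word-at-a-time scorer cannot see negation.

```python
# Invented mini-lexicon of "anger" words; not IBM's real word list.
ANGER_WORDS = {"angry", "furious", "hate"}

def anger_score(text):
    """Count anger-lexicon hits, ignoring all surrounding context."""
    words = [w.strip(".,!?'").lower() for w in text.split()]
    return sum(1 for w in words if w in ANGER_WORDS)

# "not angry" negates the emotion, but because each word is scored
# independently, both phrases get an identical anger readout.
print(anger_score("I'm angry"))   # 1
print(anger_score("not angry"))   # 1
```

A real fix requires looking at word pairs or whole phrases (e.g., flipping the score when a negation word precedes an emotion word), which is exactly the context this demo lacks.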

For instance, if you grew up with a basic word processor on a computer (I cut my teeth on Word 97), you can remember the initial joy of right-clicking on a word and replacing it with a more important-sounding synonym. The word "better" became "surpassing," and soon your sentence bore no resemblance to what you initially wrote. That's the same kind of idea here, and in fact, the Tone Analyzer gives you that same option to replace words. My sentence "Sorry, brain in vacation mode" became "Grim, intellect successful vacation mode."
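Blind replacement like that can be sketched in a few lines. The synonym table below is hypothetical (chosen to echo the substitutions quoted above); the takeaway is that swapping words one at a time, with no sense of the sentence, garbles meaning.

```python
# Hypothetical one-word synonym table, mimicking the article's example.
SYNONYMS = {"sorry": "grim", "brain": "intellect", "in": "successful"}

def pump_up(text):
    """Replace each word independently, with no grammatical context."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

print(pump_up("Sorry brain in vacation mode"))
# grim intellect successful vacation mode
```

Even the preposition "in" gets swapped for an adjective, because the replacer has no idea what role each word plays in the sentence.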

The Tone Analyzer isn't the only, and definitely not the first, writing tool to fall flat, and it's important to note that it's still in development. But it falls flatter than most, given its bold claim to effectively recognize and correct good writing. I don't think we're measuring it against an unfair standard, either. If a program wants to pick out social cues and writing style, it needs a sample larger than one word at a time.

This issue is endemic to most writing software, and it's why our email applications don't have little pop-ups that say "Are you upset? You might not want to send this email." Software simply can't understand our human-generated writing 100 percent of the time, although some programs have come extraordinarily close.

Automated Insights, which has partnered with the Associated Press to generate sports stories, approaches software's understanding of language from the opposite direction. Given a set of inputs, like sports scores, the computer uses a bank of adjectives and verbs to describe the data. That's the reverse of analyzing written material, but the same idea holds: software built around guidelines for what makes readable, proper writing.
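This direction, from structured data to prose, can be sketched with a simple template. The team names, verbs, and margin thresholds below are all invented for illustration and are not Automated Insights' actual rules; they just show how a verb bank plus numeric inputs yields a readable sentence.

```python
def recap(winner, loser, w_score, l_score):
    """Generate a one-line game recap from scores using a tiny verb bank."""
    margin = w_score - l_score
    # Rule-of-thumb thresholds (invented): close game, solid win, blowout.
    verb = "edged" if margin <= 3 else "beat" if margin <= 10 else "crushed"
    return f"{winner} {verb} {loser} {w_score}-{l_score}."

print(recap("Hawks", "Owls", 21, 20))   # Hawks edged Owls 21-20.
print(recap("Hawks", "Owls", 42, 10))   # Hawks crushed Owls 42-10.
```

Because the generator controls both the data and the vocabulary, it never faces the context problem that trips up the Tone Analyzer; it only ever says things it already understands.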

Other web tools with more limited objectives include the Hemingway editor, which focuses on brevity and reducing use of the passive voice, and the Text Content Analyzer Tool.

But IBM's tool isn't terrible; it reads your text word by word and identifies common connotations of those words. That could undoubtedly be helpful if integrated into a native email application or professional messaging service. However, in a modern world grasping at the fringes of artificial intelligence, this software only holds up a mirror to how far most programs really are from achieving that gold standard.