When it comes to books, those in Russian and Chinese have the narrowest range of emotion, while books in English have the greatest. Even more emotionally wide-ranging than English books are Portuguese and Spanish tweets, and music lyrics in English. All these insights and more come from a big new study of 10 diverse languages: English, French, German, Spanish, Chinese, Korean, Arabic, Indonesian, Russian and Brazilian Portuguese. In part, the researchers in charge of the project wanted to test whether human languages tend to be positive or negative. Languages are positive, they found, and they turned up plenty of other cool stuff along the way.
How was a team of American engineers able to rate the emotions of world languages? For one thing, they used a lot of automation. The team used an algorithm to analyze different kinds of writing, including website text, books, tweets, TV subtitles and the New York Times (English only). The algorithm gave the researchers the 5,000 to 10,000 most commonly used words in each language and each body of writing. Then, the team paid native speakers to rate how positive each of those words was on a 10-point scale. Those ratings gave the research team an overall measure of the “positivity” of different languages.
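For the curious, here is a rough Python sketch of how a pipeline like that could work: count the most common words in a body of writing, average several raters' scores for each word, then combine those averages into one number for the whole corpus. This is not the team's actual code. The function names, the toy texts and the toy ratings are all made up for illustration, and weighting each word's score by how often it appears is an assumption on our part, not something the description above spells out.

```python
from collections import Counter
from statistics import mean


def top_words(texts, n):
    """Return the n most frequent words across the texts as (word, count) pairs."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts.most_common(n)


def word_scores(ratings):
    """Average the scores that several raters gave to each word."""
    return {word: mean(values) for word, values in ratings.items()}


def corpus_positivity(word_counts, scores):
    """Frequency-weighted mean score over the words that have ratings.

    Assumption: frequent words count for more than rare ones.
    """
    rated = [(count, scores[word]) for word, count in word_counts if word in scores]
    total = sum(count for count, _ in rated)
    if total == 0:
        raise ValueError("none of the frequent words has a rating")
    return sum(count * score for count, score in rated) / total


# Toy data (hypothetical): two tiny "documents" and made-up rater scores.
texts = ["love love happy day", "sad day love"]
ratings = {"love": [9, 8, 10], "happy": [8, 8], "day": [6, 6], "sad": [2, 3, 1]}

counts = top_words(texts, 4)
scores = word_scores(ratings)
overall = corpus_positivity(counts, scores)
print(overall)  # 7.0 for this toy data
```

In a real run, the word list would hold thousands of entries per corpus, and the ratings would come from the paid native speakers rather than a hand-typed dictionary.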
All the languages and writings they studied skewed more positive than neutral. None skewed negative. Nevertheless, some were more positive than others. Spanish Twitter is among the cheeriest, for example, while Chinese books are the closest to neutral.
There are many more gems to find in the paper the team wrote and posted to the paper database arXiv. There’s a whole section in which the researchers analyzed the positivity in different parts of classic novels, such as Moby Dick, Anna Karenina and The Count of Monte Cristo. The Physics arXiv Blog, which first turned us on to this cool study, has more about the book analysis. Here’s one strange thing I learned from the paper: “kkkkkkkkkkkkkkkkkkkkk” represents laughter in Brazilian Portuguese. As a word, it unsurprisingly scores high on Brazilian Portuguese speakers’ positivity scale. Curiouser still, it was used pretty frequently on Twitter, and at that length, too. Who can afford all those characters?