AI confirms the obvious: The pandemic bummed people out
Scientists used machine learning to make a mood ring for the internet, and it showed us how sad the world was at the start of the pandemic.
Mood gives researchers a unique way to measure the impact of natural or human-made disasters on people. However, it’s simply impractical to ask every single person in the world how they’re feeling in the aftermath of a sweeping event.
But scientists from the Massachusetts Institute of Technology, the Chinese Academy of Sciences, and the Max Planck Institute for Human Development found a workaround. They used machine learning to scan social media for sentiment shifts following the first wave of COVID-19 in 100 countries, getting real-time reads on how happy or sad pandemic-related events made people across the world. Think of the process as an AI-powered mood ring, but for millions of people. Their findings were published last week in the journal Nature Human Behaviour.
Unsurprisingly, the researchers found that the onset of the pandemic precipitated a dramatic drop in happiness. To put that plunge into perspective, consider that in a normal week, people tend to feel happiest on the weekend and least happy on Mondays. The drop in happiness at the start of the pandemic, around March 2020, was four to five times greater than the average dip from a typical weekend to Monday. The overall change in mood due to the pandemic was also greater than the mood shift previously observed in response to a natural disaster like a hurricane, or to a sharp rise in temperatures. The countries that saw the biggest drops in mood were Australia, Spain, the United Kingdom, and Colombia, whereas Bahrain, Botswana, Greece, Oman, and Tunisia appeared to be the least affected by the pandemic, according to the researchers’ social media observations.
How did machines learn to rate posts by mood?
For this study, the team used social media data from Twitter and Weibo collected by the Harvard Center for Geographic Analysis Geotweet Archive and the MIT Sustainable Urbanization Lab. In total, their dataset contained 654 million geotagged posts from 10.56 million individuals during the first five months of 2020.
To teach a machine to measure mood, the researchers started by creating a sentiment index, much like a face pain scale at the doctor’s office. This sentiment index goes from 0 (very unhappy) to 100 (very happy). Every single post the team gathered from Twitter and Weibo was judged on this index. Then, researchers can aggregate the post-specific emotions into a sentiment profile for an individual, a neighborhood, a city, or a country.
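The aggregation step described above can be sketched in a few lines of Python. The post data and grouping keys below are invented for illustration; the study's actual dataset and aggregation details aren't given in this article.

```python
# Hypothetical sketch: averaging per-post sentiment scores (0-100 index)
# into profiles for an individual or a country.
from collections import defaultdict
from statistics import mean

# Each post: (user_id, country, sentiment score on the 0-100 index)
posts = [
    ("u1", "ES", 62.0),
    ("u1", "ES", 40.0),
    ("u2", "ES", 55.0),
    ("u3", "AU", 30.0),
]

def sentiment_profile(posts, key_index):
    """Average the 0-100 sentiment scores of posts, grouped by a key
    (user at index 0, country at index 1)."""
    groups = defaultdict(list)
    for post in posts:
        groups[post[key_index]].append(post[2])
    return {key: mean(scores) for key, scores in groups.items()}

by_user = sentiment_profile(posts, 0)     # per-individual profile
by_country = sentiment_profile(posts, 1)  # per-country profile
```

The same grouping could be applied at any level, from a neighborhood up to a whole country, simply by changing the key.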
Unlike with the face pain scale, individuals don’t rate their own posts or answer surveys on how happy they feel. Instead, researchers used a machine learning method to assign each post a topic and a sentiment rating.
The machine learning method in question is a natural language processing technique called BERT, or Bidirectional Encoder Representations from Transformers, which classifies the posts by topic and sentiment. (BERT was developed by engineers at Google.)
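A classifier like BERT typically outputs a sentiment label plus a confidence score rather than a 0-100 number directly. The mapping below is my own illustration of how such output could be placed on the study's index; the paper's exact conversion isn't described in this article.

```python
# Hypothetical mapping from a BERT-style classifier output
# (label, confidence) onto a 0-100 happiness index.

def to_index(label: str, score: float) -> float:
    """A confident positive lands near 100, a confident negative near 0,
    and an uncertain prediction stays near the 50 midpoint."""
    if label == "POSITIVE":
        return 50.0 + 50.0 * score
    return 50.0 - 50.0 * score

# With the Hugging Face transformers library, the classification call
# would look roughly like this (requires a model download, so shown
# only as a comment):
#   from transformers import pipeline
#   clf = pipeline("sentiment-analysis")  # a BERT-family model by default
#   result = clf("Another week of lockdown...")[0]
#   index = to_index(result["label"], result["score"])
```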
“We wanted to do this global study to compare different countries because they were hit by the pandemic at different times, and they have different cultures, different political systems, and different healthcare systems,” says Siqi Zheng, a professor at MIT. All of these factors could play into how people’s moods were influenced by the pandemic.
Because they wanted to do a multi-language analysis, they couldn’t use their previous dictionary-based approach, which they used in a 2019 study to quantify the emotional toll of air pollution in China. The dictionary approach assumes that words have connotations associated with a particular emotion. It draws from tools like LIWC (the Linguistic Inquiry and Word Count software) and emoji dictionaries. The downside to this approach is that researchers need to compile extensive lists of words, and they need to make a different list for every language they want to look at.
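A minimal version of that dictionary approach looks like the sketch below. The tiny lexicon is invented for illustration; real work draws on much larger resources such as LIWC, and, as the article notes, a separate lexicon is needed for every language.

```python
# Minimal sketch of dictionary-based sentiment scoring: each word carries
# a fixed connotation, and a post's score is the average over matched words.
# The word lists here are toy examples, not a real lexicon.

POSITIVE = {"happy", "great", "love", "safe"}
NEGATIVE = {"sad", "sick", "afraid", "lockdown"}

def dictionary_score(post: str) -> float:
    """Score a post on a 0-100 scale from matched lexicon words;
    50 means neutral or no matches at all."""
    words = post.lower().split()
    hits = [100.0 if w in POSITIVE else 0.0 for w in words
            if w in POSITIVE or w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 50.0
```

The weakness is visible immediately: any post written in a language the lexicon doesn't cover, or using words outside it, falls back to the neutral default.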
The advantage of the machine learning approach is that it isn’t language-specific. Before applying the technique to the entire dataset, the researchers trained the model on a small sample of posts, then checked its work by having it predict sentiments for random posts and comparing its accuracy against the dictionary model.
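That validation step amounts to measuring how often the two methods agree on held-out posts. The sketch below is my own illustration of such a check; the labels are invented.

```python
# Hypothetical validation sketch: compare the model's sentiment labels on
# a random sample of posts against labels from a reference method
# (e.g. the dictionary approach) and report the agreement rate.

def agreement_rate(model_labels, reference_labels):
    """Fraction of posts on which the two labelings agree."""
    matches = sum(m == r for m, r in zip(model_labels, reference_labels))
    return matches / len(reference_labels)

model = ["pos", "neg", "neg", "pos", "neg"]
dictionary = ["pos", "neg", "pos", "pos", "neg"]
rate = agreement_rate(model, dictionary)  # 4 of 5 posts agree
```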
This paper on COVID-19-related social media responses is just one outcome of a long-term project at Zheng’s lab called “Global Sentiment,” which aims to use natural language processing techniques to extract information on subjective well-being from social media posts. Her lab is using this social media mood analysis to examine responses to a variety of events, including wildfires, environmental hazards, natural disasters, and new policies.
“It’s a way to provide a unique angle, a different dimension for quantifying the impact of shocks,” she says. Zheng and her colleagues have posted more detailed descriptions of the code and methods used in their studies on the Global Sentiment website.