Public social media posts contain an enormous volume of information about the activities, ideas, and passing thoughts of millions of people around the world. The sheer scale is overwhelming—but it also offers an opportunity for public health researchers, who can sort through the mass of data to potentially extract important public health information.
Researchers hope it can be a valuable tool to monitor, for example, the opioid crisis. In a new study, rates of tweets about opioids and opioid misuse—analyzed and classified by a machine learning program—matched the rates of opioid overdose deaths in Pennsylvania counties and rates of opioid use measured on national surveys.
“The ultimate goal is, of course, to be able to forecast potential crises like the opioid crisis. That is what we are working towards,” wrote study author Abeed Sarker, an assistant professor of biomedical informatics at Emory University School of Medicine, in an email to Popular Science.
However, others say that these tools don’t yet have meaningful applications. “It’s not yet really useful for being able to say, here’s where we should focus intervention efforts,” says Lyle Ungar, a professor of computer and information science at the University of Pennsylvania who has done work in this space.
The study pulled in tweets that referred to opioids and were posted in Pennsylvania between January 2012 and October 2015. The team used over 200 keywords that represented opioid use, including randomly generated misspellings of those keywords to mirror errors common on social media. They manually reviewed 550 posts to understand how the keywords were usually used, and grouped 16,000 posts into four categories: self-reported abuse or misuse, information sharing, unrelated, and non-English. They then trained several machine learning algorithms, one of which was a neural network, on those annotated posts. The neural network performed best, identifying tweets that noted opioid abuse as accurately as a person would.
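To make the keyword step concrete, here is a minimal Python sketch of expanding a keyword list with misspellings and using it to filter posts. It is an illustration under assumptions, not the study's code: the two keywords, the function names, and the choice of edits (dropping one character or swapping two neighbors) are all hypothetical, and the variants are enumerated exhaustively here, whereas the study describes randomly generated misspellings.

```python
import re


def misspellings(word):
    """Return the word plus simple one-edit variants (hypothetical scheme)."""
    variants = {word}
    for i in range(len(word)):
        # Drop one character, e.g. "oxycodone" -> "oxycodne".
        variants.add(word[:i] + word[i + 1:])
    for i in range(len(word) - 1):
        # Swap two neighboring characters, e.g. "oxycodone" -> "oyxcodone".
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    return variants


def build_keyword_set(keywords):
    """Expand every keyword into itself plus its misspelled variants."""
    expanded = set()
    for keyword in keywords:
        expanded |= misspellings(keyword.lower())
    return expanded


def mentions_keyword(text, keyword_set):
    """True if any lowercase alphanumeric token of the text is in the set."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(token in keyword_set for token in tokens)


# Hypothetical two-word list; the study reports using more than 200 keywords.
KEYWORDS = ["oxycodone", "percocet"]
keyword_set = build_keyword_set(KEYWORDS)

for post in ["popped an oxycodne before work", "nice weather in Pittsburgh"]:
    print(post, "->", mentions_keyword(post, keyword_set))
```

A filter like this only narrows the pool of posts. Deciding whether a matching post actually describes misuse is the job of the classifier trained on the annotated examples.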
The rate of tweets indicating opioid abuse in each county correlated with the rate of overdose deaths in that county. The rate also corresponded with county-level rates of nonmedical prescription opioid use, illicit drug use, illicit drug dependence, and illicit drug dependence or abuse reported by the National Survey on Drug Use and Health.
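As a rough illustration of what "correlated" means here, the sketch below computes a Pearson correlation coefficient between two lists of county-level rates. The article does not say which statistic the researchers used, so Pearson is an assumption, and the county numbers are invented placeholders rather than data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values for five hypothetical counties:
# misuse-related tweets per 100,000 residents, and overdose deaths per 100,000.
tweet_rates = [4.0, 9.5, 2.1, 7.3, 5.8]
death_rates = [12.0, 30.0, 8.0, 22.0, 19.0]

print(f"r = {pearson(tweet_rates, death_rates):.2f}")
```

A coefficient near 1 means counties with more misuse-related tweets also tend to have more overdose deaths. It does not show that tweets can predict future deaths, a distinction the researchers quoted later in the article raise.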
A number of prior studies have also automated the process of pulling opioid-related information from social media platforms. “We think this model is more robust than past models because it is more resistant to unrelated chatter — for example, if a celebrity dies from an opioid overdose, there is a lot of social media chatter about it, but that does not mean there is an increase in opioid usage at the population level,” Sarker said.
Michael Chary, a fellow in medical toxicology at Boston Children’s Hospital, worked on one of those prior studies. His work focused on opioid trends at the state level. “This paper increases the geographic resolution down to the county level,” he says. “Increasing that is important. We know from other research that there are different patterns of opioid use in urban and rural communities, which suggests policies that work for one may not work for another.”
The paper’s use of a neural network, though, makes it harder for people to know exactly how the system was sorting through and classifying tweets. “There are issues with the transparency of neural networks. It’s a limitation in general,” he says. In addition, deep convolutional neural networks—the type used in this study—are usually used on images, and have only recently started to be used with language, so they’re particularly opaque in this case, he says.
Efforts to identify the best methods of analysis should be closely followed by research that identifies the best ways to use the data that the analysis reveals, Sarker said. “We believe that we have reached a point [in] time when we should stop asking if we can use social media for public health tasks,” he said. “Interdisciplinary, collaborative research is the future and can help address current crises like the opioid crisis and prevent future crises through early detection.” Sarker’s biomedical informatics team is working with Jeanmarie Perrone, a toxicologist at the University of Pennsylvania and an author on the paper, to identify methods that can help experts working directly with groups affected by opioids.
Chary, however, is less optimistic that applications will happen soon. Tweets still have to be validated as a source of epidemiological data. In addition, they have to be shown to predict opioid use in the near future, not just track with previous survey data. “It’s not helpful to predict the past.”
Big-picture data on general opioid use also does not differentiate between types of opioids, which is important information for physicians and those targeting interventions. “That layer of data is very important. Lumping everything together into one signal glosses over that,” Chary says.
Ungar also noted that only a small sliver of the population is on Twitter, and only some people would be willing to tweet about illicit drug use. “What you get is also weird biases. What you’re measuring is how much drugs they’re using, and how willing they are to be public about it.”
Data from tweets about opioid use might help researchers better understand *how* people talk about their opioid use, or the characteristics of people who struggle with it, Ungar says. However, he says that it’s still unclear how useful social media data will be.
“There’s a disconnect between finding tweets that talk about opioids and being able to use that for public health to say where we should target resources.”