You might not think of internet oversharing as a lifesaving habit, but maybe it is. For more than a decade, epidemiologists and data scientists have scanned our search-engine queries and social-media posts with the goal of discerning who is infected, what they have, and where they live. But deriving meaning from our consultations with Dr. Google faces an ironic obstacle: For all our copious snaps, selfies, and status updates, we're just not sharing enough to consistently forecast disease outbreaks, including the flu.

Of course, influenza's reign of terror started long before the birth of our modern social networks. A hundred years ago, the infamous "Spanish flu" spread rapidly around the world, infecting a third of the world's population and killing at least 50 million people. Because the virus evolves rapidly, and because international travel and urbanization let illnesses spread quickly, a modern version of that pandemic could cause twice as many casualties, along with widespread disruption to the global supply of food, medicine, and energy. It doesn't matter where you live or what you do. The flu could infect you.

Even in the absence of Flumageddon, improving our ability to forecast the illness is vital. Influenza viruses kill up to 646,000 people worldwide every year, including as many as 56,000 people in the U.S. Americans spend as much as $5.8 billion on medical care annually to fight the pestilence. If we knew when it was coming, health agencies could push people to get vaccinated, and hospitals could plan ahead.
Augmenting official flu reports from the Centers for Disease Control and Prevention (CDC) with data harvested from the internet is another step in our online evolution. According to a 2012 Pew Research Center study, about 184 million Americans (more than half the nation's residents) use the Web to find health-related information. These searches are like tips to a crime hotline, helping researchers identify suspected flu cases. In 2006, Gunther Eysenbach, associate professor of public health at the University of Toronto, found that searches for the terms "flu" or "flu symptoms" spiked a week before a jump in doctor visits. "The internet has made measurable what was previously immeasurable," he wrote that year, christening the new field "infodemiology."
In 2008, Google rolled out Flu Trends, harnessing its own big data to spot flu surges and hot spots from symptom searches in 29 countries. Google stopped publishing Flu Trends estimates in 2015, in part because of at least one factor that researchers hadn't counted on.
Your search history, it turns out, can be misleading. It’s impossible for data collectors to know whether you were looking up “headache and fever” for yourself, or because you heard your co-worker complaining about their kid’s symptoms. In 2007, Americans suddenly started Googling “cholera”—had a new epidemic taken hold? Nope. Oprah Winfrey had just recommended Love in the Time of Cholera for her book club. “You should have seen what happened when Brad Pitt had viral meningitis,” says Lone Simonsen, professor of epidemiology at Roskilde University.
After culling search data from public sources, researchers run it through statistical models. These models reveal patterns that investigators can then compare with whatever the CDC or other health agencies report about the sickness. If a computer-generated prediction matches what actually happened, the experts are probably onto something.
Search queries aren’t the only vein of data that researchers mine for flu clues. Svitlana Volkova, a data scientist at the Pacific Northwest National Laboratory, looks for gems of information on Twitter. She recently verified a new deep-learning method that probes tweets for signs of the flu. In an analysis of more than 170 million tweets posted over three years, Volkova and her colleagues found their model could accurately produce three-day forecasts of flu-like illnesses at a local level. That’s much quicker than waiting for flu reports from the CDC, which lag up to two weeks behind what’s happening in the world. (Facebook says it’s not in the flu-predicting business, so for now, your sick emoji doesn’t serve a greater good.)
Social media adds more data for researchers to work with, but it still has limitations. Annoyingly, the image we present online doesn't always match the mucus-plagued person we are at home. Michael Paul, an information scientist at the University of Colorado at Boulder, recently found that people rarely tweet about their flu-like symptoms. In fact, he and his colleagues found that people tweet less when they're ill. So the next time your favorite Twitter personality seems oddly quiet, it could be because they're sick of Twitter, but it might just be that they're sick. Paul also investigated Instagram and found that acute illness is the least-common health topic for photo posting. Not surprisingly, flu-ridden people don't love taking selfies.
Disease detectives, including Simonsen, hope that electronic health records could augment data from our tweets and posts. Insurance-claim forms, which list ailments and how they were treated, are particularly valuable. But people are typically reluctant to share private health data with researchers.
Epidemiologists would like to calm those privacy worries. They want only the numbers, never the names. But the final call lies with individuals. The public, Simonsen says, must weigh the trade-off: "Privacy on one side and the need to know more on the other." That deliberation has become even more pertinent since the EU's General Data Protection Regulation took effect this year, giving people more say in how their information is used.
Adding information from apps that people use to log their health status, much as they already do with fitness trackers or diet programs, could make big-data flu forecasts even more accurate, Simonsen says. And private companies might come around: UNICEF is working with several, including IBM, to gather data that could improve responses to illnesses around the globe.
Ultimately, the potential for big data to predict the next flu pandemic might depend on all of us, around the globe, oversharing our illnesses. The more we tweet about our #flu symptoms, the more data we generate. The more we allow companies to share that data with researchers, the more accurate researchers can make their predictions. And all that sharing, Volkova says, "will help the world."
This article was originally published in the Winter 2018 Danger issue of Popular Science.