Source: Wikipedia; Modifications: Jason Tetro

Foodborne illness is a constant scourge in the United States, affecting about one in every six people each year. The Centers for Disease Control and Prevention (CDC) has worked tirelessly to understand disease trends, while the Food and Drug Administration (FDA) has developed a clear picture of the risk factors associated with getting sick.

From an epidemiological perspective, the information is good, but not optimal. Most of the data behind these reports comes from outbreaks in which a number of people head to the emergency room with the unwelcome symptoms of diarrhea, vomiting, stomach pains, fever and, in some cases, organ dysfunction. This data helps establish large-scale statistics and recommendations, but at the individual level it is routinely scarce and less robust than needed. For an epidemiologist to work to his or her full potential, there needs to be a way to understand individual experiences. Unfortunately, these more often than not go unreported.

The CDC has been looking for new ways to close the gaps in personal information, acquiring data while keeping people’s personal experiences private. Back in 1999, a comprehensive review showed the importance of surveys in acquiring data. However, even this approach appeared incomplete and left significant gaps.

Other researchers have tried different means of identifying cases anonymously. One such attempt was the 2004 RUsick2 forum, which used the internet to gather data from visitors. Rather than focusing on the microbes causing disease, the program collected information on symptoms. Recording how a person feels proved far more practical than the painstaking process of identifying a microbial cause, and it also allowed for a faster response in the event of a real attack. The results of the RUsick2 program were good but limited by people’s willingness to actually sit down and register.

While RUsick2 may not have been perfect, it set a baseline for using the social web to improve food safety. New ideas were subsequently tested for their ability to harness online information to better understand health problems. In 2008, BioCaster attempted to identify outbreaks by text mining media articles on the web. Publicly available Rich Site Summary (RSS) feeds were harvested and scanned for keywords suggesting an outbreak might be happening. The words were simple: virus, disease, outbreak, fever, infected, strain, risk, ill. When the matches were aggregated and combined with a geospatial analysis, a potential hotspot for infection could be identified.
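To make the keyword idea concrete, here is a minimal Python sketch. It is not BioCaster's actual system; it assumes the feed items have already been downloaded and tagged with a place name (the `(location, text)` pairs and the `min_reports` threshold are hypothetical), and it uses a simple count per location as a crude stand-in for real geospatial analysis. Only the eight keywords come from the description above.

```python
from collections import Counter
import re

# The eight keywords listed in the article.
OUTBREAK_KEYWORDS = {"virus", "disease", "outbreak", "fever",
                     "infected", "strain", "risk", "ill"}

def mentions_outbreak(text):
    """Return True if any outbreak keyword appears as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not OUTBREAK_KEYWORDS.isdisjoint(words)

def find_hotspots(items, min_reports=2):
    """Count keyword-flagged items per location and keep the busy ones.

    `items` is a list of (location, text) pairs: a hypothetical stand-in
    for headlines already pulled from RSS feeds and geotagged.
    """
    counts = Counter(loc for loc, text in items if mentions_outbreak(text))
    return {loc: n for loc, n in counts.items() if n >= min_reports}

# Made-up example headlines, for illustration only.
headlines = [
    ("Hanoi", "Health officials report fever outbreak in two districts"),
    ("Hanoi", "More residents infected as virus spreads"),
    ("Lima", "New stadium opens downtown"),
    ("Lima", "Hospital sees rise in ill patients"),
]
print(find_hotspots(headlines))  # {'Hanoi': 2}
```

Matching whole words keeps "ill" from firing on words like "will" or "illness"; a real system would need far more nuance, since words such as "risk" show up in plenty of stories that have nothing to do with infection.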

By 2010, a plethora of digital disease mapping resources were available, including Google Trends, HealthMap and ProMED-mail. They all worked on the same principle: newsworthy stories could help track down areas where an outbreak may be occurring. Their value to agencies such as the CDC and the WHO was immense, helping to shorten the time needed to respond and to put interventions in place.

Yet, there was still a missing factor: the individual. While news provided some link to the citizen, it too was an aggregate. There was little likelihood of a single case of foodborne or other mild illness appearing in the headlines. There had to be a better path to personal information without interfering with a person’s daily activities.

This past week, a team of New York City researchers may have found the way forward. But they couldn’t do it alone. They needed some Yelp.

Yelp is an online service that lets people write reviews of places they have visited, including restaurants. Over the last ten years, some 57 million reviews have been written, and the site draws over 132 million visitors each month. It is also free, meaning anyone can join. For the scientific team, this was the perfect spot to give their idea a try.

The process of Yelpidemiology was simple. The team looked at some 294,000 reviews written over a nine-month period and mined them for words and phrases commonly associated with foodborne infection. Though it was a shot in the dark, they were surprised to find 893 entries suggesting foodborne illness. On closer examination, 499 of these were consistent with an actual infection, 468 of those described a recent illness, and only 15 reviewers had reported the incident to public health authorities.
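A rough Python sketch of the two steps involved: screening review text for illness-related phrases, then working through the narrowing funnel of counts. The phrase list and sample reviews are hypothetical, since the article does not give the team's actual search terms or method; only the funnel numbers come from the study as described above.

```python
import re

# Hypothetical phrases; the researchers' actual term list is not given here.
ILLNESS_PHRASES = ["food poisoning", "got sick", "vomiting", "diarrhea",
                   "stomach ache", "threw up"]
PATTERN = re.compile("|".join(re.escape(p) for p in ILLNESS_PHRASES),
                     re.IGNORECASE)

def flag_reviews(reviews):
    """Return the reviews that mention at least one illness phrase."""
    return [r for r in reviews if PATTERN.search(r)]

# Made-up sample reviews, for illustration only.
reviews = [
    "Great tacos, friendly staff.",
    "Got sick the night after eating the oysters.",
    "Ended up with food poisoning, never again.",
]
print(len(flag_reviews(reviews)))  # 2

# The screening funnel reported in the article, each stage as a share
# of the stage before it.
funnel = [("reviews scanned", 294_000), ("flagged", 893),
          ("consistent with illness", 499), ("recent", 468),
          ("reported to authorities", 15)]
for (_, prev), (label, n) in zip(funnel, funnel[1:]):
    print(f"{label}: {n} ({n / prev:.1%} of previous stage)")
```

Run on the article's figures, the loop shows that only about 0.3% of reviews were flagged and that roughly 3% of the recent cases had been reported, which is the gap Yelpidemiology set out to fill.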

Although the data appeared to be accurate, the team wanted to make sure they were on the right track. They contacted 27 of the reviewers and asked them about their experiences. These follow-ups identified 24 restaurants as potential sources of foodborne illness and, eventually, 3 previously unknown outbreaks. The numbers were small, but they showed that Yelpidemiology could work.

There is little doubt that social media will continue to improve the ability of public health officials to quickly identify and react to outbreaks. And because social media is public in nature, privacy concerns are minimal; this is information people provide openly. For the CDC and others, this experiment in Yelpidemiology offers a definitive route forward and may lead to more powerful information aggregation and mining technologies. For you and me, it may mean that one day we will be able to head to Yelp or another website to learn whether that restaurant we always wanted to try could end up bringing joy or pain.