In 2006, Netflix made its vast database of user-generated movie ratings available to the public, offering $1 million to the first team that could improve the accuracy of the company’s recommendations by 10 percent. That’s a lot of money—but Netflix could have spent much more on in-house development, with no guarantees. By 2009, the top team had its prize, and Netflix had its algorithm. Other groups took notice and are now holding their own contests, asking statisticians, computer scientists and basement hobbyists alike to mine complex data sets for solutions to some difficult problems.
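The kind of model these contests reward can be sketched in a few lines. Below is a toy example of one common rating-prediction technique, matrix factorization trained by stochastic gradient descent; the data and parameters are invented for illustration, and this is not the winning Netflix algorithm itself.

```python
# A minimal sketch of rating prediction via matrix factorization.
# Toy data only; hyperparameters (k, lr, reg, epochs) are illustrative.
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200):
    """Learn latent vectors so dot(U[u], V[i]) approximates each rating.

    ratings: list of (user_index, item_index, rating) triples.
    """
    random.seed(0)
    U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):                      # gradient step with L2 penalty
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V

def rmse(ratings, U, V):
    """Root-mean-square error, the accuracy metric the Netflix Prize used."""
    se = sum((r - sum(uf * vf for uf, vf in zip(U[u], V[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Toy 3-user, 3-movie rating set (1-5 stars).
train = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 4), (2, 2, 2)]
U, V = factorize(train, n_users=3, n_items=3)
print(round(rmse(train, U, V), 2))
```

A contest entry would be judged on held-out ratings rather than the training set, but the loop is the same: predict, measure error, adjust.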
HERITAGE HEALTH PRIZE
The Heritage Provider Network, a California physicians group, invites people to develop an algorithm that analyzes three years’ worth of anonymized patient records to predict whether, and for how long, a given person will be hospitalized over the next year. Such an algorithm could help Heritage’s doctors identify high-risk patients and, by giving them extra preventive care, lower the network’s overall health-care costs. The contest runs until April 2013.
OVERSTOCK.COM RECOMMENDATION CONTEST
Hoping to improve its online recommendations, the discount e-store Overstock.com released a data set of 75,000 hypothetical Overstock shopping sessions, including clicks and purchases. To win the top prize, contestants’ predictions must be at least 10 percent better than those of Overstock’s current recommendation model. This spring, finalists’ algorithms will run live on Overstock’s site, where they will be judged on how often the suggested products are purchased.
KDD CUP
For 2011’s KDD Cup, Yahoo Music published 300 million user ratings of songs, albums, artists and genres and requested two algorithms: one that calculates how people would rate new songs and one that identifies songs they wouldn’t listen to or rate. A team from National Taiwan University won both, beating pros from AT&T, Northrop Grumman and Hulu. Stay tuned for 2012’s challenge, to be announced this winter.