AI can help predict March Madness upsets, but it's far from perfect

“Beware the Ides of March.” Yes, it’s finally that time of year again: when the emperors of college basketball must watch their backs, lest the lowly bottom seeds of the tournament strike.

Before March 15, millions around the world filled out their March Madness brackets. In 2017, ESPN received a record 18.8 million brackets.

The first step to a perfect bracket is correctly choosing the first round. Unfortunately, most of us can’t predict the future. Last year, only 164 of the submitted brackets were perfect through the first round—less than 0.001 percent.

18.8 million brackets submitted.

164 are perfect after Round 1.

Here's to overachieving. #perfectbracketwatch pic.twitter.com/TGwZNCzSnW
— ESPN Fantasy Sports (@ESPNFantasy) March 18, 2017
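As a quick check on that percentage, here is a minimal sketch that uses only the two figures quoted above; the variable names are ours.

```python
# Figures quoted above for ESPN's 2017 bracket challenge.
total_brackets = 18_800_000
perfect_after_round_one = 164

# Share of brackets that were still perfect, as a fraction and as a percentage.
share = perfect_after_round_one / total_brackets
share_percent = share * 100

# The same figure expressed as "one in N" brackets.
one_in = round(total_brackets / perfect_after_round_one)

print(f"{share_percent:.5f} percent of brackets were perfect after Round 1")
print(f"That is roughly 1 in {one_in:,} brackets")
```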

Many brackets are busted when a lower-seeded team upsets the favored higher seed. Since the field expanded to 64 teams in 1985, at least eight upsets have occurred each year on average. If you want to win your bracket pool, you'd better pick at least a few upsets.

We’re two math Ph.D. candidates at the Ohio State University who have a passion for data science and basketball. This year, we decided it would be fun to build a computer program that uses a mathematical approach to predict first-round upsets. If we’re right, a bracket picked using our program should perform better through the first round than the average bracket.

Fallible humans

It’s not easy to identify which of the first-round games will result in an upset.

Say you have to decide between the No. 10 seed and the No. 7 seed. The No. 10 seed has pulled off upsets in its past three tournament appearances, once even making the Final Four. The No. 7 seed is a team that’s received little to no national coverage; the casual fan has probably never heard of them. Which would you choose?

If you chose the No. 10 seed in 2017, you would have gone with Virginia Commonwealth University over Saint Mary’s of California, and you would have been wrong. Thanks to a decision-making fallacy called recency bias, humans can be tricked into using their most recent observations to make a decision.

Recency bias is just one of many biases that can infiltrate someone’s picking process. Maybe you’re biased toward your home team, or maybe you identify with a player and desperately want him or her to succeed. All of this influences your bracket in a potentially negative way. Even seasoned professionals fall into these traps.

Modeling upsets

Machine learning can defend against these pitfalls.

In machine learning, statisticians, mathematicians and computer scientists train a machine to make predictions by letting it “learn” from past data. This approach has been used in many diverse fields, including marketing, medicine, and sports.

Machine learning techniques can be likened to a black box. First, you feed the algorithm past data, essentially setting the dials on the black box. Once the settings are calibrated, the algorithm can read in new data, compare it to past data and then spit out its predictions.

Figure: A black box view of machine learning algorithms. Matthew Osborne, CC BY-SA
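The "set the dials, then predict" idea can be sketched in a few lines. This is a toy illustration, not the authors' program: the class name `UpsetRateBox`, its methods and the handful of training rows are all made up, and it looks only at seed pairings rather than the team statistics the article describes.

```python
from collections import defaultdict

class UpsetRateBox:
    """Toy 'black box' (hypothetical): learns how often each seed pairing
    produced an upset in the past data it is shown."""

    def __init__(self):
        self.rates = {}

    def fit(self, past_games):
        """Set the dials. past_games holds (higher_seed, lower_seed, was_upset)."""
        counts = defaultdict(lambda: [0, 0])  # pairing -> [upsets, games]
        for higher_seed, lower_seed, was_upset in past_games:
            tally = counts[(higher_seed, lower_seed)]
            tally[0] += int(was_upset)
            tally[1] += 1
        self.rates = {pair: ups / games for pair, (ups, games) in counts.items()}
        return self

    def predict_proba(self, higher_seed, lower_seed):
        """Read in a new game and return the learned upset rate.
        Pairings absent from the training data fall back to 0.0."""
        return self.rates.get((higher_seed, lower_seed), 0.0)

# Made-up history, for illustration only (not real tournament results).
made_up_history = [
    (8, 9, True), (8, 9, False),
    (5, 12, True), (5, 12, False), (5, 12, False), (5, 12, False),
    (1, 16, False), (1, 16, False),
]

box = UpsetRateBox().fit(made_up_history)
for pairing in [(8, 9), (5, 12), (1, 16)]:
    print(pairing, box.predict_proba(*pairing))
```

Because this box has never seen a No. 16 seed win, it assigns that pairing a probability of zero. Any model trained only on past data has some version of that blind spot.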

In machine learning, there are a variety of black boxes available. For our March Madness project, the ones we wanted are known as classification algorithms. These help us determine whether or not a game should be classified as an upset, either by providing the probability of an upset or by explicitly classifying a game as one.

Our program uses a number of popular classification algorithms, including logistic regression, random forest models and k-nearest neighbors. Each method is like a different “brand” of the same machine; they work as differently under the hood as Fords and Toyotas, but perform the same classification job. Each algorithm, or box, has its own predictions about the probability of an upset.
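To make the difference between "a probability" and "an explicit label" concrete, here is a minimal sketch of two of those boxes written from scratch. It is not the authors' code. The two features (`seed_gap`, `efficiency_gap`) and every training value are hypothetical, and a random forest is left out only to keep the example short.

```python
import math

# Made-up training rows: ([seed_gap, efficiency_gap], was_upset).
# Both features and all values are hypothetical.
TRAIN = [
    ([1, 0.5], 1), ([1, -0.2], 1), ([3, 0.1], 1), ([1, 2.0], 0),
    ([3, 3.0], 0), ([7, 4.0], 0), ([9, 6.0], 0), ([15, 9.0], 0),
]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, steps=5000, lr=0.05):
    """Logistic regression fit by plain gradient descent on the log loss.
    Returns (weights, bias)."""
    n_features = len(rows[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def logistic_proba(model, x):
    """Box 1: returns a probability of an upset between 0 and 1."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def knn_classify(rows, x, k=3):
    """Box 2: k-nearest neighbors returns a label by majority vote."""
    nearest = sorted(rows, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return "Upset" if votes * 2 > k else "Not an Upset"

model = fit_logistic(TRAIN)
for game in ([2, 0.0], [12, 7.0]):
    print(game, round(logistic_proba(model, game), 3), knn_classify(TRAIN, game))
```

Both functions answer the same question about the same game, but one reports a number you can threshold yourself while the other commits to a call.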

We used the statistics of all 2001 to 2017 first-round teams to set the dials on our black boxes. When we tested one of our algorithms with the 2017 first-round data, it had about a 75 percent success rate. This gives us confidence that analyzing past data, rather than just trusting our gut, can lead to more accurate predictions of upsets, and thus better overall brackets.

Chances of an upset

For March Madness 2018, three machine learning models attempt to predict whether each first-round game will end in an upset. The percentages are the probability that the matchup results in a lower-seeded team beating a higher-seeded team.

SEEDING MATCH UP	HIGHER SEED	LOWER SEED	MODEL A	MODEL B	MODEL C	ACTUAL GAME OUTCOME
# 1 vs # 16	Virginia	UMBC	2.81%	10%	Not an Upset	Upset
# 8 vs # 9	Creighton	Kansas State	30.69%	10%	Not an Upset	Not an Upset
# 5 vs # 12	Kentucky	Davidson	26.07%	60%	Upset	Not an Upset
# 4 vs # 13	Arizona	Buffalo	23.46%	60%	Not an Upset	Upset
# 6 vs # 11	Miami (FL)	Loyola-Chicago	31.65%	10%	Not an Upset	Upset
# 3 vs # 14	Tennessee	Wright State	11.03%	0%	Not an Upset	Not an Upset
# 7 vs # 10	Nevada	Texas	40.76%	70%	Not an Upset	Not an Upset
# 2 vs # 15	Cincinnati	Georgia State	9.96%	50%	Not an Upset	Not an Upset
# 1 vs # 16	Xavier	Texas Southern	8.17%	0%	Not an Upset	Not an Upset
# 8 vs # 9	Missouri	Florida State	56.17%	40%	Upset	Upset
# 5 vs # 12	Ohio State	South Dakota State	17.86%	10%	Upset	Not an Upset
# 4 vs # 13	Gonzaga	UNC Greensboro	11.91%	40%	Not an Upset	Not an Upset
# 6 vs # 11	Houston	San Diego State	33.6%	50%	Upset	Not an Upset
# 3 vs # 14	Michigan	Montana	4.91%	20%	Not an Upset	Not an Upset
# 7 vs # 10	Texas A&M	Providence	42.96%	10%	Not an Upset	Not an Upset
# 2 vs # 15	UNC	Lipscomb	6.38%	10%	Not an Upset	Not an Upset
# 1 vs # 16	Villanova	Radford	2.58%	40%	Not an Upset	Not an Upset
# 8 vs # 9	Virginia Tech	Alabama	42.82%	40%	Upset	Upset
# 5 vs # 12	WVU	Murray State	9.88%	10%	Not an Upset	Not an Upset
# 4 vs # 13	Wichita State	Marshall	18.59%	20%	Not an Upset	Upset
# 6 vs # 11	Florida	St. Bonaventure	14.53%	40%	Not an Upset	Not an Upset
# 3 vs # 14	Texas Tech	Stephen F. Austin	7.97%	0%	Not an Upset	Not an Upset
# 7 vs # 10	Arkansas	Butler	33.29%	20%	Not an Upset	Upset
# 2 vs # 15	Purdue	Cal State Fullerton	4.07%	0%	Not an Upset	Not an Upset
# 1 vs # 16	Kansas	Penn	5.91%	0%	Not an Upset	Not an Upset
# 8 vs # 9	Seton Hall	NC State	36.80%	40%	Not an Upset	Not an Upset
# 5 vs # 12	Clemson	New Mexico State	22.93%	40%	Not an Upset	Not an Upset
# 4 vs # 13	Auburn	Charleston	16.51%	30%	Not an Upset	Not an Upset
# 6 vs # 11	TCU	Syracuse	28.83%	10%	Not an Upset	Upset
# 3 vs # 14	Michigan State	Bucknell	7.39%	20%	Not an Upset	Not an Upset
# 7 vs # 10	Rhode Island	Oklahoma	59%	40%	Upset	Not an Upset
# 2 vs # 15	Duke	Iona	5.35%	10%	Not an Upset	Not an Upset

Model A: Logistic Regression Upset Probability
Model B: Random Forest Upset Probability
Model C: K Nearest Neighbors Classification

Chart: Matthew Osborne and Kevin Nowland, The Conversation, CC-BY-ND
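One way to read the table is to count how many games each model called correctly. The sketch below copies the four rightmost columns of the table in row order. The 50 percent cutoff for turning the Model A and Model B probabilities into a pick is our assumption, not something stated by the authors, and a different cutoff would change the counts.

```python
# (model_a_pct, model_b_pct, model_c_says_upset, actual_upset), in table order.
GAMES = [
    (2.81, 10, False, True),   (30.69, 10, False, False),
    (26.07, 60, True, False),  (23.46, 60, False, True),
    (31.65, 10, False, True),  (11.03, 0, False, False),
    (40.76, 70, False, False), (9.96, 50, False, False),
    (8.17, 0, False, False),   (56.17, 40, True, True),
    (17.86, 10, True, False),  (11.91, 40, False, False),
    (33.6, 50, True, False),   (4.91, 20, False, False),
    (42.96, 10, False, False), (6.38, 10, False, False),
    (2.58, 40, False, False),  (42.82, 40, True, True),
    (9.88, 10, False, False),  (18.59, 20, False, True),
    (14.53, 40, False, False), (7.97, 0, False, False),
    (33.29, 20, False, True),  (4.07, 0, False, False),
    (5.91, 0, False, False),   (36.80, 40, False, False),
    (22.93, 40, False, False), (16.51, 30, False, False),
    (28.83, 10, False, True),  (7.39, 20, False, False),
    (59, 40, True, False),     (5.35, 10, False, False),
]

THRESHOLD_PCT = 50  # assumption: call an upset at 50 percent or higher

def correct_calls(predictions, outcomes):
    """Number of games where the pick matched the actual result."""
    return sum(p == o for p, o in zip(predictions, outcomes))

outcomes = [game[3] for game in GAMES]
picks = {
    "Model A": [game[0] >= THRESHOLD_PCT for game in GAMES],
    "Model B": [game[1] >= THRESHOLD_PCT for game in GAMES],
    "Model C": [game[2] for game in GAMES],
}
scores = {name: correct_calls(p, outcomes) for name, p in picks.items()}

for name, hits in scores.items():
    print(f"{name}: {hits} of {len(GAMES)} games called correctly")
```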

What advantages do these boxes have over human intuition? For one, the machines can identify patterns in all of the 2001-2017 data in a matter of seconds. What’s more, since the machines rely only on data, they may be less likely to fall for human psychological biases.

That’s not to say that machine learning will give us perfect brackets. Even though the box bypasses human bias, it’s not immune to error. Results depend on past data. For example, if a No. 1 seed were to lose in the first round, our model would not likely predict it, ~~because that has never happened before~~.

(Editor’s note: LOL)

Additionally, machine learning algorithms work best with thousands or even millions of examples. Only 544 first-round March Madness games have been played since 2001, so our algorithms will not correctly call every upset. Echoing basketball expert Jalen Rose, our output should be used as a tool in conjunction with your expert knowledge—and luck!—to choose the correct games.
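The 544 figure follows from simple counting, sketched below. It assumes "first round" means the round of 64, with 32 games per tournament, across every tournament from 2001 through 2017.

```python
first_year, last_year = 2001, 2017
first_round_games_per_year = 32  # assumption: round of 64, so 32 games

# Both endpoints are included, hence the + 1.
tournaments = last_year - first_year + 1
total_games = tournaments * first_round_games_per_year

print(f"{tournaments} tournaments x {first_round_games_per_year} games = {total_games}")
```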

Machine learning madness?

We’re not the first people to apply machine learning to March Madness and we won’t be the last. In fact, machine learning techniques may soon be necessary to make your bracket competitive.

You don’t need a degree in mathematics to use machine learning—although it helps us. Soon, machine learning may be more accessible than ever. Those interested can take a look at our models online. Feel free to explore our algorithms and even come up with a better approach yourself.

Matthew Osborne and Kevin Nowland are Ph.D. candidates in mathematics at The Ohio State University. This article was originally featured on The Conversation.
