
100,000 simulations pick Spain as World Cup favorite

A probabilistic machine-learning model has run 100,000 World Cup simulations and put Spain at the top with a 14.5% chance of winning. England and France follow at 12.4%, while Germany sits at 11.2%. The same forecast gives the United States a 78% chance to rea

For anyone who still wants a crystal ball to settle World Cup debates, the competition may soon feel less like faith and more like math. In a new data-driven approach, an algorithm that treats tournament outcomes like “loaded dice” has been used to simulate the World Cup 100,000 times, then rank which teams are most likely to lift the trophy.

The forecast lands on Spain as the favorite, with a 14.5% probability of winning the title. England and France are close behind at 12.4% each. Germany follows at 11.2%. The numbers also show how tightly packed the top contenders are in a tournament expanded to 48 teams and five rounds in the knockout stage.

Portugal and Argentina also figure prominently in the model’s results, at 8.9% and 8.2%, respectively.

The model’s outlook for the United States carries a sharper contrast. The U.S. is projected to have a 78% chance of reaching the Round of 32, the highest probability within its group of four teams. But the same forecast shows survival gets harder once the tournament becomes “do or die” in the knockout phase. In the final at MetLife Stadium in New Jersey on July 19, the probability of a home victory for the United States is just 1%.

Under the hood, the method is built in two steps. First, statistical models and expertise from bookmakers and transfer markets are combined to estimate the strengths of teams and players. Second, a machine learning algorithm determines how best to combine those strength estimates with other information about the teams.

Each match then gets its own probabilistic forecast, essentially replacing ordinary dice (where 1 through 6 come up with equal chances) with “loaded dice” that tilt toward different likely goal totals for each side. In one example from the forecast, Mexico is estimated at an average of 1.9 goals in the opening match, while South Africa is placed at an average of 0.7. Even then, a Mexico win isn’t treated as certain. Instead, the most likely result for that matchup is a Mexico win at 65% probability, with a draw at 21% and a South Africa win at 14%.
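To see how two average goal counts can turn into win, draw, and loss probabilities, here is a minimal sketch. The article does not say which distribution sits behind the “loaded dice”; the code assumes each side's goals follow an independent Poisson distribution, a common simplification in soccer modeling. The function names are illustrative, not taken from the researchers' software.

```python
import math


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the average is `mean`."""
    return math.exp(-mean) * mean**goals / math.factorial(goals)


def match_outcome_probs(mean_a, mean_b, max_goals=15):
    """Return (win, draw, loss) for team A, assuming independent Poisson goals.

    Scorelines above `max_goals` per side are ignored; their total
    probability is negligible for realistic averages.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, mean_a) * poisson_pmf(b, mean_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Averages quoted for the opening match: Mexico 1.9, South Africa 0.7.
win, draw, loss = match_outcome_probs(1.9, 0.7)
print(f"Mexico win {win:.1%}, draw {draw:.1%}, South Africa win {loss:.1%}")
```

Under that assumption the three probabilities land close to the 65%, 21%, and 14% quoted for the match. Small gaps are expected, since the published averages are rounded and the actual model may not treat the two sides as independent.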

The simulations also respect the tournament structure and rules: the official World Cup draw is used, and the model accounts for FIFA rules including the possibility of overtime and penalty shoot-outs.
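The idea of replaying a tournament many times and counting who wins can be sketched in a few lines. Everything specific here is an assumption made for illustration: the four teams and their strengths are hypothetical, the rule that turns strengths into expected goals is a toy, extra time is modeled as one third of a regular match, and a penalty shoot-out is treated as a coin flip. The real forecast uses the official draw with 48 teams and a group stage.

```python
import math
import random
from collections import Counter

# Hypothetical team strengths (NOT values from the forecast).
STRENGTH = {"A": 1.8, "B": 1.5, "C": 1.2, "D": 0.9}
AVERAGE_GOALS = 1.3  # assumed goals per team in an evenly matched game


def poisson(rng, mean):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def expected_goals(team, opponent):
    """Toy rule: scale the average by the ratio of the two strengths."""
    return AVERAGE_GOALS * STRENGTH[team] / STRENGTH[opponent]


def knockout_winner(rng, first, second):
    """Play one knockout match, including extra time and penalties."""
    goals_first = poisson(rng, expected_goals(first, second))
    goals_second = poisson(rng, expected_goals(second, first))
    if goals_first == goals_second:  # extra time: 30 of 90 minutes
        goals_first += poisson(rng, expected_goals(first, second) / 3)
        goals_second += poisson(rng, expected_goals(second, first) / 3)
    if goals_first == goals_second:  # shoot-out modeled as a coin flip
        return rng.choice([first, second])
    return first if goals_first > goals_second else second


def simulate_title_odds(bracket, runs=20_000, seed=1):
    """Replay the bracket `runs` times and return each team's title share."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(runs):
        teams = list(bracket)
        while len(teams) > 1:  # neighbors in the list meet in each round
            teams = [knockout_winner(rng, teams[i], teams[i + 1])
                     for i in range(0, len(teams), 2)]
        titles[teams[0]] += 1
    return {team: titles[team] / runs for team in bracket}


odds = simulate_title_odds(["A", "D", "B", "C"])
for team, share in sorted(odds.items(), key=lambda item: -item[1]):
    print(f"{team}: {share:.1%}")
```

The printed shares are simply how often each team finished on top across the replays, which is how a figure such as “14.5% for Spain” should be read.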

What powers the strength estimates is a mix of retrospective data, market signals, and player valuation. National-team matches from the past eight years serve as the foundation for a “retrospective” estimate of team strength. A “prospective” estimate comes from quoted odds from international bookmakers, reflecting expert expectations for the upcoming tournament. Individual players are rated based on their contributions to goals at club and national levels. Current quality and future potential are reflected in expected market values, available from Transfermarkt, which uses a “wisdom-of-the-crowd” approach to estimate real-market values.

Those variables are then combined with additional inputs, including team-specific details such as FIFA rank and the number of players in the semifinals of this year’s Champions League. Country-level factors also enter the model, including GDP per capita.

A machine learning system is used to determine how relevant these features are for results in World Cup matches. A random forest, made up of many decision trees that each capture slightly different subsets of the data, is trained using matches from major soccer tournaments dating back to World Cup 2006. The model links team strength, market value, and other factors to the number of goals scored in World Cup matches. That goal-scoring information is what ultimately loads the dice for the simulations.
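The core idea of a random forest (many small trees, each fitted to a different resample of the data and a random choice of feature, with their predictions averaged) can be illustrated with a deliberately tiny stand-in. This is not the researchers' model: each “tree” below makes only a single split, the match data are synthetic, and the feature names `strength_gap` and `gdp_index` are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def average(values):
    return sum(values) / len(values)


def fit_stump(rows, goals, feature):
    """Fit a one-split regression tree on one feature by least squares."""
    best = None
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        left = [g for row, g in zip(rows, goals) if row[feature] <= cut]
        right = [g for row, g in zip(rows, goals) if row[feature] > cut]
        left_mean, right_mean = average(left), average(right)
        error = (sum((g - left_mean) ** 2 for g in left)
                 + sum((g - right_mean) ** 2 for g in right))
        if best is None or error < best[0]:
            best = (error, cut, left_mean, right_mean)
    if best is None:  # feature is constant in this sample: predict the mean
        overall = average(goals)
        return feature, values[0], overall, overall
    return feature, best[1], best[2], best[3]


def fit_forest(rows, goals, n_trees=40, seed=0):
    """Fit each tree to a bootstrap resample and one randomly chosen feature."""
    rng = random.Random(seed)
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.choice(features)
        forest.append(fit_stump([rows[i] for i in picks],
                                [goals[i] for i in picks], feature))
    return forest


def predict(forest, row):
    """Average the trees' predictions to get expected goals."""
    return sum(left if row[feature] <= cut else right
               for feature, cut, left, right in forest) / len(forest)


# Synthetic matches: goals rise with the strength gap; gdp_index is pure noise.
rng = random.Random(42)
rows = [{"strength_gap": rng.uniform(-1, 1), "gdp_index": rng.uniform(0, 1)}
        for _ in range(200)]
goals = [poisson(rng, 1.3 + 0.8 * row["strength_gap"]) for row in rows]

forest = fit_forest(rows, goals)
strong = predict(forest, {"strength_gap": 0.8, "gdp_index": 0.5})
weak = predict(forest, {"strength_gap": -0.8, "gdp_index": 0.5})
print(f"expected goals: strong side {strong:.2f}, weak side {weak:.2f}")
```

The averaged prediction plays the role of the expected goal count that loads the dice for a match. A production forest would grow much deeper trees and consider several features at every split.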

This work is part of a broader history of forecasting World Cups with data. The same team has previously collaborated to forecast major tournaments: for the 2019 Women’s World Cup, they correctly predicted the U.S. as the winner. In the 2023 Women’s World Cup and the 2022 men’s World Cup, the winners (Spain and Argentina, respectively) were not the favorites, though both were predicted as serious contenders.

The final message from the researchers is not a promise of certainty. Forecasts, they stress, are probabilities—not certainties. The program will not predict the winner with 100% confidence, but it aims to do better than older methods of guessing what comes next.

Achim Zeileis, a professor of statistics at the University of Innsbruck, is the author of the piece, republished from The Conversation under a Creative Commons license. The original work lists contributions from Andreas Groll and Rouven Michels and colleagues at TU Dortmund University in Germany, Lars Magnus Hvattum at Norway’s Molde University College, and Gunther Schauberger at TU Munich, alongside Zeileis.
