Now that Germany, the crowd favourite and the darling of the big data world, has been booted out of the FIFA World Cup 2018, big names including Goldman Sachs, Alteryx, UBS and Germany’s Commerzbank, among others who put their reputations on the line, have found that their machine learning-simulated predictions were shattered far too early. First Germany crashed out, followed by fan favourite Spain. Goldman Sachs used machine learning to run 200,000 models, mining data on team and individual player attributes, to predict match scores.
The company then simulated one million possible variations of the tournament to calculate each squad’s probability of advancing, and placed Brazil and Germany at the top. The two leading contenders according to other companies were Spain and Germany; even Dutch bank ING had pinned its hopes on Sergio Ramos-led Spain, who were knocked out by Russia.
Practically every leading company or bank worth its salt analysed historical data, ran simulations and used state-of-the-art statistical techniques to identify the two major contenders. However, all of them seem to have been proved wrong by football, which has turned out to be an unpredictable game. The exits of Germany and Spain have put the spotlight on big data and machine learning techniques, which have now come under fire.
Our assertion is that machine learning may not necessarily have all the answers but, most importantly, it is far too early to dismiss big data predictions. In the last few years, tech firms have developed cutting-edge machine learning techniques that are more powerful than traditional statistical methods. Technical University of Dortmund’s Andreas Groll reportedly simulated the whole tournament 100,000 times and deployed a random forest approach to come up with a winner. His prediction was that Germany was likely to win the World Cup, with Spain the other big contender. The model built by Groll and his team tilted towards Spain, and the team also provided survival probabilities for all teams at all tournament stages, along with a probable tournament outcome. (To read Groll’s paper, click here.)
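To make the idea of simulating a tournament many times concrete, here is a minimal Python sketch. It is not the method used by Goldman Sachs or by Groll’s team: the strength ratings are invented for illustration, the eight-team bracket is hypothetical, and the Elo-style win formula is an assumed stand-in for whatever match model a real forecaster would fit.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented purely for illustration.
RATINGS = {
    "Germany": 2080, "Brazil": 2130, "Spain": 2040, "France": 2000,
    "Belgium": 1930, "England": 1940, "Russia": 1690, "Croatia": 1850,
}

def win_probability(rating_a, rating_b):
    """Elo-style chance that team A beats team B (assumed match model, no draws)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def simulate_knockout(teams, rng):
    """Play one single-elimination bracket and return the champion."""
    remaining = list(teams)
    while len(remaining) > 1:
        winners = []
        # Pair neighbouring teams: (0, 1), (2, 3), ...
        for a, b in zip(remaining[0::2], remaining[1::2]):
            p = win_probability(RATINGS[a], RATINGS[b])
            winners.append(a if rng.random() < p else b)
        remaining = winners
    return remaining[0]

def estimate_title_probabilities(teams, n_runs=20000, seed=0):
    """Repeat the tournament n_runs times and count how often each team wins."""
    rng = random.Random(seed)
    titles = Counter(simulate_knockout(teams, rng) for _ in range(n_runs))
    return {team: titles[team] / n_runs for team in teams}

if __name__ == "__main__":
    probs = estimate_title_probabilities(list(RATINGS))
    for team, p in sorted(probs.items(), key=lambda item: -item[1]):
        print(f"{team:8s} {p:.3f}")
```

Even the top-rated side in this toy bracket wins well under half of the simulated tournaments, which is the point the forecasts were making: a favourite is the most likely winner, not a certain one.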
Why It Is Too Early To Dismiss ML Predictions For FIFA 2018
Just as football has proved to be an unpredictable game, machine learning models only generate a probability of an outcome from a statistical standpoint; they do not guarantee it. Secondly, the metrics used for the probabilistic forecasts (which in this case favoured Germany, Brazil and Spain, along with other top contenders such as Belgium and England) are based on FIFA rankings, bookie data and data from other analyst firms. Beyond these rankings, however, models usually exclude factors such as the coach’s nationality, player profiles, injuries, the country’s population and other important parameters. Also, Groll’s model relied on a game-by-game approach, which produced a different result from a tournament analysis, where Brazil emerged as a favourite.
Louis Rosenberg-founded Unanimous A.I, the company behind the Oscars 2018 and NCAA tournament predictions, relied on its “swarm intelligence” technology, which draws on the power of collective human intelligence rather than deep learning techniques, to predict the World Cup 2018 top contenders. We have to say its predictions were not far from those of other leading tech companies, with Germany, Brazil, Spain and France as the top contenders. Swarm intelligence, as the name indicates, harvests fans’ opinions to attach confidence to results. Another company, Kickoff.ai, relied on past data of national teams to predict the outcomes of football matches.
A key question in football prediction is how much history to use: models do rely on past data, but how old should that data be or, as Kickoff’s team puts it, how “far” in the past does the data need to go? Kickoff’s team, for example, used Bayesian inference for probabilistic prediction.
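The question of how far back the data should go can be illustrated with a small Bayesian sketch in Python. This is not Kickoff.ai’s model: it assumes a simple Beta prior on a team’s win probability and a hypothetical `half_life_years` parameter that halves the weight of a match for every such period that has passed. The match history is made up.

```python
def posterior_win_rate(results, half_life_years=2.0,
                       prior_wins=1.0, prior_losses=1.0):
    """Posterior mean win probability under a Beta prior.

    results: list of (years_ago, won) pairs.
    Older matches count for less: weight = 0.5 ** (years_ago / half_life_years).
    """
    alpha, beta = prior_wins, prior_losses
    for years_ago, won in results:
        weight = 0.5 ** (years_ago / half_life_years)
        if won:
            alpha += weight
        else:
            beta += weight
    # Mean of a Beta(alpha, beta) distribution.
    return alpha / (alpha + beta)

if __name__ == "__main__":
    # Made-up history: six wins four years ago, three recent losses.
    history = [(4.0, True)] * 6 + [(0.5, False)] * 3
    for half_life in (1.0, 2.0, 50.0):
        rate = posterior_win_rate(history, half_life_years=half_life)
        print(f"half-life {half_life:>4} years: estimated win rate {rate:.2f}")
```

With a short half-life the old wins barely count and the estimate falls; with a very long one the team still looks strong. Choosing that horizon is a modelling decision that the data alone does not settle.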
Artificial intelligence as a technology is yet to mature, and programmers rely on quantifiable data to make observations and predictions. But football is a sport dominated by bookies, and it is quite common to find data being manipulated. Bookie data also doesn’t give a true assessment of the game.
Another key point is that football is not just about player statistics: factors that are hard to capture in a model, such as team dynamics, emotions and fans’ sentiments, also come into play. Several other factors could be included in conjunction, such as players’ body language and expressions. The FIFA World Cup 2018 outcomes so far indicate that models should incorporate a wider set of parameters to improve their odds of being right. Another factor that can be included in a model is the number of goals a team scores in a match.