# 2018 FIFA World Cup Football / Soccer Simulation

GROUPS:

This June 14, 2018 starts the FIFA World Cup in Russia. After reading El País predition (see (1) and (2)), I wondered if I could build my own simulation using Wolfram Language. So here it is what I tried. Any new ideas on how to improve it are welcome, and let the best win!

# Introduction

I will build this simulation by blocks:

1. Base data
2. Simulation of a match
3. Simulation of the groups phase
4. Simulation of the second phase
5. All together

# 1. Base data

## 1.1. Rankings

As a base for this model, I will use the FIFA ranking. We could also do a test using other rankings (like the one generated by El País (2)).

I've downloaded the ranking from FIFA, and after using Interpreter["Country"] on all the names, I could get most of the countries properly parsed. Of course, we all know UK is not united for football, so I used Interpreter["AdministrativeDivision"] for the UK home nations, and a few other cases needed special threatment (like the case of Tahiti, which is not a proper country, so I did align it with Tahiti Islands, or Congo vs Congo DR, which need special disambiguation). Here I will attach the result of that ranking, so we should just load it:

rankingPoints = Dispatch[Get["rankingFifaJune7.m"]]


## 1.2. Participants and Groups

Using the previous alignment, I took the list from FIFA and converted into a simple list of countries.

countries = {Entity["Country", "Russia"],
Entity["Country", "SaudiArabia"], Entity["Country", "Egypt"],
Entity["Country", "Uruguay"], Entity["Country", "Portugal"],
Entity["Country", "Spain"], Entity["Country", "Morocco"],
Entity["Country", "Iran"], Entity["Country", "France"],
Entity["Country", "Australia"], Entity["Country", "Peru"],
Entity["Country", "Denmark"], Entity["Country", "Argentina"],
Entity["Country", "Iceland"], Entity["Country", "Croatia"],
Entity["Country", "Nigeria"], Entity["Country", "Brazil"],
Entity["Country", "Switzerland"], Entity["Country", "CostaRica"],
Entity["Country", "Serbia"], Entity["Country", "Germany"],
Entity["Country", "Mexico"], Entity["Country", "Sweden"],
Entity["Country", "SouthKorea"], Entity["Country", "Belgium"],
Entity["Country", "Panama"], Entity["Country", "Tunisia"],
Entity["Country", "Poland"], Entity["Country", "Senegal"],
Entity["Country", "Colombia"], Entity["Country", "Japan"]}


Given that it is sorted by groups, we could just use Partition and get the teams per group:

teamsByGroup = AssociationThread[CharacterRange["A", "H"] -> Partition[countries, 4]]


# 2. Simulation of a match

This is probably the hardest part of this problem. How to predict the outcome of a match? If it were easy, we would not even play! Well, I will use the same prediction used by other sources (like this paper), which is based on a Poisson distribution using the ration between the FIFA rankings. I'm adding also another factor that could be used to tweak more the probability (for example, Russia may not have a good ranking, but it is playing at home, that may be important!).

When it is not allowed to be a draw, instead of doing only an extra time and round of penalties if needed, I just called again the basic results to get a new result until no the tie is broken. To get an actual result we may want to call only once the Poisson distribution (for the extra time), and then maybe a binomial distribution for penalties.

getPossibleResult[weight1_, weight2_, lambda1_, lambda2_] :=
{RandomVariate[PoissonDistribution[lambda1 weight1/weight2]], RandomVariate[PoissonDistribution[lambda2 weight2/weight1]]};

simulateMatch[weight1_, weight2_, lambda1_, lambda2_, drawQ_] := Module[{result},
result = getPossibleResult[weight1, weight2, lambda1, lambda2];
If[!drawQ,
While[SameQ @@ result,
result = getPossibleResult[weight1, weight2, lambda1, lambda2]
];
];
result
];

simulatedMatchResults[match["Groups"][team1_, team2_]] := simulateMatch[team1 /. rankingPoints, team2 /. rankingPoints, 1, 1, True];
simulatedMatchResults[match["Knockout"][team1_, team2_]] := simulateMatch[team1 /. rankingPoints, team2 /. rankingPoints, 1, 1, False];


In this first model I'm using 1 for the lambda values, but we could generate another dispatch list to modify for example Russia factor for playing as local, or adding more strengh to our favorite team, etc. Of course, if you change too much the values, remember not to use the result for doing any bet!

# 3. Simulation of the groups phase

To simulate a each group, we will play each match, and then we will compute the full table of results sorted by points, goals difference, goals done, and goals against.

simulateGroup[groupsTeam_] :=
Module[{matches = DeleteDuplicatesBy[Permutations[groupsTeam, {2}], Sort], matchResults, resultsTable, resultsLookup},
resultsLookup[_, _] = 0;
matchResults = {##, simulatedMatchResults[match["Groups"][##]]} & @@@ matches;
Function[{team1, team2, result},
Switch[result,
{a_, b_} /; a > b,
resultsLookup[team1, "WINS"] = resultsLookup[team1, "WINS"] + 1;
resultsLookup[team2, "LOSTS"] = resultsLookup[team2, "LOSTS"] + 1;
resultsLookup[team1, "POINTS"] = resultsLookup[team1, "POINTS"] + 3;
,
{a_, b_} /; a < b,
resultsLookup[team2, "WINS"] = resultsLookup[team2, "WINS"] + 1;
resultsLookup[team1, "LOSTS"] = resultsLookup[team1, "LOSTS"] + 1;
resultsLookup[team2, "POINTS"] = resultsLookup[team2, "POINTS"] + 3;
,
_
,
resultsLookup[team1, "DRAWS"] = resultsLookup[team1, "DRAWS"] + 1;
resultsLookup[team2, "DRAWS"] = resultsLookup[team2, "DRAWS"] + 1;
resultsLookup[team1, "POINTS"] = resultsLookup[team1, "POINTS"] + 1;
resultsLookup[team2, "POINTS"] =  resultsLookup[team2, "POINTS"] + 1;
];
resultsLookup[team1, "GOL+"] = resultsLookup[team1, "GOL+"] + result[[1]];
resultsLookup[team2, "GOL+"] = resultsLookup[team2, "GOL+"] + result[[2]];
resultsLookup[team1, "GOL-"] = resultsLookup[team1, "GOL-"] + result[[2]];
resultsLookup[team2, "GOL-"] = resultsLookup[team2, "GOL-"] + result[[1]];
] @@@ matchResults;
resultsTable = Table[
{
t,
resultsLookup[t, "POINTS"],
resultsLookup[t, "WINS"],
resultsLookup[t, "LOSTS"],
resultsLookup[t, "DRAWS"],
resultsLookup[t, "GOL+"],
resultsLookup[t, "GOL-"],
resultsLookup[t, "GOL+"] - resultsLookup[t, "GOL-"]
}, {t, groupsTeam}];
SortBy[resultsTable, {-#[[2]], -#[[-1]], -#[[-3]], #[[-2]]} &]
];

selectClassified[groupResultTable_] := groupResultTable[[;;2, 1]];

simulateAllGroups[groups_] := simulateGroup /@ groups;

selectAllClassified[groupsResults_] := AssociationMap[List["1"<>#[[1]]->#[[2, 1]],"2"<>#[[1]]->#[[2, 2]]] &, selectClassified /@ groupsResults]


# 4. Simulation of the second phase

## 4.1. Matches schedule

First step here is to get the match schedule. Who plays against who in each knockout round? Well, that's easy to get from FIFA, and as a way to store it, I will do a list of rules of matchnumber -> {team1, team2}, where team1 and team2 will be strings with the form "1"|"2" ~~ group or "W"|"L" ~~ matchnumber.

roundOf16 = Thread[Range[49, 56] -> {{"1C", "2D"}, {"1A", "2B"}, {"1B", "2A"}, {"1D", "2C"}, {"1E", "2F"}, {"1G", "2H"}, {"1F", "2E"}, {"1H", "2G"}}];

quarters = Thread[Range[57, 60] -> {{"W49", "W50"}, {"W53", "W54"}, {"W55", "W56"}, {"W51", "W52"}}];

semifinals = {61 -> {"W57", "W58"}, 62 -> {"W59", "W60"}};

thirdplace = {63 -> {"L61", "L62"}};

final = {64 -> {"W61", "W62"}}


With all this we could define the competition as a succesive group of rounds:

competition = {roundOf16, quarters, semifinals, Join[thirdplace, final]};


## 4.2. Playing the phase

Playing a round means doing a match and returning the results, to make it easier, the results will accumulate, so that we can then use Fold to have the total results.

playRound[round_, previousWL_] := Join[
previousWL,
Association@Flatten@Map[
{"L" <> ToString[#[[1]]], "W" <> ToString[#[[1]]]} ->
#[[2]][[Ordering[simulatedMatchResults[match["Knockout"] @@ #[[2]]]]]]
] &,
round /. previousWL
]
];

playRounds[groupsResults_] := Fold[playRound[#2, #1] &, groupsResults, competition]


## 4.3. First basic stats

With all the results, the first stats I could think about were in which position each team finished. I only cared about the teams that actually reached at least the second phase. Of course if we run this enough times, all the teams will get into the second phase at least once... I guess.

First, let's define a function that will return the team position depending on the match they lost (or won, for third and first place):

posByLost = Flatten[Thread /@ {
("L" <> ToString[#] & /@ roundOf16[[All, 1]]) -> 9,
("L" <> ToString[#] & /@ quarters[[All, 1]]) -> 5,
"L63" -> 4, "W63" -> 3, "L64" -> 2, "W64" -> 1, _ -> None
}]

(posByLostFun[#1] = #2) & @@@ posByLost;


If we apply that function to the result of playRounds we can get which position each team got:

getPosition[globalResults_] := Reverse[Sort[DeleteCases[KeyValueMap[posByLostFun[#1] -> #2 &, globalResults], None -> _]], {2}]


And if we have multiple resutls of getPosition we would need to do a summary:

summaryPositions[positions_] := SortBy[
With[{n = Length[positions]},
GroupBy[Join @@ positions, First -> Last, Sort[(#1 -> N[#2/n] & @@@ Tally[#])] &]
],
-Accumulate[Lookup[#, {1, 2, 3, 4, 5, 9}, 0]] &
];


But, given that it is more common to talk about the winner, finalist, semifinalist, etc, let's make a function to compute that exactly, using as input the previous result:

accumulatePositions[pos_] := Association@
ReleaseHold[
{
"Winner" -> Hold[1],
"Finalist" -> Hold[1] + Hold[2],
"Semifinalist" -> Hold[1] + Hold[2] + Hold[3] + Hold[4],
"Quarters" -> Hold[1] + Hold[2] + Hold[3] + Hold[4] + Hold[5],
"Round16" -> Hold[1] + Hold[2] + Hold[3] + Hold[4] + Hold[5] + Hold[9]
} /. Append[pos, _Integer -> 0]
];


## 4.4. Other ideas for more stats

Some very custom stats could be done too, like for example, getting the most probably final, or the against which teams will play your favorite team.

# 5. All together

Now we have all the tools needed to run the world cup n times:

simulateWC[n_] := accumulatedPositions[
summaryPositions[
Table[
getPosition[
playRounds[
selectAllClassified[
simulateAllGroups[
teamsByGroup
]
]
]
],
n
]
]
];


And we need to have a nice display of it:

displaySimulationResult[sr_] := (Dataset\$DatasetTargetRowCount = 40; Dataset[sr /. x_?NumericQ :> Quantity[100 x, "Percent"]])


(I know, and I always say it, Dataset is for much more than just displaying tables, but they look great).

## 5.1. Using it!

Now we can play with our functions, and do simulations of 10000 or 100000 or even more if you have memory and time, and get the probabilities by just counting.

Some results:

simulation100k = simulateWC[100000];


(This took 473 seconds in my computer, i7-4710HQ ).

And now we can display them:

displaySimulationResult[simulation100k]


### 5.1.1. Why not a map too?

Let's make a nice map of the winners probability and about reaching the second phase:

Labeled[GeoRegionValuePlot[Quantity[100 #["Winner"], "Percent"] & /@ simulation100k,
GeoBackground -> "CountryBorders", GeoProjection -> "Robinson", ImageSize -> 1200,
PlotLabel -> Style["Probability of winning the 2018 FIFA World Cup Russia", 24]],
Text["Using 100000 simulations based on FIFA ranking from June 7, 2018"]
]


Labeled[GeoRegionValuePlot[Quantity[100 #["Round16"], "Percent"] & /@ simulation100k,
GeoBackground -> "CountryBorders", GeoProjection -> "Robinson", ImageSize -> 1200,
PlotLabel -> Style["Probability of getting into second phase in the 2018 FIFA World Cup Russia", 24]],
Text["Using 100000 simulations based on FIFA ranking from June 7, 2018"]
]


## 5.2. Desired final!

What about computing the probability of your desired final? It shouldn't be that hard given all this, it would be a matter of getting the joint probability of your teams being in first or second position simultaneously. Let's say we want a Peru-Brazil final -- and of course, expecting that Peru wins :).

We could try simply 10000 times this time:

Block[{n = 10000, c = 0},
Do[
c += Boole[MatchQ[Sort[{"W64", "L64"} /. playRounds[selectAllClassified[simulateAllGroups[teamsByGroup]]]], {Entity["Country", "Brazil"], Entity["Country", "Peru"]}]];
, n
];
Quantity[100.0 c/n, "Percent"]
]


And after doing that I got 0.85%, not as good as we would like.

# Conclusions

1. This is going to be fun!
2. This simulation is based on a given ranking, but it is easy to change it, according to different expectations and factors.
3. As a pending task remains the addition of a meaningful extra factor for each team (or even eath team pair).
Attachments:
5 days ago
6 Replies
 Vitaliy Kaurov 4 Votes This is great! @Francisco thank you for sharing! I wonder how your predictions compare to this recent article:Prediction of the FIFA World Cup 2018 – A random forest approach with an emphasis on estimated team ability parameters https://arxiv.org/abs/1806.03208which was also covered in MIT Technology Review blog: Machine learning predicts World Cup winner. There basic result is the following tree of wins:
4 days ago
 - Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!
4 days ago

Great post @Francisco thank you for sharing! Perhaps we can get a more accurate prediction if we include info about each player. Fifa provides some nice info about the 700+ players that will participate in the World Cup https://www.fifa.com/worldcup/players/ . So I created the following dataset (see it attached):

datasetWC18 = Import["WC18PlayersDataset.m"]


Age of the players might be a relevant feature. But the most relevant info is still missing .... players market value and achievements within their clubs (there are several places where this info is available, but I haven't had time to include it).

Anyway let's do some nice visualization with this players dataset as a start. For example I was curious about the correlation between "Height" / "Weight" and their "Field Position". The results are the following (Note that I'm using ListPointPlot3D to associate a 3rd dimension to players indicating their "Field Position") :

fieldPositions = {"GK", "DF", "MF", "FW"};

ListPointPlot3D[
Table[
Map[Append[#, "PositionN" -> i] &,
datasetWC18[Select[#Position == fieldPositions[[i]] &], {"Height", "Weight"}]
]
, {i, 4}],
ImageSize -> 800,
PlotStyle -> PointSize[0.012],
PlotLegends -> SwatchLegend@{Style["Goalkeepers", 18], Style["Defenders", 18],   Style["Midfielders", 18], Style["Forwards", 18]},
AxesLabel -> {Style["cm", 18, Bold], Style["kg", 18, Bold]},
PlotLabel -> Style["Players height vs weight", 20, Bold]]


As expected the Goalkeepers on average are the tallest and heaviest. What I wasn't expecting was that on average the Midfielders are the smallest. Perhaps it is because Midfielders best asset is their strategic mind to orchestrate the game from the middle with clever ball passes.

## Projecting the flags of World Cup countries

Borrowing Francisco's code for countries entities one can easily get all their flags, except the one from England which is missing (so, I added it manually):

countries = {Entity["Country", "Russia"],
Entity["Country", "SaudiArabia"], Entity["Country", "Egypt"],
Entity["Country", "Uruguay"], Entity["Country", "Portugal"],
Entity["Country", "Spain"], Entity["Country", "Morocco"],
Entity["Country", "Iran"], Entity["Country", "France"],
Entity["Country", "Australia"], Entity["Country", "Peru"],
Entity["Country", "Denmark"], Entity["Country", "Argentina"],
Entity["Country", "Iceland"], Entity["Country", "Croatia"],
Entity["Country", "Nigeria"], Entity["Country", "Brazil"],
Entity["Country", "Switzerland"], Entity["Country", "CostaRica"],
Entity["Country", "Serbia"], Entity["Country", "Germany"],
Entity["Country", "Mexico"], Entity["Country", "Sweden"],
Entity["Country", "SouthKorea"], Entity["Country", "Belgium"],
Entity["Country", "Panama"], Entity["Country", "Tunisia"],
Entity["Country", "Poland"], Entity["Country", "Senegal"],
Entity["Country", "Colombia"], Entity["Country", "Japan"]};

flags = EntityValue[countries, "Flag"];


flagsAll = (flags /. {x_Missing -> englandFlag})


Finally with GeoGraphics we can create an interactive map using Tooltip as follows:

GeoGraphics[
{EdgeForm[ Directive[Thin, Black]],
{GeoStyling[{"Image", #2}], Tooltip[Polygon[#1], CommonName[#1]]} & @@@ Transpose[{countries, flagsAll}]},
Background -> "CountryBorders",
GeoProjection -> "Robinson", ImageSize -> 1200,
PlotLabel ->  Style["Countries in the 2018 FIFA World Cup Russia", 24]]


## Population of countries in the 2018 FIFA World Cup

In my opinion two other interesting features to look for in the participating countries are their Population and GDP. Which can be easily obtained using EntityValue :

valsPopulation = EntityValue[countries, "Population"];
GeoBackground -> "CountryBorders", GeoProjection -> "Robinson",
ImageSize -> 1200, PlotLabel ->  Style["Population of countries in the 2018 FIFA World Cup Russia",  24]]


## GDP of countries in the 2018 FIFA World Cup

valsGDP = EntityValue[countries, "GDP"];

GeoBackground -> "CountryBorders", GeoProjection -> "Robinson",
ImageSize -> 1200, PlotLabel -> Style["GDP of countries in the 2018 FIFA World Cup Russia", 24]]
`

Attachments: