[WSSA16] Predicting Winning Odds of La Liga Matches

POSTED BY: Mher Davtyan
Posted 8 years ago

Nice post indeed, Mher!

I have studied your code for a while and I have some comments.

Sorry if I have misunderstood your code, but here it goes...

When I ran your code, I suspected that the H2H calculation did not follow the description you provided.

H2H calculations should only take previous games into account, as one would expect and as you said yourself.

Your code gives a single H2H value for each pair of teams.

That would make H2H a wonderful predictor indeed, since it contains information about all future meetings of any two teams!

For a given game, the H2H calculation doesn't use the match date as a reference.

So the calculation takes all Team1-vs-Team2 and Team2-vs-Team1 games, including those played after the match being predicted...
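The leak can be reproduced on a tiny, made-up fixture. This sketch assumes a plain list of associations with the same keys as the thread's dataset (Team1, Team2, MatchDate, Outcome), integer dates instead of real dates, and an Outcome coded from Team1's side as 1 (win), 0 (draw), -1 (loss). The helper names perspective, leakyH2H and priorH2H are hypothetical and do not come from the original notebook.

```mathematica
(* Toy fixture (hypothetical): integer dates; Outcome from Team1's side,
   1 = win, 0 = draw, -1 = loss *)
matches = {
   <|"Team1" -> "A", "Team2" -> "B", "MatchDate" -> 1, "Outcome" -> 1|>,
   <|"Team1" -> "B", "Team2" -> "A", "MatchDate" -> 2, "Outcome" -> 1|>,
   <|"Team1" -> "A", "Team2" -> "B", "MatchDate" -> 3, "Outcome" -> -1|>,
   <|"Team1" -> "B", "Team2" -> "A", "MatchDate" -> 4, "Outcome" -> 1|>
   };

(* outcome of match m seen from team t's side *)
perspective[m_, t_] := If[m["Team1"] === t, m["Outcome"], -m["Outcome"]];

(* leaky H2H: every meeting of t1 and t2, including ones after the match being predicted *)
leakyH2H[t1_, t2_] := Mean[perspective[#, t1] & /@
    Select[matches, Sort[{#Team1, #Team2}] === Sort[{t1, t2}] &]];

(* date-aware H2H: only meetings strictly before dt; Missing if there are none *)
priorH2H[t1_, t2_, dt_] := With[
   {prior = Select[matches,
      Sort[{#Team1, #Team2}] === Sort[{t1, t2}] && #MatchDate < dt &]},
   If[prior === {}, Missing["NoPriorMeetings"],
    Mean[perspective[#, t1] & /@ prior]]];

(* H2H feature for the match on date 3 *)
{leakyH2H["A", "B"], priorH2H["A", "B", 3]}  (* -> {-1/2, 0} *)
```

For the match on date 3, the leaky value -1/2 already counts A's defeat in that very match and in the later one, while the date-aware value 0 comes from the two earlier meetings only. Returning Missing when a pair has no earlier meeting is just one possible choice made for this sketch.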

With the following changes, the classifier's accuracy drops from 66% to around 57%:

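(* H2H of t1 against t2, using only matches played strictly before date dt *)
headToHeadTotal[t1_, t2_, dt_] :=
 N[Mean[
   Join[
    (* earlier meetings with t1 as Team1: outcome taken as-is *)
    Normal@
     outcomeAdded[
      Select[#Team1 == t1 && #Team2 == t2 && #MatchDate < dt &], 
      "Outcome"],
    (* earlier meetings with t1 as Team2: outcome negated so it is read from t1's side *)
    -Normal@
      outcomeAdded[
       Select[#Team2 == t1 && #Team1 == t2 && #MatchDate < dt &], 
       "Outcome"]
    ]
   ]]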

And:

(* attach the date-aware H2H value to every match row *)
h2hAdded = 
  outcomeAdded[All, 
   Append[#, 
     "H2H" -> (headToHeadTotal[#Team1, #Team2, #MatchDate])] &];

So the result is:

[image not preserved]

Would you please confirm my interpretation?

I have not built the outcome predictor yet, so I do not have final numbers to show...

POSTED BY: Edson Ferreira

Thank you for your comments, Edson.

You are absolutely right in what you say. However, there are several reasons for not using the date as a reference when calculating the head-to-head statistics.

Firstly, as you can see in the description of the 'H2H' statistic, low-count statistics are filtered out. This shows that the 'H2H' variable was not initially intended to change over time; rather, I wanted a general indicator of head-to-head clashes over a relative time period.

Secondly, using the date as a reference has a big drawback: it does not actually do what it is supposed to do. It does not show the 'past' performance in an absolute sense (our data does not contain the full history), but only the 'past' performance of two particular teams relative to the time period covered by the dataset we give to the classifier.
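The first point, a single H2H value per pair computed over the whole dataset with sparsely observed pairs filtered out, can be sketched as follows. The exact filtering rule from the original description is not reproduced in this thread, so the threshold minGames, the Missing return value, the outcome coding (1/0/-1 from Team1's side), the function name staticH2H and the toy data are all assumptions.

```mathematica
(* Toy fixture (hypothetical); Outcome is from Team1's side: 1 win, 0 draw, -1 loss *)
matches = {
   <|"Team1" -> "A", "Team2" -> "B", "Outcome" -> 1|>,
   <|"Team1" -> "B", "Team2" -> "A", "Outcome" -> 0|>,
   <|"Team1" -> "A", "Team2" -> "C", "Outcome" -> -1|>
   };

(* Hypothetical static H2H: one value per pair over the whole dataset, no date
   reference; pairs with fewer than minGames meetings are filtered out *)
staticH2H[data_List, t1_, t2_, minGames_Integer] := Module[{games, scores},
  (* all meetings of the two teams, in either Team1/Team2 order *)
  games = Select[data, Sort[{#Team1, #Team2}] === Sort[{t1, t2}] &];
  (* express every outcome from t1's point of view *)
  scores = If[#Team1 === t1, #Outcome, -#Outcome] & /@ games;
  (* low-count filter: too few meetings gives Missing instead of a noisy mean *)
  If[Length[scores] < minGames, Missing["TooFewMeetings"], Mean[scores]]]

staticH2H[matches, "A", "B", 2]  (* two meetings, Mean[{1, 0}] -> 1/2 *)
staticH2H[matches, "A", "C", 2]  (* one meeting -> Missing["TooFewMeetings"] *)
```

Because such a value uses every meeting in the dataset, it is the same for all matches of a given pair, which is exactly the property discussed above.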

And I think the latter is the reason why using the date as a reference decreases the accuracy of the model. So the following sentence in the text:

''It [H2H] basically illustrates the advantage of Team1 over Team2 during their previous matches.''

means the matches played before the match that is being predicted (I agree it is not well formulated).

POSTED BY: Mher Davtyan

Please see a follow-up discussion here: http://community.wolfram.com/groups/-/m/t/1218215

Predicting Winning Odds of Italian Serie A Soccer 1934-2017

POSTED BY: Tanel Telliskivi
Posted 8 years ago

Nice post! I read here that you used Scrapy to get the data. Have you tried any other scraping tools? So far I have only tried price scraping on http://www.tellprices.com/, and I was impressed. I thought maybe you could give me some advice on which other scraping tools you find effective. Thanks!

POSTED BY: Elena At

Thank you very much! Yes, I used the Python framework Scrapy to obtain the data for this project. It can be used for small projects as well as for building complex spiders. There are many web-crawling tools, but besides Scrapy the ones I find effective are Beautiful Soup, an easy-to-use Python library for pulling data out of HTML/XML pages, and Selenium WebDriver, which exposes its powerful functionality through a Python API. Selenium was originally written for testing web pages, but it is widely used for data crawling, especially in cases with complex captchas and other barriers to accessing the data.

POSTED BY: Mher Davtyan

You earned a "Featured Contributor" badge, congratulations!

This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

POSTED BY: EDITORIAL BOARD