WOLFRAM COMMUNITY

GROUPS:

Staff Picks Data Science Recreation Social Science Curated Data Import and Export Wolfram Language Statistics and Probability Machine Learning Wolfram Summer School

[WSSA16] Predicting Winning Odds of La Liga Matches

Mher Davtyan, Yerevan State University

Posted 10 years ago

POSTED BY: Mher Davtyan

Edson Ferreira

Posted 8 years ago

Nice post indeed, Mher! I have studied your code for a while and have some comments. Sorry if I have misunderstood it, but here goes.

When I ran your code, I suspected that the H2H calculation did not match the description you provided. H2H should take into account only the previous games, as one would expect and as you said yourself. Your code instead gives a single H2H value for every pairing of two teams. That means H2H would be a wonderful predictor indeed, since it contains information about all future meetings of any two teams! For a single game, the H2H calculation does not take the match date as a reference, so it uses all team1-versus-team2 and team2-versus-team1 games.

The classifier's accuracy drops from 66% to around 57% when I make the following changes:

    headToHeadTotal[t1_, t2_, dt_] :=
     N[Mean[
       Join[
        Normal@outcomeAdded[
          Select[#Team1 == t1 && #Team2 == t2 && #MatchDate < dt &],
          "Outcome"],
        -Normal@outcomeAdded[
           Select[#Team2 == t1 && #Team1 == t2 && #MatchDate < dt &],
           "Outcome"]
        ]
       ]]

And:

    h2hAdded =
     outcomeAdded[All,
      Append[#, "H2H" -> (headToHeadTotal[#Team1, #Team2, #MatchDate])] &];

So the result is: [result not preserved in this copy]

Would you please confirm my interpretation? I have not built the outcome predictor yet, so I do not have final numbers to show.
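For readers who do not use the Wolfram Language, the date-restricted calculation described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `matches` list, its field names, and the function `head_to_head` are hypothetical, and the outcome is assumed to be a signed number from Team1's point of view (positive for a Team1 win, negative for a loss, zero for a draw), as the negation in the code above suggests. Returning `None` when no earlier meeting exists is a choice made for this sketch, not something taken from the original code.

```python
from datetime import date
from statistics import mean
from typing import Optional

# Hypothetical sample data: outcome is signed from team1's perspective
# (assumption: 1 = team1 won, 0 = draw, -1 = team1 lost).
matches = [
    {"team1": "A", "team2": "B", "date": date(2014, 9, 1), "outcome": 1},
    {"team1": "B", "team2": "A", "date": date(2015, 2, 1), "outcome": 1},
    {"team1": "A", "team2": "B", "date": date(2015, 9, 1), "outcome": 0},
    {"team1": "A", "team2": "B", "date": date(2016, 2, 1), "outcome": -1},
]


def head_to_head(matches, t1, t2, before) -> Optional[float]:
    """Mean outcome of t1 against t2 over matches played strictly before `before`."""
    scores = []
    for m in matches:
        if m["date"] >= before:
            continue  # skip the match being predicted and anything later
        if m["team1"] == t1 and m["team2"] == t2:
            scores.append(m["outcome"])
        elif m["team1"] == t2 and m["team2"] == t1:
            # Reversed fixture: flip the sign so the score is from t1's side.
            scores.append(-m["outcome"])
    # No earlier meetings: return None rather than averaging an empty list.
    return float(mean(scores)) if scores else None


# H2H as seen just before the third match: only the first two matches count.
h2h_before_third = head_to_head(matches, "A", "B", date(2015, 9, 1))
# H2H over the whole dataset, which also includes matches after that date.
h2h_all = head_to_head(matches, "A", "B", date(2100, 1, 1))
```

Comparing `h2h_before_third` with `h2h_all` shows how the undated version mixes later results into the feature for an earlier match.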

POSTED BY: Edson Ferreira

Mher Davtyan, Yerevan State University

Posted 8 years ago

Thank you for your comments, Edson. You are absolutely right in what you say. However, there are several reasons why the head-to-head statistics are calculated without the date as a reference.

Firstly, as you can see in the description of the 'H2H' statistics, low-statistics cases are filtered out. This shows that the 'H2H' variable was not initially intended to change over time; rather, I wanted a general indicator of head-to-head clashes over some relative time period.

Secondly, adding the date as a reference has a big drawback: it does not actually do what it is supposed to do. It does not show the 'past' performance in an absolute manner (as our data does not contain all the historical data), but shows the 'past' performance of two particular teams only relative to the time period in the dataset that we give to the classifier. I think the latter is the reason why adding the date as a reference decreases the accuracy of the model.

So, in the following sentence from the text, ''It [H2H] basically illustrates the advantage of Team1 over Team2 during their previous matches.'', "previous matches" means the matches before the one being predicted. (I agree it is not well formulated.)

POSTED BY: Mher Davtyan

Tanel Telliskivi, Classical Mechanics

Posted 8 years ago

Please see a followup discussion here: http://community.wolfram.com/groups/-/m/t/1218215 *Predicting Winning Odds of Italian Serie A Soccer 1934-2017*

POSTED BY: Tanel Telliskivi

Elena At

Posted 9 years ago

Nice post! I've read here that you used Scrapy to get the information. Have you tried any other scraping tools? I've only tried price scraping so far, on http://www.tellprices.com/, and I was impressed. I thought maybe you could give me some advice on what other scraping tools you find effective. Thanks!

POSTED BY: Elena At

Mher Davtyan, Yerevan State University

Posted 9 years ago

Thank you very much! Yes, I used the Python framework Scrapy to obtain the data for this project. It works for small projects as well as for building complex spiders. There are a lot of web-crawling tools, but the ones I find effective besides Scrapy are Beautiful Soup, an easy-to-use Python library for pulling data out of HTML/XML pages, and Selenium WebDriver, which exposes its powerful functionality through a Python API. Selenium was originally written for testing web pages but is widely used for data crawling, especially in cases where complex captchas and other barriers block access to the data.

POSTED BY: Mher Davtyan

EDITORIAL BOARD, WOLFRAM

Posted 10 years ago

You earned the "Featured Contributor" badge, congratulations! This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and is displayed on the "Featured Contributor" board.

POSTED BY: EDITORIAL BOARD
