Rublev rankings history (ATP tennis player) — How to import/plot?

How to import listed data from a webpage?

Since I enjoy watching professional men's tennis (ATP Tour), let's take this webpage as an example. The URL (http or https) is:

https://www.atptour.com/en/players/andrey-rublev/re44/rankings-history

My task seems simple and straightforward: import the rankings data into Mathematica and plot it! So how would you do it? One idea is to:

  1. import the entire HTML webpage as one single string,
  2. then use string pattern matching to try to extract the data strings,
  3. convert the data strings to the optimal data type (date stamp vs ranking),
  4. finally plot the two-dimensional data, maybe as date plot.

Do you have any working code for this problem? Could it be solved within 3-6 lines of code? Please share if you can, thanks in advance!
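The four steps above can be sketched directly with string patterns. This is only a minimal sketch: the `html` fragment below is a made-up stand-in for the page source, so the real markup of the ATP page is an assumption and will likely need a different pattern. On the live page, step 1 would be something like `html = Import[url, "Text"]`.

```mathematica
(* Step 1: the page as one string. Hypothetical fragment, not the real page source. *)
html = "<table>
<tr><td>2021.09.13</td><td>5</td></tr>
<tr><td>2021.03.01</td><td>8T</td></tr>
<tr><td>2020.10.19</td><td>10</td></tr>
</table>";

(* Steps 2 and 3: match date/ranking cell pairs and convert them to numbers.
   An optional trailing "T" after the ranking is matched but discarded. *)
pairs = StringCases[html,
   "<td>" ~~ y : DigitCharacter .. ~~ "." ~~ m : DigitCharacter .. ~~ "." ~~
     d : DigitCharacter .. ~~ "</td>" ~~ WhitespaceCharacter ... ~~
     "<td>" ~~ r : DigitCharacter .. ~~ ("T" | "") ~~ "</td>" :>
    {FromDigits /@ {y, m, d}, FromDigits[r]}];

(* Step 4: plot the {date, ranking} pairs; {year, month, day} lists are valid dates. *)
DateListPlot[pairs, ScalingFunctions -> "Reverse"]
```

String matching like this is brittle, because any change in the page markup breaks the pattern. That is the main argument for letting Import parse the HTML structure instead.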

POSTED BY: Raspi Rascal
4 Replies

The working URL has changed from "https" to "http". I finally understood @Rohit's code and broke it down into more steps, showing an easier, less elegant alternative to the MapAt approach.

page = Import[
   "http://www.atptour.com/en/players/andrey-rublev/re44/rankings-history"
   , {"HTML", "Data"}
  ];
data = Cases[page, {{"Date", __}, _}, Infinity][[1, 2]];
datedata = Map[{DateObject[#[[1]]], #[[2]]} &, data];
cleaned = (datedata /. {q_, p_String} :> {q, ToExpression@p/T});(*a funny way of eliminating T: "12T" parses as 12*T, so dividing by the undefined symbol T leaves 12*)
DateListPlot[cleaned, ScalingFunctions -> "Reverse", PlotLabel -> Style["Andrey Rublev - Singles", 14, Bold]]

Thanks again to you guys for the help! This is my first real-life attempt at extracting data from an ordinary, modern-looking HTML webpage, so I am learning a lot about the power of the Import function and the Date functions. On a side note: Rublyoff is imho the most genuine and likable among the younger-generation tennis players, nothing fake or superficial about him. Think of raspi me when you stumble across his name in the news, in conversations, or elsewhere!

POSTED BY: Raspi Rascal

@Gustavo Delfino Thanks so much for your proposed advanced solution! I can see that you're making use of the XML structure of the webpage, even though the page source code doesn't specify that it's an XML page. Your code also makes use of Associations, a construct which I had never seen put into practice. I don't really know anything about the structure of webpage source code, which becomes apparent through my idea of working with strings haha. Anyway, your solution gives me much to think about and much to learn and revisit (e.g. the book chapter on Associations). Thanks again for all of it!
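For readers who are equally new to Associations, here is a minimal sketch of the two operations that solution relies on: looking up values by key, and selecting keys across all rows with Query. The sample rows and their ranking values are invented for illustration, and the variable names are hypothetical.

```mathematica
(* Hypothetical sample rows; the ranking values are invented for illustration. *)
rows = {
   <|"date" -> {2021, 9, 13}, "singles" -> 5, "doubles" -> 60|>,
   <|"date" -> {2020, 10, 19}, "singles" -> 10, "doubles" -> 80|>};

(* Look up a value by its key. *)
firstSingles = rows[[1]]["singles"];

(* Keep only two keys in every row. *)
singlesRows = Query[All, {"date", "singles"}][rows];

(* Drop the keys again to get plain {date, ranking} pairs for plotting. *)
singlesPairs = Values /@ singlesRows;
```

Keeping the rows as Associations means each column is addressed by name rather than by position, which is why the same list can be split into a singles and a doubles series without any index bookkeeping.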

@Rohit Namjoshi Thanks also to you for your proposed solution! It's been exactly one year since I last put my hands on Mma coding, and it seems that I need to relearn most of it to get back to where I left off. Your code looks more within my intellectual reach, even though I don't fully get it at this point. But that's good. Your lovely code and this successful webpage example motivate me to get back into Mma coding sooner rather than later, so I am very grateful for your inspiring contributions! Btw I am not sure what the trailing "T" means, but I should be able to strip it off, yes.

If anyone else has further code alternatives to solve the problem (no matter if very similar in approach), you're very welcome to share too. Imho the more code alternatives the better, thank you in advance!

POSTED BY: Raspi Rascal
Posted 4 years ago

Hi Raspi,

This is one way to do it along the lines you proposed.

page = Import[
   "https://www.atptour.com/en/players/andrey-rublev/re44/rankings-history", {"HTML", "Data"}];
data = Cases[page, {{"Date", __}, _}, Infinity][[1, 2]] // MapAt[DateObject, #, {All, 1}] &;
data // Part[#, All, 1 ;; 2] & // 
 DateListPlot[#, PlotLabel -> Style["Andrey Rublev - Singles", 14, Bold]] &

The data is not always numeric; it sometimes has a trailing "T", which I guess means "Tied". Those entries are ignored by DateListPlot. Strip off the "T" to plot them.
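One way to strip the trailing "T", as a minimal sketch. The `raw` rows are invented sample values in the shape assumed for the imported table (a date string, then a ranking that is either a number or a string such as "8T"), and `stripT` is a hypothetical helper name.

```mathematica
(* Hypothetical rows: numeric rankings stay numbers, tied rankings arrive as strings. *)
raw = {{"2021.09.13", 5}, {"2021.03.01", "8T"}, {"2020.10.19", 10}};

(* Remove the "T" from string rankings and convert them; leave numbers untouched. *)
stripT[r_String] := FromDigits[StringDelete[r, "T"]];
stripT[r_] := r;

(* Apply the helper to the second column only. *)
clean = MapAt[stripT, raw, {All, 2}];
```

After this step every ranking is an integer, so the tied entries are no longer dropped from the plot.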

POSTED BY: Rohit Namjoshi

Line 1: Import the website HTML as an XMLObject

xml = Import["https://www.atptour.com/en/players/andrey-rublev/re44/rankings-history",
             {"HTML", "XMLObject"}]
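To see what the pattern in the next step has to match, it helps to look at one table row in symbolic XML form. The row below is built by hand with invented values; that the imported page contains rows of exactly this shape is an assumption.

```mathematica
(* A hand-built table row in symbolic XML: XMLElement[tag, attributes, children]. *)
row = XMLElement["tr", {}, {
    XMLElement["td", {}, {"2021.09.13"}],
    XMLElement["td", {}, {"5"}],
    XMLElement["td", {}, {"60"}]}];

(* Pull the text out of every td cell, at any depth. *)
cells = Cases[row, XMLElement["td", _, {s_String}] :> s, Infinity];
```

Here `cells` evaluates to `{"2021.09.13", "5", "60"}`. The same Cases call works on the full imported expression, because the level specification Infinity searches the whole tree.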

Line 2: Process the XML and store it in variables named "singles" and "doubles"

{singles, doubles} =
  Cases[xml,
    XMLElement["tr", {}, {
       XMLElement["td", _, {date_String}],
       XMLElement["td", _, {singles_String}],
       XMLElement["td", _, {doubles_String}]}] :>
     <|"date" -> DateObject@date,
       "singles" -> ToExpression@singles,
       "doubles" -> ToExpression@doubles|>,
    Infinity] //
   {Query[All, {"date", "singles"}][#], Query[All, {"date", "doubles"}][#]} &

Line 3: Plot

DateListPlot[Values /@ {singles, doubles}, PlotLegends -> {"singles", "doubles"}]

[Image: singles and doubles plot]

POSTED BY: Gustavo Delfino