Group Abstract Group Abstract

Message Boards Message Boards

3
|
12K Views
|
15 Replies
|
16 Total Likes
View groups...
Share
Share this post:

Scraping the Rio2016.com Website for Summer 2016 Olympics Medal Results

Hi all,

I had an idea that I wanted to use the 2016 Summer Olympics medal results by event as an example for a pivot table in Excel. This is for a class that I teach, so it ultimately has to end up in Excel. Naturally, I made an attempt to scrape the data using Excel itself, but I ran into problems with that. So, I figured that I would use Mathematica to grab the data instead, partly as a learning exercise for myself.

Well, that turned out to be a bit more involved than I thought it would be. I ended up having to do a few different steps, and making heavy use of Part in order to massage the data into the form that I wanted. I often find Part to be confusing with nested lists, so this was a useful exercise. What I wanted was a list such as:

Summer 2016 Olympics Results

This list could then be exported to Excel for use in my pivot table. I have attached my notebook file for anyone to look at or use, if interested. I'm also open to (hoping for) some alternative ideas for how to accomplish the same result in an easier way.

EDIT: I found an error with the country association, so I uploaded a fixed version of the notebook.

Thanks,

Tim

Attachments:
POSTED BY: Tim Mayes
15 Replies

It just came out 5-6 days ago. Most shocking was that for gene-papers in Nature 30% has some issue with Excel. I was horrified!

POSTED BY: Sander Huisman

Hi Sander,

what a nice reference. I very much agree that in many areas, like biology, they do tend to use Excel. I do work at the interface with other areas of science and have experienced similar issues a lot.

Thanks a lot for that reference. I'll make sure to use it...

Cheers,

Marco

POSTED BY: Marco Thiel

David,

I've got that (Mens 100m Butterfly). :-) Also, I have double checked my medal counts by country against the Rio2016 site and they match, so I'm confident that I have the results presented accurately.

Tim

POSTED BY: Tim Mayes
Posted 9 years ago

Make sure you audit your results carefully. It would be unfortunate for people to not know of Singapore's first ever Gold medal this year!

POSTED BY: David Proffer

Marco,

Thanks, that is very helpful!

Tim

POSTED BY: Tim Mayes

Indeed, but still a lot of people in biology do so:

http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

POSTED BY: Sander Huisman

Hi Tim,

would something like this work for you?

As before

jointTable = 
  Flatten /@ 
     Transpose[{tableresults[[All, 1 ;; 4]], 
     ConstantArray["Gold", Length[tableresults]], 
     tableresults[[All, 5 ;; 6]], 
     ConstantArray["Silver", Length[tableresults]], 
     tableresults[[All, 7 ;; 8]], 
     ConstantArray["Bronze", Length[tableresults]]}];

This table contains all the data. You now only need to pick the columns you need and join them together:

Join[jointTable[[All, 1 ;; 5]], jointTable[[All, {1, 2, 6, 7, 8}]], jointTable[[All, {1, 2, 9, 10, 11}]]] // TableForm

You can, of course, sort this now according to your favourite criterion. If you call

singleTable=Join[jointTable[[All, 1 ;; 5]], jointTable[[All, {1, 2, 6, 7, 8}]], jointTable[[All, {1, 2, 9, 10, 11}]]]

Then

SortBy[singleTable, First] // TableForm

or

SortBy[singleTable, #[[3]] &] // TableForm

or

SortBy[singleTable, #[[4]] &] // TableForm

might give you representation that you prefer.

Cheers,

Marco

POSTED BY: Marco Thiel
Posted 9 years ago

Tim, Good hunting on a solution that works to your goal!

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

POSTED BY: David Proffer
Posted 9 years ago
POSTED BY: David Proffer

This is a brilliant response, @Marco Thiel, and I second that "Excel should not be on the computer of a scientist/data scientist".

POSTED BY: Vitaliy Kaurov

Marco,

Thanks. Your solution is the kind of thing that I was looking for, though the final table doesn't match the format that I need exactly (I need each medal on its own row). Your repeated use of Map is exactly the kind of thing that I need to practice on, so thanks for showing that solution.

POSTED BY: Tim Mayes
POSTED BY: Tim Mayes

As I always say: "Excel should not be on the computer of a scientist/data scientist". It seems to do some kind of job for people working in administration though - I am told.

Here a slightly different way of cleaning the data in Mathematica.

(*Import*)
dataimport = Import["https://www.rio2016.com/en/medal-count-sports", "Data"];
(*Generate first table*)
tableresults = 
  Select[Map[Flatten, 
 Transpose[{Flatten[ConstantArray[#[[1]], #[[2]]] & /@ 
 Transpose[{dataimport[[4, 2]][[All, 1]], Length /@ dataimport[[4, 2]][[All, 2]]}]], Flatten[dataimport[[4, 2]][[All, 2]], 1]}]], Length[#] == 8 &];
(*Generate table with type of medal*)
TableForm[
 Flatten /@ Transpose[{tableresults[[All, 1 ;; 4]], ConstantArray["Gold", Length[tableresults]], tableresults[[All, 5 ;; 6]], 
 ConstantArray["Silver", Length[tableresults]], tableresults[[All, 7 ;; 8]], ConstantArray["Bronze", Length[tableresults]]}]]

To me that seems to be much shorter than the endless list of Excel instructions.

Cheers,

M.

POSTED BY: Marco Thiel
Posted 9 years ago
Attachments:
POSTED BY: David Proffer
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard