Group Abstract Group Abstract

Message Boards Message Boards

[WSS19] Visualize World's Oldest People Data

Posted 6 years ago

enter image description here

enter image description here

This is a basic visualization of World's Oldest People: http://archive.is/4kwbk

Importing the data either by the link above directly or by downloading the html data file directly results into two slightly different data structures. I downloaded the data file and imported it into Mathematica (attached below). The following code shows how to import the data and perform basic cleaning:

SetDirectory[NotebookDirectory[]];
data = Import["agedata.html", "Data"];
data = data[[1]];
info = data[[1 ;; 7]];
data = data[[8 ;;]];
data = data[[1 ;; 65]];

The data is a list of lists, each list corresponds to a person. The heads of the data are:

In[11]:= info[[6]]

Out[11]= {"#", "Birthplace", "Name", "Born", "Died", "Age", "Race", \
"Sex", "Deathplace", "When Oldest", "Length of Reign", "Reign \
Length", "Age at Accession", "Case added to GRG Tables"}

Here there is a typical list corresponding to a person:

In[38]:= data[[2]]

Out[38]= {2, "England (UK)", "Jennie Howell", "Feb. 11, 1845", "Dec. \
16, 1956", 111, 309, "W", "F", "U.S. (CA)", "1955-1956", "110-111", \
1, 53, " 1.14", 110, 255, "Aug. 29, 2017"}

I decided to use only "Birthplace", "Age" and "Race" parameters. Since age is presented by years and days in two different elements of the list, I divided the days by 365 to convert them into a year fraction and added it to the year element.

age = (data[[#, 6]] + data[[#, 7]]/365) & /@ Range[Length@data];

After reviewing the values of my age list, I found out that the last person in the data has extra notations and highlights which caused miscalculations:

In[14]:= data[[65]]

Out[14]= {65, "Japan", "Kane Tanaka", "Jan. 2, 1903", "115*", "214*", \
"EA", "F", "Japan (Fukuoka)", "2018-", "115-", 0, 9, " 0.02* ", 115, \
201, "Mar. 18, 2014"}

So I modified the last element of the age list:

age = N@ReplacePart[age, 65 -> 115 + 214/365]

Creating the race list and cleaning it:

race = data[[#, 8]] & /@ Range[Length@data]
race = ReplacePart[race, {58 -> "M", 65 -> "EA"}];
race = race /. "O" -> "EA";

Note that the data marked Japan as a stand alone race, so I changed it to join the East Asia race. Now, creating and cleaning the data for the birth place list:

birthplace = data[[#, 2]] & /@ Range[Length@data]
birthplace = StringTrim[birthplace];

birthplace = 
  ReplacePart[
   birthplace, {27 -> "Poland", 38 -> "Portugal", 62 -> "Jamaica"}];
birthplace = 
  StringReplace[birthplace, ___ ~~ "(UK)" ~~ ___ -> "United Kingdom"];

birthplace = 
  StringReplace[birthplace, ___ ~~ "U." ~~ ___ -> "United States"];

I intended to use the GeoGraghics capabilities of Mathematica which needs the countries' names to match the ones in Wolfram data repository, so I looked for the ones that didn't match and modified them:

In[24]:= Position[
 MemberQ[CountryData["Countries", "Name"], #] & /@ birthplace, False]

Out[24]= {{13}, {32}}

In[25]:= Extract[birthplace, 
 Position[MemberQ[CountryData["Countries", "Name"], #] & /@ 
   birthplace, False]]

Out[25]= {"Czechoslovakia", "Canada (Que)"}

In[26]:= birthplace = 
  ReplacePart[birthplace, {13 -> "Czech Republic", 32 -> "Canada"}];

Now, lets compute the data for average age per country:

joint = Transpose[{birthplace, age}];

geoData=
Thread[Entity["Country", #] & /@ 
DeleteDuplicates[
StringDelete[#, " "] & /@ 
birthplace] -> (Mean@Cases[joint, {#, _}][[All, 2]] & /@ 
DeleteDuplicates[birthplace])]

enter image description here

And compute the geo-range for our map:

In[]:= geoRange=GeoBounds[geoData[[All,1]]]
Out[]= {{-34.8341,83.1333},{-230.447,32.8917}}

Now, here it is a map that shows the countries of the 65 oldest people in the world with colour scale showing the average age per country. You can see the result at the top of this post.

GeoRegionValuePlot[geoData,GeoBackground->"CountryBorders",
GeoProjection->"WinkelTripel",GeoRange->geoRange,ImageSize->800,PlotLabel->
Style["Average Years per Country of World's Oldest Person",25,Darker@Red,FontFamily->"Phosphate"]]

To better comprehend the value per country we can plot the average age-above-105 using BarChart:

BarChart[Sort[Association[geoData]]-105,
BarOrigin->Left,ChartLabels->Automatic,AspectRatio->1,
ChartElementFunction->"GradientScaleRectangle",BaseStyle->14,
PlotTheme->"Detailed",PlotLabel->Style["Average Years Above 105",20,Darker@Red]]

enter image description here

And here there is a bar chart showing the number of oldest people per each race:

In[32]:= Counts@race

Out[32]= <|"W" -> 45, "EA" -> 11, "B" -> 7, "H" -> 1, "M" -> 1|>

In[34]:= BarChart[Counts@race, 
 ChartLabels -> {"White", "East Asian", "Black", "Hispanic", 
   "Multiracial"}, LabelingFunction -> Above]

A bar chart of number of the oldest people in each race

I stopped at this point, but there are many ways to visualize this dataset and even the two graphs here can be polished to be more informative. A way to handle the data effectively is to collect all the variables (age, race,...etc) in a matrix which will make data manipulation much easier. Tips on this dataset cleaning, preparation and visualization are much appreciated.

Attachments:
POSTED BY: Ahmed Elbanna
5 Replies

Amazing. With some tuning, you can even make it shorter, as you started the code arranging the data as a matrix which I wished I had time to do it. This can handle several variables in at same time like the way I did it in the GeoRegionValuePlot line. Thank you for sharing.

POSTED BY: Ahmed Elbanna
POSTED BY: Julian Davis

Very interesting post, thank you.

POSTED BY: John Shonder

Thanks :)

POSTED BY: Ahmed Elbanna

enter image description here - Congratulations! This post is now featured in our Staff Pick column as distinguished by a badge on your profile of a Featured Contributor! Thank you, keep it coming, and consider contributing your work to the The Notebook Archive!

POSTED BY: EDITORIAL BOARD
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard