Group Abstract

Message Boards

WOLFRAM COMMUNITY

7.2K Views

1 Reply

3 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

[WSS19] Analyzing and visualizing data about the world's oldest people

Emmy Blumenthal

Emmy Blumenthal, Boston University

Posted 6 years ago

Data about the oldest humans that have lived allow a small data set to reveal larger patterns about human life expectancy and general quality of life. The data used in this brief analysis can be found at http://archive.is/4kwbk.The first step in my data analysis was to extract data and reformat it. First, I import the data and define the headers of the dataset. orig = Import["http://archive.is/4kwbk", "Data"] heads = {"#", "Birthplace", "Name", "Born", "Died", "Age", "Days", "Race", "Sex", "Deathplace"} Next, I convert the data to the `Dataset` form. dataSimplified = Dataset[Association[Thread[heads -> #[[;; 10]]]] & /@ Take[orig[[2, 1, 1]], {8, 72}]] The data is relatively disorganized and complex, so many substitutions were necessary so that the `Interpreter` function was able to understand the data. dataCorrected = dataSimplified[ All, {"Birthplace" -> StringReplace[{"U.S." ~~ ___ -> "USA", ___ ~~ "(UK)" -> "UK", "Czechoslovakia" -> "Czech Republic", "Germany (now Poland)" ~~ ___ -> "Poland", "Canada (Que)" -> "Canada", "Cape Verde (Portugal) [8]" -> "Portugal", ___ ~~ "Jamaica)" -> "Jamaica"}], "Born" -> StringReplace[{"[1]" -> ""}]}] The last line of the data is offset as Kane Tanaka is the current oldest person alive. In[ ] = Normal[dataCorrected1[65, All]] Out[ ] = <\|"#" -> 65, "Birthplace" -> "Japan", "Name" -> "Kane Tanaka", "Born" -> "Jan. 2, 1903", "Died" -> "115", "Age" -> "214", "Days" -> "EA", "Race" -> "F", "Sex" -> "Japan (Fukuoka)", "Deathplace" -> "2018-"\|> We'll have to manually edit this data. row65 = Dataset[<\|"#" -> 65, "Birthplace" -> "Japan", "Name" -> "Kane Tanaka", "Born" -> "Jan. 2, 1903", "Died" -> "n/a", "Age" -> "n/a", "Days" -> "n/a", "Race" -> "EA", "Sex" -> "F"\|>] dataCorrected2 = Append[Drop[dataCorrected1, -1], row65] Using the `Interpreter` function, I was able to convert birthplaces to a format that is easily computable. dataInterpreted = dataCorrected2[ All, {"Birthplace" -> Interpreter["Country"], "Born" -> Interpreter["Date"], "Died" -> Interpreter["Date"]}] I then used the `GeoHistogram` command to create the following figure. GeoHistogram[ Normal[dataInterpreted[All, "Birthplace"]], {"Rectangle", 15}, PlotTheme -> "Scientific"] The figure displays the density of the worlds oldest people in different regions. Analysis: This figure supports the hypothesis of the Global South. This controversial hypothesis that is often said to overgeneralize suggests that individuals living South of the equator are more likely to have a lower quality of life than those living North of the equator. This figure supports the hypothesis as better quality of life is associated with longer life expectancy, and the figure suggests that those who reign as the oldest humans are more likely to be found North of the equator. We can also observe similar disparities across races. The following figure was generated using the `PieChart` function. racecounts = Counts[Normal[dataInterpreted[All, "Race"]]] PieChart[Values[racecounts], ChartLabels -> {"White", "Japanese", "Black", "Hispanic", "East Asian", "Multiracial"}] This figure displays the proportion of those who have reigned as the oldest human on earth by race. Note Japanese is separated as a different race in accordance with the original data. Analysis: This figure displays that a large majority of those who have reigned as oldest person are white. This data is reflective of general, global racial inequity. Finally, we can look at some statistics of this elite group. The distribution of their ages can be generated using the `Histogram` function. lifeLengthData = Drop[Quantity[Normal[dataInterpreted[All, "Age"]], "Years"] + Quantity[Normal[dataInterpreted[All, "Days"]], "Days"], -1] Histogram[lifeLengthData, {600}, Frame -> True, FrameLabel -> {"Age (days)", "Frequency"}] Note the last value in the is dropped as Kane Tanaka is still living. Analysis: What is particularly interesting about this data is the relative symmetry it exhibits. Using the `Skewness` function, I found that the skewness of this distribution is only `.174`, relatively low. N@Skewness[lifeLengthData] This is unexpected as age distributions typically have strong positive skew. If we interpret this data as a sample from a theoretical distribution, we can conduct inference tests. The inference test of interest would be the Mardia skewness test for normality. This test allows us to assess whether the skewness observed is significantly different than the skewness measurements produced by random sampling of the normal distribution. The Wolfram Language is able to run this test with the `MardiaSkewnessTest`. Testing the data returns a p-value of `.569`, above almost all significance thresholds. MardiaSkewnessTest[N@lifeLengthData] Running the function `DistributionFitTest` on the data returns the following table. DistributionFitTest[ N@lifeLengthData, Automatic, {"TestDataTable", All}] For all tests appropriate for the data, none flagged the data as significantly different than variates of the normal distribution. Again, this is unexpected because of the typical strong positive skew of age distributions.

POSTED BY: Emmy Blumenthal

1 Reply

Sort By:

EDITORIAL BOARD

EDITORIAL BOARD, WOLFRAM

Posted 6 years ago

- Congratulations! This post is now featured in our Staff Pick column as distinguished by a badge on your profile of a Featured Contributor! Thank you, keep it coming, and consider contributing your work to the The Notebook Archive!

POSTED BY: EDITORIAL BOARD

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback