Group Abstract Group Abstract

Message Boards Message Boards

Add M49 standard country codes to a dataset?

Posted 6 years ago

Hey everyone,

I have to work on a university project where we have a dataset with all countries of the world are listed. I would like to add another row with countries/geographical regions which are defined by the UN in the M49 code (find more here). Is there a way to do this automatically in Mathematica?

I hope I could describe my problem in an understandable way. :) Greetings from the beginner side

POSTED BY: Tester Trying
7 Replies

Hi Florian (Tester Trying):

I would agree with Rohit on needed clarifications. The initial source data file and UN data in csv format need cleaning. The data files contain some incongruences pertaining to WolframData and the nature of country names over time and multiple regions with undefined sub-regions. As per your title, some pivoting on the M49 code helps. The following code is much the same as before up to UNcountryprop where we diverge away from Association and Dataset and just manipulate list. Instead of running through each line of the "master.csv" file and adding the two elements (columns) {"continent", "region"} it helps to optimize by reducing to a set of distinct countries, then mapping {"continent", "region"}

Clear[master, mastercountriesraw, UNmethoddata, addcountryprop, UNcountryprop, addcontinentandregionforexport, masterout, headings];
master = Import["C:\\suicide-rates-overview-1985-to-2016\\master.csv"];
mastercountriesraw = DeleteDuplicates[Rest[master[[All,1]]]];
UNmethoddata = Import["C:\\suicide-rates-overview-1985-to-2016\\UNSD — MethodologyFixed.csv"];
addcountryprop[cnty_] := Module[
       { countryraw, countryent, continent, unnumber, globalcode, globalname, regioncode, regionname, subregioncode, subregionname, intermediateregioncode, intermediateregionname, countryorarea, M49code, bdata}, 
       countryraw = cnty;
        countryent = Quiet[CountryData[cnty]];
        If[ToString[Head[countryent]] == "CountryData", countryent = Interpreter["Country"][cnty]];
        continent = CanonicalName[CountryData[countryent, "Continent"]];
        unnumber = CountryData[countryent, "UNNumber"];
        { globalcode, globalname, regioncode, regionname, subregioncode, subregionname, intermediateregioncode, intermediateregionname, countryorarea, M49code } = 
       Take[SelectFirst[UNmethoddata, (#[[10]] == unnumber) &], 10];
        bdata = Select[master, (#[[1]] == countryraw) &];
        { globalname, regionname, subregionname, If[intermediateregionname == "", subregionname, intermediateregionname], continent, countryraw, Rest /@ bdata }
       ];
UNcountryprop = Map[addcountryprop[#] &, mastercountriesraw];
addcontinentandregionforexport[item_] := Module[
       { contiregcountr }, 
       contiregcountr = {item[[5]], item[[4]], item[[6]]};
        Map[Join[contiregcountr, #] &, item[[7]] ] 
       ];
masterout = Flatten[Map[addcontinentandregionforexport[#] &, UNcountryprop], 1];
headings = Join[ {"continent","region"}, First[master] ];
Export["C:\\suicide-rates-overview-1985-to-2016\\masterout.csv", masterout, "CSV", "TableHeadings" -> headings]

In reviewing previous code I spotted some issues with the GroupBy; but you don't want a Dataset .Using the Wolfram Language to add two columns to a CSV file is actually straight forward. There are some complications as to how to add those columns in your post questions.

POSTED BY: Hans Michel
Posted 6 years ago

Hey Hans and Rohit,

thank you very much for your help. I tried both versions but it didn't work for me. My objective is to import the file which works perfectly by this:

OriginalData = Import[FileNameJoin[{NotebookDirectory[], "suicide.csv"}]];

Then I would like to add a column for continent and another column for region. This I would like to export to a csv file or work with it in Mathematica:

Export[FileNameJoin[{NotebookDirectory[],"suicide.mx"}], OriginalData]

The problem is that the other structure of the file should not be changed because I later want to test different hypothesis, do time analysis or geocomputation. The solution from you @Hans Michel changes the look of the file. The expanding option to click from World>Africa>Mauritius is nice but this will make further work more complicated. Is my request even possible to achieve? I am really on the beginner level here. :D If it is possible for you to send me a csv file with the added two columns that would be amazing. But of course I would also like to understand the work around behind it. Your help is highly appreciated Florian

POSTED BY: Tester Trying
Posted 6 years ago

Hi Tester Trying,

Can you be more specific regarding

I tried both versions but it didn't work for me

What exactly did not work? Was an error reported?

My answer does exactly what you requested, add continent and region columns. The raw data was imported as a Dataset because it provides a lot of useful functionality for working with structured data. I would suggest learning how to use Dataset, take a look at the examples.

If you prefer working with CSV outside of Mathematica then you can always Export the dataset as CSV.

POSTED BY: Rohit Namjoshi
POSTED BY: Hans Michel
Posted 6 years ago

Dear Hans Michel,

I have this dataset with all countries and multiple entries for different years per country. Ideally I would like to add two kind of rows. One with the continents and one with geographical regions like 'Caribbean', 'Central America' or 'North/South Europe' for example.

dataset: https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016

regions:

enter image description here

POSTED BY: Tester Trying
Posted 6 years ago

Hi Tester,

That dataset is a CSV file. I think you mean add columns for continent and geographical region, not rows. Here is one way to do that

Download and import the dataset

suicideImport = Import["~/Downloads/master.csv", "Dataset", HeaderLines -> 1];

One country's name does not match WL data so rename it

suicide = 
  suicideImport[All, 
   If[#country == "Saint Vincent and Grenadines", <|#, "country" -> "Saint Vincent and the Grenadines"|>, #] &];

Add continent column

suicide = 
  suicide[All, <|#, "continent" -> CanonicalName@CountryData[#country]["Continent"]|> &];

Country and region data is available here, download Excel and import

unsdImport = 
  Import["~/Downloads/UNSD \[LongDash] Methodology.xlsx", {"Dataset", 1}, HeaderLines -> 1];

Generate map from country name to region

regionMap = 
  unsdImport[
     All, <|#"Country or Area" -> 
        If[#"Intermediate Region Name" == "", #"Sub-region Name", #"Intermediate Region Name"]|> &] //
     Normal // Association;

Add region to suicide dataset

suicide = suicide[All, <|#, "region" -> regionMap[#country]|> &]

There are still some issues, country "Czech Republic" has no mapping to region.

Total number of suicides by continent and year

suicide[GroupBy[{#continent, #year} &] /* Values,
 <|
   "continent" -> Query[Max, #continent &], 
   "year" -> Query[Max, #year &], 
   "total_suicides" -> Query[Total, #"suicides_no" &]
 |>]
POSTED BY: Rohit Namjoshi

Hello Tester...: This should be something to get started pending some clarification of your post. Do you want to add another column or row. And what is to be in that column.

data = Import["https://unstats.un.org/unsd/methodology/m49/", "Data"];
Rest[Part[data,2,1,2,1,2,1][[All,2]]]/. Map[CountryData[#, "UNNumber"] -> # &, CountryData[All]] 

Using the UNCode is problematic as there is an entry in the M49 data that is missing there are ways to fix this but the M49Code is all there. Nevertheless not all number match a Wolfram Language CountryData entity.

POSTED BY: Hans Michel
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard