Message Boards Message Boards

Custom entities with Bostock's unemployment data and EntityStore

Posted 8 years ago

Note: I originally posted a version of this tutorial on Mathematica StackExchange. I was encouraged to make this crosspost and realized that a few things had changed since I wrote the original post. Notably, many counties now have FIPS codes. I've rewritten the text to make use of this.

Goal

The idea of the original text was to recreate Mike Bostock's D3.js choropleth map. Here is a screenshot of what it looks like:

This choropleth map shows the unemployment rate in all the different US counties. I will try to use the Wolfram Knowledgebase and high-level geographics functions as much as possible. This means building my own entity store with entities that inherit properties from built-in county entities but also extend them with new properties.

Data from Wolfram Knowledgebase

GeoRegionValuePlot makes it very easy to plot data that is built into region entities. For example, this is how we can plot the fraction of each county's population that is foreign born.

counties = Cases[
   EntityList@Entity["AdministrativeDivision", {"Country" -> Entity["Country", "UnitedStates"]}],
   Entity[_, {_, _, _}] (*separates states from counties*)
   ];

GeoRegionValuePlot[counties -> "ForeignBornFraction"]

enter image description here

"ForeignBornFraction" is an entity property to administrative regions. Unfortunately, county properties are either missing or take the value of their state. The plot reflects the latter.

GeoRegionValuePlot is very convenient and it can be suitably styled, but Bostock's unemployment data is not available in Wolfram Knowledgebase. To be able to use the function for our purpose, we need to create our own entities where we have added Bostock's data as a property.

Custom data

We start by importing Bostock's data set.

data = <|Rule @@@ Rest@Import[
     "https://gist.githubusercontent.com/mbostock/4060606/raw/25385f68a3be5c9dbe36af27fc2498fb2aab6bc0/unemployment.tsv",
      "TSV"
      ]|>;

data // Shallow

(* Out: <|1001 -> 0.097, 1003 -> 0.091, 1005 -> 0.134, ... *)

Each entry consists of a county FIPS code and the corresponding unemployment rate. Most county entities have the FIPS code:

enter image description here

This makes it easy to correlate county entities with data points. However, in order to use GeoRegionValuePlot, we need entities which have the unemployment rate as one of their properties. We will achieve this by creating a new entity class, which inherits the necessary data from county entities but also includes our custom data.

getEntityIdentifier[Entity[_, {county_, state_, country___}]] := county <> "-" <> state
getEntityIdentifier[Entity[_, {"DistrictOfColumbia", _}]] := "DistrictOfColumbia"

customEntity[entity_] := If[
  ! MatchQ[entity["FIPSCode"], _Missing],
  getEntityIdentifier[entity] -> <|
    "Label" -> Style[entity["Name"], Bold],
    "ParentEntity" -> entity,
    "Unemployment" -> data[ToExpression@entity["FIPSCode"]]
    |>,
  Nothing
  ]

customEntity creates a "child entity" with the properties ParentEntity and `Unemployment. Counties which do not have a FIPS code are ignored. An example of a custom entity:

Custom entity

Note that we could easily add many other properties. Now create a child entity for each county:

Quiet[countyEntities = <|customEntity /@ counties|>;]

I don't know why, but it only worked if I first went to File -> Preferences -> Internet connectivity and unchecked the box "Allow the Wolfram System to access the Internet". Otherwise, the evaluation would hang forever. I put Quiet in there because otherwise it would complain about not having Internet access. Don't forget to enable the Internet connectivity again when you're done with this part, you'll need it for the geo functions.

We now create an entity store with the custom entities. This entity store comes with properties which are automatically computed from the parent entities. "Inheritance" is achieved by using the ParentEntity property to get the corresponding properties from the parent entity. The properties which are inherited have been chosen because they are the ones which are necessary to use GeoRegionValuePlot. GeoRegionValuePlot uses those properties internally.

store = EntityStore["USCounty" -> <|
     "Entities" -> countyEntities,
     "Properties" -> <|
       "ParentEntity" -> <|
         "Label" -> "Parent entity"
         |>,
       "FIPSCode" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["FIPSCode"] &),
         "Label" -> "County FIPS code"
         |>,
       "Polygon" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["Polygon"] &),
         "Label" -> "Polygon"
         |>,
       "Position" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["Position"] &),
         "Label" -> "Position"
         |>,
       "Latitude" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["Latitude"] &),
         "Label" -> "Latitude"
         |>,
       "Longitude" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["Longitude"] &),
         "Label" -> "Longitude"
         |>,
       "HasPolygon" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["HasPolygon"] &),
         "Label" -> "Has polygon?"
         |>,
       "Name" -> <|
         "DefaultFunction" -> (#["ParentEntity"]["Name"] &),
         "Label" -> "Name"
         |>
       |>,
     "EntityClasses" -> <|
       "Alaska" -> <|
         "Entities" -> ("ParentEntity" -> (MatchQ[#, 
               Entity[_, {_, "Alaska", _}]] &)),
         "Label" -> Style["Alaska", Bold]
         |>,
       "Hawaii" -> <|
         "Entities" -> ("ParentEntity" -> (MatchQ[#, Entity[_, {_, "Hawaii", _}]] &)),
         "Label" -> Style["Hawaii", Bold]
         |>,
       "Mainland" -> <|
         "Entities" -> ("ParentEntity" -> (MatchQ[#, Entity[_, {_, Except["Alaska" | "Hawaii"], _}]] &)),
         "Label" -> Style["Mainland", Bold]|>
       |>
     |>];

Demo of the entity store

We now add the entity store to the list of available entity stores:

AppendTo[$EntityStores, store];

We could also add it as a resource object for persistence.

ro = ResourceObject[<|
    "Name" -> "USCounty",
    "ResourceType" -> "DataResource",
    "Content" -> store
    |>];

The next time we'd like to use this entity store in a new session, we could simply write

AppendTo[$EntityStores, ResourceObject["USCounty"]]

Now we inspect our entity store:

Entity list

Entity list 2

Back to GeoRegionValuePlot

With the entity store in place we can now use GeoRegionValuePlot to plot our custom data like this:

GeoRegionValuePlot[EntityList[EntityClass["USCounty", "Mainland"]] -> "Unemployment"]

Unemployment choropleth

With a few options it looks almost the same as Bostock's map:

getColor[val_?NumericQ] := Which[
  0 < val < 0.01667, RGBColor[{247, 251, 255}/255],
  0.0166 < val < 0.0333, RGBColor[{222, 235, 247}/255],
  0.0333 < val < 0.05, RGBColor[{198, 219, 239}/255],
  0.05 < val < 0.0666, RGBColor[{158, 202, 225}/255],
  0.0666 < val < 0.0833, RGBColor[{107, 174, 214}/255],
  0.0833 < val < 0.1, RGBColor[{66, 146, 198}/255],
  0.1 < val < 0.1166, RGBColor[{33, 113, 181}/255],
  0.1166 < val < 0.1333, RGBColor[{8, 81, 156}/255],
  0.1333 < val < 0.15, RGBColor[{8, 48, 107}/255],
  True, RGBColor[{8, 48, 107}/255]
  ]
getColor[val_] := RGBColor[{8, 48, 107}/255]

mainland = GeoRegionValuePlot[
  EntityList[EntityClass["USCounty", "Mainland"]] -> "Unemployment",
  ColorFunction -> getColor,
  ColorFunctionScaling -> False,
  PlotLegends -> None,
  PlotStyle -> Directive[EdgeForm[None]],
  GeoBackground -> None,
  GeoProjection -> {
    "LambertAzimuthal",
    "Centering" -> GeoPosition[{30, -195/2}]
    },
  PlotRange -> {{-0.37, 0.38}, {-0.13, 0.38}},
  ImageSize -> 660
  ]

Choropleth, Bostock style

There are a few white counties here which are missing because they don't have the "FIPSCode" property. In my original version of this post over at Mathematica.StackExchange I used an external source for FIPS codes because at that time counties either did not have them or they were incorrect. Now that they are mostly correct I think that the added elegance of not having to use an external data source is preferred.

The last details

We now add Alaska and Hawaii:

getPolygon[entity_] := {
  GeoStyling[None],
  getColor[entity["Unemployment"]], Polygon[entity]
  }

alaska = GeoGraphics[
   getPolygon /@ EntityList[EntityClass["USCounty", "Alaska"]],
   GeoBackground -> None
   ];
hawaii = GeoGraphics[
   getPolygon /@ EntityList[EntityClass["USCounty", "Hawaii"]],
   GeoBackground -> None
   ];

map = Show[
  mainland,
  GeoGraphics[{
    Inset[alaska, {-0.3, -0.06}, {0, 0}, 0.3],
    Inset[hawaii, {-0.15, -0.06}, {0.1569, -0.0436}, 0.3]
    }]
  ]

Choropleth, Bostock style, with Hawaii

We can see Hawaii, but unfortunately no county entities for Alaskan counties have FIPS codes in them, so Alaska has disappeared.

To add the borderlines I first make a black and white mask, where states are black and the background is white. The state borderlines that I want to draw are white as well; I then use ImageAdd to put the images together. The counties will be unaffected because we'll only be adding black to them, except for on the borderlines. In principle, I could add the GeoRegionValuePlot to the GeoGraphics with Hawaii, but I use GeoGraphics for both to ensure matching image padding.

states = Cases[
   EntityList@
    Entity["AdministrativeDivision", {"Country" -> 
       Entity["Country", "UnitedStates"]}],
   Entity[_, {_, _}]
   ];

alaska = GeoGraphics[
   Polygon /@ EntityList[EntityClass["USCounty", "Alaska"]],
   GeoBackground -> None
   ];
hawaii = GeoGraphics[
   Polygon /@ EntityList[EntityClass["USCounty", "Hawaii"]],
   GeoBackground -> None
   ];

map = GeoGraphics[{
    getPolygon /@ EntityList[EntityClass["USCounty", "Mainland"]],
    Inset[alaska, {-0.3, -0.06}, {0, 0}, 0.3],
    Inset[hawaii, {-0.15, -0.06}, {0.1569, -0.0436}, 0.3]
    },
   GeoBackground -> None,
   GeoProjection -> {
     "LambertAzimuthal",
     "Centering" -> GeoPosition[{30, -195/2}]
     },
   PlotRange -> {{-0.37, 0.38}, {-0.13, 0.38}},
   ImageSize -> 1000
   ];

borders = GeoGraphics[{
    FaceForm[Black],
    EdgeForm[White],
    GeoStyling[None],
    Polygon /@ states,
    Inset[alaska, {-0.3, -0.06}, {0, 0}, 0.3],
    Inset[hawaii, {-0.15, -0.06}, {0.1569, -0.0436}, 0.3]},
   GeoBackground -> None,
   GeoProjection -> {
     "LambertAzimuthal",
     "Centering" -> GeoPosition[{30, -195/2}]
     },
   PlotRange -> {{-0.37, 0.38}, {-0.13, 0.38}},
   ImageSize -> 660
   ];

ImageAdd[map, borders]

Finished map

With all FIPS codes

When I used an external source for FIPS codes I had FIPS codes for all the counties. Here is an image which shows what the picture looks like then. Hopefully FIPS codes for the remaining counties will be added to the built-in knowledgebase in the future.

Image with all counties

POSTED BY: Calle E

enter image description here - you have earned "Featured Contributor" badge, congratulations !

This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

POSTED BY: EDITORIAL BOARD
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract