
Query curated data for counties in the US?

Posted 8 years ago

I have been trying to form a query in MMA11 for data for all counties in the US. Apparently, one must first acquire a list of the states and territories, use that to query for counties, and then use the full list of county entities in the final query for data. I try this in the attached notebook.

As in the documentation example, AdministrativeDivisionData can be used to get a list of the counties in a state entity, where the state entity can be entered with freeform input. It seems reasonable then that the states and territories could be gotten the same way by calling AdministrativeDivisionData with the United States (as freeform) used in place of the state name. But it seems that the data system does not believe that the United States country entity has administrative subdivisions.

Can someone please give me a clue as to how this can be done? For example, determine the median income for all counties in the US?

Thanks and kind regards,

David

Attachments:
POSTED BY: David Keith
15 Replies

Hacking through this... I had to do this a while ago but can't find the code. Here is something that works:

stateNames = 
 StringJoin@(Capitalize /@ 
      StringSplit[StringReplace[#, "," -> ""]][[;; -3]]) & /@ (#["Name"] & /@ 
    EntityList[
     EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]])

And then

counties = Flatten[
  EntityList[
     EntityClass["AdministrativeDivision", "USCounties" <> #]] & /@  stateNames];

(Deconstruct this from the inside out to see the reasons for each step.) This then gives

Length[counties]  -> 3143

Querying Wolfram|Alpha for this number gives 3144. Exercise for the interested student (that's not me!): why is there a missing county?
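
To make the "inside out" reading concrete, here is a minimal sketch that applies the same string pipeline to one name at a time. The helper name toClassSuffix is hypothetical, and the sample strings assume that the "Name" property has the form "<State>, United States", which is what the [[;; -3]] step in the code above implies.

```wolfram
(* Hypothetical helper: the stateNames pipeline applied to a single name string.
   1. StringReplace drops the comma.
   2. StringSplit breaks the name into words.
   3. [[;; -3]] drops the last two words ("United", "States").
   4. Capitalize turns e.g. "of" into "Of".
   5. StringJoin glues the words into a CamelCase suffix. *)
toClassSuffix[name_String] :=
 StringJoin[
  Capitalize /@ StringSplit[StringReplace[name, "," -> ""]][[;; -3]]]

(* Sample strings in the assumed "<State>, United States" format *)
toClassSuffix /@ {"Alabama, United States", "New York, United States",
  "District of Columbia, United States"}
(* {"Alabama", "NewYork", "DistrictOfColumbia"} *)
```

Each suffix is then appended to "USCounties" to form the entity class name for that state's counties.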

POSTED BY: David Reiss

OK DavidR and AlanC...this code seems to work for me now...

stateNames = 
 StringJoin@(Capitalize /@ 
      StringSplit[StringReplace[#, "," -> ""]][[;; -3]]) & /@ (#[
      "Name"] & /@ 
    EntityList[
     EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]])

counties = 
  Flatten[EntityList[
      EntityClass["AdministrativeDivision", "USCounties" <> #]] & /@ 
    stateNames];

Length@counties

income = EntityValue[counties, 
   EntityProperty["AdministrativeDivision", "MedianHouseholdIncome"]];

Length@income

GeoRegionValuePlot[Rule @@@ Transpose[{counties, income}], 
 GeoRange -> 
  EntityClass["AdministrativeDivision", "ContinentalUSStates"]]
POSTED BY: Aeyoss Antelope

In the code above, the following line consistently gets an EntityValue[] timeout error on V11.0 and returns EntityList[] unevaluated on V10.1.

counties = 
  EntityList[
   EntityClass["AdministrativeDivision", 
    "ParentRegion" -> 
     EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]]];

In the code above, the following 2 lines work as expected on V11.0 and V10.1.

stateNames = 
 StringJoin@(Capitalize /@ 
      StringSplit[StringReplace[#, "," -> ""]][[;; -3]]) & /@ (#[
      "Name"] & /@ 
    EntityList[
     EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]])

counties = 
  Flatten[EntityList[
      EntityClass["AdministrativeDivision", "USCounties" <> #]] & /@ 
    stateNames];
POSTED BY: Aeyoss Antelope
counties = 
  EntityList[
   EntityClass["AdministrativeDivision", 
    "ParentRegion" -> 
     EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]]];

income = 
  EntityValue[counties, 
   EntityProperty["AdministrativeDivision", "MedianHouseholdIncome"]];

GeoRegionValuePlot[Rule @@@ Transpose[{counties, income}], 
 GeoRange -> 
  EntityClass["AdministrativeDivision", "ContinentalUSStates"]]


POSTED BY: Alan Joyce

Thanks for posting this Alan. Two things come up now:

  1. Where can I find these constants you use in your code?

"ContinentalUSStates"

"AllUSStatesPlusDC"

  2. When I run this code on Windows V10.1 (no changes), I get a blank map after all these messages come out (see attached). Is this a simple fix?

GeoRegionValuePlot::noents: no valid location -> value pairs found >>

Transpose::nmtx: The first two levels of {} cannot be transposed. >>

Set::shape: Lists {System`GeoPlotsDump`geostuff$159588,System`GeoPlotsDump`values$159588} and Transpose[{}] are not the same shape. >>

Transpose::nmtx: The first two levels of {System`GeoPlotsDump`geostuff$159588,{}} cannot be transposed. >>

General::stop: Further output of Transpose::nmtx will be suppressed during this calculation. >>

$RecursionLimit::reclim2: Recursion depth of 1024 exceeded during evaluation of System`GeoPlotsDump`geostuff$159588->System`GeoPlotsDump`geostuff$159588. >>

$RecursionLimit::reclim2: Recursion depth of 1024 exceeded during evaluation of System`GeoPlotsDump`geostuff$159588->System`GeoPlotsDump`geostuff$159588. >>

Attachments:
POSTED BY: Aeyoss Antelope
Posted 8 years ago

You can always find those constants by calling EntityClassList["AdministrativeDivision"].

You can also use ctrl + =, for example "us states + DC" gives EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"].
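
Since that list is long, it can help to filter it by name. The following is only a sketch: classesMatching is a hypothetical helper, the second class name in the toy input is made up for illustration, and the pattern only matches classes written as EntityClass[type, "Name"] with a string name.

```wolfram
(* Hypothetical helper: keep named entity classes whose canonical name
   contains the substring sub (case-sensitive) *)
classesMatching[classes_List, sub_String] :=
 Cases[classes, EntityClass[_, name_String] /; StringContainsQ[name, sub]]

(* Toy input standing in for EntityClassList["AdministrativeDivision"];
   "ExampleOtherClass" is a made-up name *)
classesMatching[{EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"],
  EntityClass["AdministrativeDivision", "ExampleOtherClass"]}, "US"]
(* {EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]} *)
```

With a working connection, classesMatching[EntityClassList["AdministrativeDivision"], "USCounties"] should list the per-state county classes used earlier in this thread.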

POSTED BY: Greg Hurst

Hmmmm... when I execute

EntityList[
 EntityClass["AdministrativeDivision", 
  "ParentRegion" -> 
   EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"]]]

in M 11, it returns unevaluated. Perhaps something to do with the connection at the coffeehouse where I am? But other queries work... Odd...

Are you using a development version of Mathematica?

POSTED BY: David Reiss

I had suppressed the output since that was how Alan's sample code was written. Now, unsuppressed, it looks like Mma V10.0 (Home) on Win 8.1 also comes back unevaluated from that command.

POSTED BY: Aeyoss Antelope

Actually, on my home network (where I am now) it does seem to be trying to evaluate, but it is going very slowly. These cloud data computations are often slow, which makes them difficult to use for computationally intensive work.

And the progress box underneath says things like "Downloading 157 of 3143 values....", so 3143 is the number of counties. But it also throws error messages like "EntityValue::nodat: Unable to download data. Some or all results may be missing."

In my experience, things like this sometimes get in the way of seriously using the Entity framework for significant computations: slow downloads and unpredictable download failures (sometimes none, sometimes random ones timing out), so the computation cannot be trusted to complete.

Still waiting after 5 minutes for the first retrieval of the 3143 to complete.... stay tuned....

UPDATE: Well it is taking forever so I am aborting it...
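
One possible workaround for the slow and flaky downloads is to fetch in smaller batches and retry a batch that fails. This is a sketch under assumptions, not a tested fix: retryFetch and fetchInBatches are hypothetical helpers, the batch size of 250 is arbitrary, and UpTo needs V10.3 or later. It also assumes a failed call returns something other than a full-length list; a partially failed download may instead come back with Missing[...] entries, which this does not retry.

```wolfram
(* Hypothetical helper: call fetch on ents up to `tries` times until a list
   of the same length as ents comes back; otherwise return placeholders *)
retryFetch[fetch_, ents_List, tries_Integer: 3] :=
 Module[{res = $Failed, n = 0},
  While[n < tries && ! (ListQ[res] && Length[res] == Length[ents]),
   res = Quiet[fetch[ents]];
   n++];
  If[ListQ[res] && Length[res] == Length[ents], res,
   ConstantArray[Missing["DownloadFailed"], Length[ents]]]]

(* Hypothetical helper: split ents into batches and fetch each with retries *)
fetchInBatches[fetch_, ents_List, size_Integer: 250] :=
 Join @@ (retryFetch[fetch, #] & /@ Partition[ents, UpTo[size]])

(* Intended use with the county list from this thread (needs a connection):
   income = fetchInBatches[EntityValue[#, "MedianHouseholdIncome"] &, counties] *)
```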

POSTED BY: David Reiss

Ok, seems to be working now though very slow... hard to understand the instability of all of this...

POSTED BY: David Reiss

Note that the graphic yields a 45 megabyte file upon saving the notebook... seems quite excessive.

POSTED BY: David Reiss
Posted 8 years ago

Thanks, Alan. When I tried the code this morning, it failed in the first cell with the error "EntityValue::ctimeout: A computation timed out." I still get the error; however, it worked for me just now. I find the syntax and constants a bit mysterious. Can you recommend some documentation that covers the usages in your reply?

Thanks again and best regards,

David

POSTED BY: David Keith

This is explained better in the attached notebook. It contains test cases for:

1) XXXData[]

2) W|a

3) free-form linguistics

4) Interpreter[]

for one element of one domain.

Attachments:
POSTED BY: Aeyoss Antelope
  1. Based on this link (), I have been avoiding the XXXData[] objects in favor of the EntityXXX[] classes. If this is the approach you have to take with Entity[]s, maybe their design is not suited for problems like this. Of course, doing this with SQL is trivial.

  2. There are some challenges to accomplishing what you are seeking. I hope that the method below is not the most efficient one, or that it will be improved going forward.

I did not try the approach of downloading all of the (hundred thousand plus) administrative divisions. Based on some examples, I did the following:

  • asked W|a for a list of states
  • fed this list into Interpreter[] to get the state Entity[]s
  • fed the list of states back into W|a to get a list of counties, for each state
  • fed this list into Interpreter[] to get the county Entity[]s <<< there are problems with this (i.e. Hamilton County, Ohio was returned by Interpreter[])
  • got the list of administrative division properties to find the desired properties
  • (* next is do the work to crunch the data together...not done because of the next bullet *)

I can guess that bringing all AdministrativeDivisions into memory would fail on some machines.

  3. Further, I found a lot of the properties have missing data when querying for counties. The data is much more prevalent when querying for states. Of course both are AdministrativeDivisions and thus have the same set of properties.

The code (in the Notebook) attached is just a sketch of how this method could be accomplished. A full application could be created from this sketch.
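
On the missing-data point, one way to see how much of a property is actually populated is to pair entities with their values and drop the Missing[...] entries before plotting or computing. This is a minimal sketch with toy data; validPairs is a hypothetical helper, and the strings stand in for county entities.

```wolfram
(* Hypothetical helper: pair each entity with its value and keep only the
   pairs whose value is not a Missing[...] expression *)
validPairs[ents_List, vals_List] :=
 Select[Thread[ents -> vals], ! MatchQ[Last[#], _Missing] &]

(* Toy data standing in for county entities and downloaded values *)
validPairs[{"a", "b", "c"}, {1, Missing["NotAvailable"], 3}]
(* {"a" -> 1, "c" -> 3} *)
```

Comparing Length[validPairs[counties, income]] with Length[counties] gives the number of counties that actually have data, and the filtered list can be passed to GeoRegionValuePlot in place of Rule @@@ Transpose[{counties, income}].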

Attachments:
POSTED BY: Aeyoss Antelope