Avoid Interpreter ZIP Code issues? // Unreliable access to knowledge base

Posted 11 months ago

Hi, Wolfram Community.

I've been trying to run a line like this:

zips = Interpreter["ZIPCode"] /@ {"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720", ... }

over a list of some 1200 ZIP codes. The goal is to have each element of the list recognized as a ZIP code entity, and then to assign each of those ZIP codes a scalar (see here).

Every time I run the line I get a different result: depending, it seems, on whether the knowledge base is available, sometimes some of the codes are turned into entities and sometimes they are not.

Here are a few screenshots of my output. At first everything looks fine:

enter image description here

Then comes trouble, in a different flavor every time:

enter image description here

enter image description here

Does anyone know what is going on here, and most importantly, how can I have this computation done right in a reliable way? Any ideas are welcome.

5 Replies

Dear Jorge,

that does sometimes happen when one requests lots of data from the servers and the internet connection is a bit flaky or the server is very busy. Here are some remarks:

  1. You apply the Interpreter function one by one:

    zips = Interpreter["ZIPCode"] /@ {"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720"}
    

    Instead you might want to try and run them "in one go".

     zips = Interpreter["ZIPCode"] [ {"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720"}]
    

    That is much more efficient and saves time. You should get better results, but it does not necessarily resolve your problem.

  2. This is not really a good solution, but you can try to iterate the procedure. For example, you can run it once like so:

    zip1 = Transpose[{#, Interpreter["ZIPCode"][#] } & @(ToString /@ Range[85001, 85055])]
    

    where I use the Range command to generate a list of zip codes. This command leads to a result like this:

enter image description here

Now we can iterate the procedure until there is no change:

iterativeList=NestWhileList[(Transpose[{#, Interpreter["ZIPCode"][#] } & @Select[#, (Head[#[[2]]] === Failure) &][[All, 1]]]) &, zip1, Unequal, All]

The idea is to select those that have not been interpreted correctly and do so until there is no change. The result would be this:

Join[Select[zip1, Head[#[[2]]] === Entity &], DeleteDuplicatesBy[Reverse@Flatten[iterativeList, 1], #[[1]] &]]

In my case that still contains some Failures, but we can eliminate them like this:

DeleteDuplicates[Select[Join[Select[zip1, Head[#[[2]]] === Entity &], DeleteDuplicatesBy[Reverse@Flatten[iterativeList, 1], #[[1]] &]], Head[#[[2]]] === Entity &]]

It is not quite ideal, but you get slightly better results.

  3. You could also try this:

    Entity["ZIPCode", #]&/@ (ToString /@ Range[85001, 85055])
    

That is much much faster, at least in my case. Perhaps you can try it and let me know how it works.
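The retry idea from point 2 can also be written as a bounded loop. This is only a minimal sketch: `retryInterpret` and `maxTries` are hypothetical names, not anything built in, and it assumes that transient server failures come back as `Failure` objects, as in the screenshots. The bound matters because strings that are not real ZIP codes will fail every time.

```wolfram
(* Hypothetical helper, not from the thread: re-query only the failed
   entries, at most maxTries times *)
retryInterpret[codes_List, maxTries_Integer : 5] :=
 Module[{results, pending, tries = 0},
  (* first pass over the whole list in one call *)
  results = AssociationThread[codes -> Interpreter["ZIPCode"][codes]];
  pending = Keys[Select[results, FailureQ]];
  While[pending =!= {} && tries < maxTries,
   (* later values overwrite earlier ones for the same key *)
   results = Join[results,
     AssociationThread[pending -> Interpreter["ZIPCode"][pending]]];
   pending = Keys[Select[results, FailureQ]];
   tries++];
  results]

(* usage: keep only the entries that were interpreted *)
results = retryInterpret[ToString /@ Range[85001, 85055]];
Select[results, ! FailureQ[#] &]
```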

Cheers,

Marco

PS: I assume that the list of zip codes is nothing secret. Could you post it, i.e. attach it as a csv file or so?

Dear Marco, Thank you very much for your valuable insights.

There are several things I'd like to comment on. First, long story short, I was lucky enough to run the thing with no errors at all, once. Only once in the 10 or more times I've tried. I immediately saved the results, of course. If any of the higher powers is reading this too, please know that this is undoubtedly 100% on the Wolfram servers, I'm sorry to say. I hope this issue can be improved.

I compared the one by one and the all at once options you mentioned, and in fact the all at once version ran in roughly a third of the time the one by one did:

zips1 = Timing[Interpreter["ZIPCode"] /@ {"95014", "01545", ... }]

gives 294.467 seconds (and a ton of errors), while

zips2 = Timing[Interpreter["ZIPCode"][{"95014", "01545", ...}]]

gives 102.366 seconds and a fair share of errors too.
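One caveat on these measurements: `Timing` reports CPU time spent in the kernel, whereas `AbsoluteTiming` reports elapsed wall-clock time, which is usually the more relevant figure when most of the wait is on the network. A small sketch, where `codes` is just a placeholder for the full list:

```wolfram
(* codes is a placeholder; substitute the full list of ZIP code strings *)
codes = {"95014", "01545", "94087"};

(* AbsoluteTiming returns {elapsed seconds, result} *)
{seconds, zips} = AbsoluteTiming[Interpreter["ZIPCode"][codes]];
seconds
```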

After cleaning up the data I came up with a list of 1177 zip codes and students (which I'm uploading for the sake of the exercise; it's not secret, indeed). After I got the zip codes list with no errors I associated it with the number of students per zip code:

zipst = Transpose[{zips, students}]

And then did this:

GeoRegionValuePlot[zipst]

And I obtained the following: enter image description here

Which is great in principle, but introduces a different set of challenges, like zooming-in relevant regions to make colors actually visible. I'm playing now with different options for GeoRegionValuePlot to improve this visualization.

The data for this problem is attached (ZipSt.xls).

Attachments:

Hi Jorge,

thank you for posting the data. It is easier to understand what we are talking about. I am not really very familiar with this type of thing, but here is something that might be useful:

zipstudents = Import["~/Desktop/ZipSt.xls"][[1, 2 ;;]];
zipentities = Entity["ZIPCode", ToString[#]] & /@ zipstudents[[All, 1]];
entityzipstudents = Transpose[Join[{zipentities}, Transpose[zipstudents]]];

puts the data in a useful format. It is faster than the Interpreter approach and does a reasonable job.
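One thing to watch for, as an assumption about the spreadsheet rather than something checked against the file: if the ZIP column is imported as numbers, a code such as 01545 arrives as 1545 (or 1545.) and `ToString` will not restore the leading zero. A minimal sketch of a conversion that pads back to five digits, with `toZIPString` being a hypothetical helper name:

```wolfram
(* Hypothetical helper: turn a numeric ZIP back into a 5-character string *)
toZIPString[z_?NumericQ] := IntegerString[Round[z], 10, 5]
toZIPString[z_String] := z  (* strings are left untouched *)

(* same construction as above, but with zero-padded codes *)
zipentities = Entity["ZIPCode", toZIPString[#]] & /@ zipstudents[[All, 1]];
```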

Then you can use the new DynamicGeoGraphics:

DynamicGeoGraphics[Flatten[{EdgeForm[Black], FaceForm[ColorData["TemperatureMap"][Log[#[[2]]]/Log[417.]]], 
Polygon[#[[1]]]} & /@ Select[entityzipstudents[[All, {1, -1}]], Head[#[[1]]["Polygon"]] =!= Missing &]]]

You should obtain a dynamical interface. It is a bit sluggish, but works:

enter image description here

You can move the centre of the image with the mouse and use the +/- at the lower right corner to zoom in or out. It is more responsive if you first zoom in a bit and then move the centre.

Cheers,

Marco

PS: The colour-scaling is of course a matter of taste.

It also appears that there were lots of attendees from around the Boston area:

enter image description here

Given that you are from Boston College

StringTake[WikipediaData["Boston College"], 1985]

enter image description here

you will probably be interested in that area. You can also calculate the distance between the different zip code areas and Boston College:

Quiet[distances = 
  GeoDistance[Entity["University", "BostonCollege::m4rnc"], #] & /@  cleanentities[[All, 1]]]
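Note that `cleanentities` is not defined anywhere in the thread. A plausible reading, and only an assumption, is that it is the subset of `entityzipstudents` for which polygon data is available, i.e. the same filter used in the `DynamicGeoGraphics` call. Under that assumption it would need to be evaluated before the `GeoDistance` line:

```wolfram
(* Assumed definition; the thread never shows how cleanentities was built.
   Keeps rows {entity, zip, students} whose entity has polygon data. *)
cleanentities =
  Select[entityzipstudents, Head[#[[1]]["Polygon"]] =!= Missing &];
```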

Here is a histogram of these distances:

Histogram[Cases[distances, _Quantity], 200, ImageSize -> Large, 
 PlotTheme -> "Marketing", LabelStyle -> Directive[Bold, Medium]]

enter image description here

Of course this is not really a fair histogram, because there are different numbers of participants, so we have to include that:

weighteddistances = 
Cases[Flatten[ConstantArray[#[[1]], #[[2]]] & /@ Transpose[{distances, Floor /@ cleanentities[[All, -1]]}]], _Quantity];

and then

Histogram[weighteddistances, 200, ImageSize -> Large, PlotTheme -> "Marketing", LabelStyle -> Directive[Bold, Medium]]

enter image description here

The average travel distance is (ignoring the zip codes we could not identify):

Mean@QuantityMagnitude@UnitConvert[weighteddistances, Quantity[1, "Miles"]]

1026.43 miles.
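The same weighted average can be computed without expanding the list with `ConstantArray`, by summing distance times count and dividing by the total count. This is a sketch that reuses `distances` and `cleanentities` from above; `pairs` and `weightedMean` are names introduced here for illustration.

```wolfram
(* Pair each distance with its floored student count and keep only
   the entries where GeoDistance returned a Quantity *)
pairs = Cases[
   Transpose[{distances, Floor /@ cleanentities[[All, -1]]}],
   {_Quantity, _}];

(* weighted mean = Sum(distance * count) / Sum(count) *)
weightedMean =
  Total[pairs[[All, 1]]*pairs[[All, 2]]]/Total[pairs[[All, 2]]];

UnitConvert[weightedMean, "Miles"]
```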

Cheers,

Marco

Thank you, Marco.

I just want to add a couple things to your amazing contributions. First, in regard to the issue of actually getting the data in the form that is required, I got this great suggestion from the people at tech support. It involves defining an object called data:

data = {"95014", "01545", "94087", "95129", "01810",...

And then turn it into entities:

ziplist = Map[Entity["ZIPCode", #] &, data]

Provided that you have a curated list where every single entry is an actual zip code, this procedure seems to work reliably. Working in this way I was able to get a map of an interesting region that you identified as well (MA) by doing this:

GeoRegionValuePlot[zipst, GeoRange -> {{41.5, 43.}, {-72., -70.}}, 
 GeoLabels -> (Tooltip[#1, ZIPCodeData[#2, "Cities"]] &)]

This results in a nice map with tooltips that looks like this:

enter image description here

If you export to an HTML file, the tooltips show up. Many thanks for taking the time to discuss these issues.
