# Avoid Interpreter ZIP Code issues? // Unreliable access to knowledge base

Hi, Wolfram Community.

I've been trying to run a line like this:

    zips = Interpreter["ZIPCode"] /@ {"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720", ... }

over a list of some 1200 ZIP codes. The goal is to have each element of the list recognized as a ZIP code entity, and then to assign each of those ZIP codes a scalar. (See here.)

Every time I run the line I get a different result: depending on whether the knowledge base is available, some of the codes get turned into entities and some don't. Here are a few screenshots of my output. At first everything looks fine. Then comes trouble, in different flavors every time.

Does anyone know what is going on here and, most importantly, how I can have this computation done in a reliable way? Any ideas are welcome.
5 months ago
5 Replies
Dear Jorge,

that does sometimes happen when one requests lots of data from the servers and the internet connection is a bit flaky or the server is very busy. Here are some remarks.

You apply the Interpreter function one by one:

    zips = Interpreter["ZIPCode"] /@ {"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720"}

Instead you might want to try and run them "in one go":

    zips = Interpreter["ZIPCode"][{"95014", "01545", "94087", "95129", "01810", "10471", "02067", "01720"}]

That is much more efficient and saves time. You should get better results, but it does not necessarily resolve your problem.

This is not a really good solution, but you can try to iterate the procedure. For example, you can run it once like so:

    zip1 = Transpose[{#, Interpreter["ZIPCode"][#]} &@(ToString /@ Range[85001, 85055])]

where I use the Range command to generate a list of zip codes. This command leads to a list of {zip code, result} pairs in which some results are entities and some are failures. Now we can iterate the procedure until there is no change:

    iterativeList = NestWhileList[(Transpose[{#, Interpreter["ZIPCode"][#]} &@Select[#, (Head[#[[2]]] === Failure) &][[All, 1]]]) &, zip1, Unequal, All]

The idea is to select those that have not been interpreted correctly and to retry them until there is no change. The result would be this:

    Join[Select[zip1, Head[#[[2]]] === Entity &], DeleteDuplicatesBy[Reverse@Flatten[iterativeList, 1], #[[1]] &]]

In my case that still contains some Failures, but we can eliminate them like this:

    DeleteDuplicates[Select[Join[Select[zip1, Head[#[[2]]] === Entity &], DeleteDuplicatesBy[Reverse@Flatten[iterativeList, 1], #[[1]] &]], Head[#[[2]]] === Entity &]]

It is not quite ideal, but you get slightly better results. You could also try this:

    Entity["ZIPCode", #] & /@ (ToString /@ Range[85001, 85055])

That is much, much faster, at least in my case. Perhaps you can try it and let me know how it works.

Cheers,
Marco

PS: I assume that the list of zip codes is nothing secret. Could you post it, i.e. attach it as a csv file or so?
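The retry idea in the reply above can also be written with an explicit cap on the number of attempts, so that entries that can never be interpreted do not keep the loop going. The following is only a minimal sketch: the names `retryInterpret` and `maxTries` are hypothetical and not from the thread, and it assumes, as the reply does, that `Interpreter["ZIPCode"]` can be applied to a whole list at once.

```wolfram
(* Hypothetical helper, not from the thread: re-run the interpreter only on the
   inputs that failed, and stop after maxTries extra rounds. *)
retryInterpret[codes_List, maxTries_Integer : 5] :=
 Module[{results, pending, tries = 0},
  (* First pass: interpret everything in one call *)
  results = AssociationThread[codes -> Interpreter["ZIPCode"][codes]];
  pending = Keys@Select[results, FailureQ];
  While[pending =!= {} && tries < maxTries,
   (* Join on associations lets the new results overwrite the old failures *)
   results = Join[results,
     AssociationThread[pending -> Interpreter["ZIPCode"][pending]]];
   pending = Keys@Select[results, FailureQ];
   tries++];
  results]

(* Usage: returns an association from input string to Entity or Failure *)
retryInterpret[ToString /@ Range[85001, 85055]]
```

Whatever is still a `Failure` after the last round can then be dropped with `Select[..., Not@*FailureQ]` or inspected by hand.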
5 months ago
Dear Marco,

Thank you very much for your valuable insights. There are several things I'd like to comment on.

First, long story short, I was lucky enough to run the thing and get no errors at all, once. Only once in the 10 or more times I've tried. I immediately saved the results, of course. If any of the higher powers is reading this too, please know that this is undoubtedly 100% on the Wolfram servers, I'm sorry to say. I hope this issue can be improved.

I compared the one-by-one and the all-at-once options you mentioned, and in fact the all-at-once version ran in roughly a third of the time the one-by-one version did:

    zips1 = Timing[Interpreter["ZIPCode"] /@ {"95014", "01545", ... }]

gives 294.467 seconds (and a ton of errors), while

    zips2 = Timing[Interpreter["ZIPCode"][{"95014", "01545", ...}]]

gives 102.366 seconds and a fair share of errors too.

After cleaning up the data I came up with a list of 1177 zip codes and students (which I'm uploading for the sake of the exercise; it's not secret, indeed). After I got the zip codes list with no errors I associated it with the number of students per zip code:

    zipst = Transpose[{zips, students}]

And then did this:

    GeoRegionValuePlot[zipst]

The resulting map is great in principle, but it introduces a different set of challenges, like zooming in on relevant regions to make the colors actually visible. I'm now playing with different options for GeoRegionValuePlot to improve this visualization.

The data for this problem is attached (ZipSt.xls).
Hi Jorge,

thank you for posting the data. It makes it easier to understand what we are talking about. I am not really very familiar with this type of thing, but here is something that might be useful:

    zipstudents = Import["~/Desktop/ZipSt.xls"][[1, 2 ;;]];
    zipentities = Entity["ZIPCode", ToString[#]] & /@ zipstudents[[All, 1]];
    entityzipstudents = Transpose[Join[{zipentities}, Transpose[zipstudents]]];

This puts the data in a useful format. It is faster than the Interpreter approach and does a reasonable job. Then you can use the new DynamicGeoGraphics:

    DynamicGeoGraphics[Flatten[{EdgeForm[Black], FaceForm[ColorData["TemperatureMap"][Log[#[[2]]]/Log[417.]]], Polygon[#[[1]]]} & /@ Select[entityzipstudents[[All, {1, -1}]], Head[#[[1]]["Polygon"]] =!= Missing &]]]

You should obtain a dynamic interface. It is a bit sluggish, but it works. You can move the centre of the image with the mouse and use the +/- at the lower right corner to zoom in or out. It is more responsive if you first zoom in a bit and then move the centre.

Cheers,
Marco

PS: The colour scaling is of course a matter of taste.
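One thing to watch when building entities from imported spreadsheet data: if the ZIP codes come back as numbers rather than strings (an assumption; the thread does not say how ZipSt.xls stores them), `ToString` will not restore leading zeros, so a code like 01545 would become "1545". A small sketch of a normalizing step, with `normalizeZIP` as a hypothetical helper name:

```wolfram
(* Hypothetical helper, not from the thread: turn a numeric or string ZIP code
   into a five-character string with leading zeros. *)
normalizeZIP[n_Integer] := IntegerString[n, 10, 5]
normalizeZIP[x_Real] := normalizeZIP[Round[x]]
normalizeZIP[s_String] := StringPadLeft[s, 5, "0"]

normalizeZIP /@ {1545, 95014., "2067"}
(* {"01545", "95014", "02067"} *)
```

The normalized strings can then be passed to `Entity["ZIPCode", #] &` in place of `ToString[#]`.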
It also appears that there were lots of attendees from around the Boston area. Given that you are from Boston College

    StringTake[WikipediaData["Boston College"], 1985]

you will probably be interested in that area. You can also calculate the distance between the different zip code areas and Boston College:

    Quiet[distances = GeoDistance[Entity["University", "BostonCollege::m4rnc"], #] & /@ cleanentities[[All, 1]]]

Here is a histogram of these distances:

    Histogram[Cases[distances, _Quantity], 200, ImageSize -> Large, PlotTheme -> "Marketing", LabelStyle -> Directive[Bold, Medium]]

Of course this is not really a fair histogram, because each zip code has a different number of participants, so we have to include that by repeating each distance once per participant:

    weighteddistances = Cases[Flatten[ConstantArray[#[[1]], #[[2]]] & /@ Transpose[{distances, Floor /@ cleanentities[[All, -1]]}]], _Quantity];

and then

    Histogram[weighteddistances, 200, ImageSize -> Large, PlotTheme -> "Marketing", LabelStyle -> Directive[Bold, Medium]]

The average travel distance (ignoring the zip codes we could not identify) is:

    Mean@QuantityMagnitude@UnitConvert[weighteddistances, Quantity[1, "Miles"]]

which gives 1026.43 miles.

Cheers,
Marco
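The weighting step above repeats each distance once per participant. As a sketch of an alternative, the same weighted mean can be computed with `WeightedData` on plain magnitudes, without expanding the list. The numbers below are made-up toy values for illustration only, not data from the thread:

```wolfram
(* Toy values, purely illustrative: two distances in miles and the number of
   participants at each distance. *)
toyDistances = {10., 100.};
toyCounts = {3, 1};

(* Weighted mean: (3*10 + 1*100)/4 *)
Mean[WeightedData[toyDistances, toyCounts]]
(* 32.5 *)

(* Same result from the repeat-and-flatten approach used above *)
Mean[Flatten[MapThread[ConstantArray, {toyDistances, toyCounts}]]]
(* 32.5 *)
```

For the real data, the magnitudes would first have to be extracted with `QuantityMagnitude` after discarding the entries that are not a `Quantity`, as in the reply above.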
Thank you, Marco.

I just want to add a couple of things to your amazing contributions. First, regarding the issue of actually getting the data into the required form, I got this great suggestion from the people at tech support. It involves defining a list called data:

    data = {"95014", "01545", "94087", "95129", "01810",...

and then turning it into entities:

    ziplist = Map[Entity["ZIPCode", #] &, data]

Provided that you have a curated list where every single entry is an actual zip code, this procedure seems to work reliably. Working this way I was able to get a map of an interesting region that you identified as well (MA) by doing this:

    GeoRegionValuePlot[zipst, GeoRange -> {{41.5, 43.}, {-72., -70.}}, GeoLabels -> (Tooltip[#1, ZIPCodeData[#2, "Cities"]] &)]

This results in a nice map with tooltips. If you export it to an HTML file, the tooltips show up.

Many thanks for taking the time to discuss these issues.