SemanticImport[] and Lat-Long Strings

Posted 10 years ago

I am playing around with the fantastic SemanticImport[] routine, and have run across the following small snag...

My CSV file looks like this...

City,Amount,Sold,Province,Loc,Dist
Toronto,$2000.00,27-Mar-2014,Ontario,N45 W108,8 meters
Montreal,$5000.00,18-Apr-2014,Quebec,N46.6 W103.5,12 meters

If I import with the HeaderLines -> 1 option, everything looks great except the Lat-Long strings, which get imported as GeoPosition[] objects, but not correctly; they look like this...

...,GeoPosition[{{0., 0., 0.}, {-45., 0., 0.}}],...
...,GeoPosition[{{0., 0., 0.}, {-46.6, 0., 0.}}],...

So it is making a heroic stab, but seems to be choking on the west longitudes, and is converting the north latitudes to longitudes.

Does anyone know how I can format my file to get this to work, or otherwise know what SemanticImport is expecting?

Thanks in advance

Brad

POSTED BY: Brad Varey
7 Replies
Posted 10 years ago

Hi, Arnoud

Thanks for this very full reply, it is much appreciated.

In fact I was in Champaign at the conference last week, and attended your Friday sessions. It now seems a shame I didn't make your acquaintance. Next time.

Thanks again

Brad

POSTED BY: Brad Varey

I've reported these limitations to developers at Wolfram, so that they can be addressed:

  • The provinces are actually interpreted as cities with SemanticImport (Ontario, California and Quebec City)
  • The geo positions are not correct

Here is a workaround that should get around the SemanticImport limitation you are running into. First, import the file as CSV:

csv = Import["out.csv"]

Convert it to a Dataset:

dataset = Dataset[Map[AssociationThread[StringTrim /@ First[csv], #] &, Rest[csv]]]

Define handlers for each column type:

handler["City", assoc_] := (Interpreter["City"][assoc["City"]]);
handler["Amount", assoc_] := assoc["Amount"];
handler["Sold", assoc_] := (Interpreter["Date"][assoc["Sold"]]);
handler["Province", assoc_] := (Interpreter["AdministrativeDivision"][assoc["Province"]]);
handler["Loc", assoc_] := Module[{loc = assoc["Loc"], lat, long},
  {lat, long} = StringSplit[loc];
  lat = StringReplace[lat, {"N" -> "+", "S" -> "-"}];
  long = StringReplace[long, {"W" -> "-", "E" -> "+"}];
  GeoPosition[ToExpression /@ {lat, long}]
  ];
handler["Dist", assoc_] := (Interpreter["Quantity"][assoc["Dist"]]);

Apply the handlers to every row of the dataset (renaming "Loc" to "Location" and "Dist" to "Distance"):

dataset[All, Association[{
  "City" -> handler["City", #],
  "Amount" -> handler["Amount", #],
  "Sold" -> handler["Sold", #],
  "Province" -> handler["Province", #],
  "Location" -> handler["Loc", #],
  "Distance" -> handler["Dist", #]
  }] &]
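
The "Loc" step can also be checked on its own, without the file or the Dataset. The following is only a minimal sketch: the names hemisphereSign, signedDegrees and parseLatLong are hypothetical helpers introduced here for illustration, and it assumes every value is exactly two space-separated tokens, each a hemisphere letter followed by decimal degrees (as in "N45 W108").

```wolfram
(* Hypothetical helpers, not part of the reply above. *)
(* Assumption: input looks like "N45 W108", i.e. two tokens, *)
(* each a hemisphere letter followed by decimal degrees. *)

(* North and East are positive, South and West are negative. *)
hemisphereSign = <|"N" -> 1, "S" -> -1, "E" -> 1, "W" -> -1|>;

(* Turn one token such as "W103.5" into a signed number. *)
signedDegrees[token_String] :=
  hemisphereSign[StringTake[token, 1]] * ToExpression[StringDrop[token, 1]];

(* Split on whitespace: first token is latitude, second is longitude. *)
parseLatLong[loc_String] := GeoPosition[signedDegrees /@ StringSplit[loc]];

parseLatLong["N46.6 W103.5"]
(* GeoPosition[{46.6, -103.5}] *)
```

Like the handler above, this relies on ToExpression, so it should only be used on input you trust.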


POSTED BY: Arnoud Buzing
Posted 10 years ago

Correct, or almost: there are spaces between N45 and W108, 8 and meters, and so on.

But how did you preserve the carriage returns in your posting? I tried to copy your post and paste it into my current reply, and once again "Community" stripped my carriage returns, so all your text appeared on one line.

POSTED BY: Brad Varey

Formatting instructions are at http://community.wolfram.com/groups/-/m/t/270507

POSTED BY: Bruce Miller