SemanticImport[] and Lat-Long Strings

Posted 10 years ago

I am playing around with the fantastic SemanticImport[] routine, and have run across the following small snag...

My CSV file looks like this...

City,Amount,Sold,Province,Loc,Dist Toronto,$2000.00,27-Mar-2014,Ontario,N45 W108,8 meters Montreal,$5000.00,18-Apr-2014,Quebec,N46.6 W103.5,12 meters

If I import with the HeaderLines -> 1 option, everything looks great except the Lat-Long strings, which get imported as GeoPosition[] objects, but not correctly; they look like this...

...,GeoPosition[{{0., 0., 0.}, {-45., 0., 0.}}],... ...,GeoPosition[{{0., 0., 0.}, {-46.6, 0., 0.}}],...

So it is making a heroic stab, but seems to be choking on the west longitudes, and is converting the north latitudes to longitudes.

Does anyone know how I can format my file to get this to work, or otherwise know what SemanticImport is expecting?

Thanks in advance

Brad

POSTED BY: Brad Varey
7 Replies
Posted 10 years ago

Hi, Arnoud

Thanks for this very full reply, it is much appreciated.

In fact I was in Champaign at the conference last week, and attended your Friday sessions. It now seems a shame I didn't make your acquaintance. Next time.

Thanks again

Brad

POSTED BY: Brad Varey

I've reported these limitations to developers at Wolfram, so that they can be addressed:

  • The provinces are actually interpreted as cities with SemanticImport (Ontario, California and Quebec City)
  • The geo positions are not correct

Here is a workaround for the SemanticImport limitation you are running into. First, import the file as CSV:

csv = Import["out.csv"]

Convert it to a Dataset:

dataset = Dataset[Map[AssociationThread[StringTrim /@ First[csv], #] &, Rest[csv]]]
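To make the conversion step concrete, here is a minimal sketch of what the AssociationThread call does to a single row. The two-row `csv` value is a made-up excerpt standing in for what Import might return, not the actual file contents; note the trailing space in "Dist ", which is why the header goes through StringTrim.

```wolfram
(* Hypothetical excerpt of an imported CSV: a header row plus one data row *)
csv = {{"City", "Loc", "Dist "}, {"Toronto", "N45 W108", "8 meters"}};

(* Trim stray whitespace from the header names before using them as keys *)
keys = StringTrim /@ First[csv];

(* Pair every data row with the cleaned-up header *)
rows = AssociationThread[keys, #] & /@ Rest[csv]
(* {<|"City" -> "Toronto", "Loc" -> "N45 W108", "Dist" -> "8 meters"|>} *)
```

Wrapping `rows` in Dataset gives the same kind of object as `dataset` above.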

Define handlers for each column type:

handler["City", assoc_] := (Interpreter["City"][assoc["City"]]);
handler["Amount", assoc_] := assoc["Amount"];
handler["Sold", assoc_] := (Interpreter["Date"][assoc["Sold"]]);
handler["Province", assoc_] := (Interpreter["AdministrativeDivision"][assoc["Province"]]);
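handler["Loc", assoc_] := Module[{loc = assoc["Loc"], lat, long},
  (* split "N45 W108" into its latitude and longitude parts *)
  {lat, long} = StringSplit[loc];
  (* turn the hemisphere letters into signs: N/E positive, S/W negative *)
  lat = StringReplace[lat, {"N" -> "+", "S" -> "-"}];
  long = StringReplace[long, {"W" -> "-", "E" -> "+"}];
  (* convert the signed strings to numbers and wrap them as {lat, long} *)
  GeoPosition[ToExpression /@ {lat, long}]
  ];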
handler["Dist", assoc_] := (Interpreter["Quantity"][assoc["Dist"]]);
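As a quick check of the "Loc" logic on its own, the same steps can be pulled into a standalone function and tried on the two sample strings from the file. The name `parseLoc` is only an illustrative helper, not part of the handlers above.

```wolfram
(* parseLoc is a hypothetical name; it mirrors the "Loc" handler step by step *)
parseLoc[loc_String] := Module[{lat, long},
  {lat, long} = StringSplit[loc];                         (* "N45 W108" -> {"N45", "W108"} *)
  lat = StringReplace[lat, {"N" -> "+", "S" -> "-"}];     (* "N45"  -> "+45"  *)
  long = StringReplace[long, {"W" -> "-", "E" -> "+"}];   (* "W108" -> "-108" *)
  GeoPosition[ToExpression /@ {lat, long}]
];

parseLoc["N45 W108"]       (* GeoPosition[{45, -108}] *)
parseLoc["N46.6 W103.5"]   (* GeoPosition[{46.6, -103.5}] *)
```

This assumes every value has exactly two space-separated parts. Because ToExpression evaluates whatever string it is given, this approach is best kept to files you trust.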

Apply the handlers to every row of the dataset (renaming "Loc" to "Location" and "Dist" to "Distance" along the way):

dataset[All, Association[{
  "City" -> handler["City", #],
  "Amount" -> handler["Amount", #],
  "Sold" -> handler["Sold", #],
  "Province" -> handler["Province", #],
  "Location" -> handler["Loc", #],
  "Distance" -> handler["Dist", #]
  }] &]
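If you would rather stay with SemanticImport, another option may be to give it an explicit list of column types so that the "Loc" column comes through as a plain string. This is an untested sketch: it assumes the file is named out.csv as above, that the columns are in the order shown in the question, and that Automatic and "String" are accepted as per-column type specifications.

```wolfram
(* One type per column; only "Loc" (the 5th column) is forced to stay a string *)
types = {Automatic, Automatic, Automatic, Automatic, "String", Automatic};

(* Assumes out.csv is in the current directory and has one header line *)
ds = SemanticImport["out.csv", types, HeaderLines -> 1];

(* Inspect the lat-long strings, which can then be parsed by hand *)
ds[All, "Loc"]
```

The lat-long strings could then be converted with the same string handling used in the "Loc" handler.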


POSTED BY: Arnoud Buzing

You can select the file text (as you see it in the editor) and then click the "Code Sample" icon in the tools right above the editor (5th one, after Bold, Italic, Hyperlink, and Blockquote).

You can also type four spaces in front of every line you want formatted as code (clicking the button does exactly that: it inserts four spaces in front of your selected text).

I'll fix the spaces.

POSTED BY: Arnoud Buzing
Posted 10 years ago

Correct, or almost: there are spaces between N45 and W108, 8 and meters, and so on.

But how did you preserve the carriage returns in your posting? I tried to copy your post and paste it into my current reply, and once again, "Community" stripped my carriage returns so all your text appeared on one line.

POSTED BY: Brad Varey

Formatting instructions are in http://community.wolfram.com/groups/-/m/t/270507 .

POSTED BY: Bruce Miller

So this is the sample input?

City,Amount,Sold,Province,Loc,Dist 
Toronto,2000.00,27-Mar-2014,Ontario,N45 W108,8 meters
Montreal,5000.00,18-Apr-2014,Quebec,N46.6 W103.5,12 meters
POSTED BY: Arnoud Buzing
Posted 10 years ago

Sorry, I just noticed that when I uploaded my text, the Community site stripped my carriage returns.

So please rest assured that my file does have 3 lines, breaking just after "...Dist", "...,8 meters" and "...,12 meters", and that the returned GeoPositions each appear on their own line.

POSTED BY: Brad Varey