Spatial data in Tabular

As someone who deals with large spatial datasets, I am pleased to see that GeoPosition is now a supported column type in Tabular. But a couple of words of warning to others:

  1. It's not very efficient. In some basic testing, a column of GeoPosition objects takes around ten times as much storage as two machine-precision columns that just hold the latitudes and longitudes! This seems a bit odd to me, as I thought the point of Tabular was to maximize efficiency by bundling all the type and 'wrapper' information into the header so that raw data are stored in an optimized way. And the raw data are just the two coordinates, so…

  2. Only GeoPosition is supported, not GeoGridPosition, so there is no way to do a lot of common geo processing activities within the Tabular structure (taking advantage of its efficiencies). Again, the raw data for GeoGridPosition are just pairs of numbers: everything else (e.g., the projection) is 'wrapper.'

So it's a step in the right direction, but I hope that more steps are taken to make it a powerhouse for spatial data. Being able to do, for example, fast conversion between projections would be amazing. Oddly, both GeoPosition and GeoGridPosition can already be used as 'wrappers' in the sense that their arguments can be lists of coordinates, and this makes operations such as projection conversions much more efficient.
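To illustrate the 'wrapper' usage outside Tabular, here is a minimal sketch. It assumes randomly generated coordinates and the "Mercator" projection, both chosen only for illustration: a single GeoPosition wraps the whole list of coordinate pairs, and one GeoGridPosition call converts all of them at once.

```wolfram
(* Illustrative data: 1000 random {lat, long} pairs *)
latlong = Table[{RandomReal[{-80, 80}], RandomReal[{-180, 180}]}, {1000}];

(* One GeoPosition wrapping the whole array, rather than 1000 separate objects *)
pos = GeoPosition[latlong];

(* Convert every point to projected coordinates in a single call;
   "Mercator" is just an example projection *)
grid = GeoGridPosition[pos, "Mercator"];

(* The raw projected coordinate pairs are the first argument of the result *)
xy = First[grid];
```

The projected pairs in xy could then be stored in ordinary numeric columns, although that loses the projection 'wrapper' that GeoGridPosition carries.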

One final note: it's not necessarily obvious, but you can store lists in Tabular columns, so you can make one column that stores pairs of coordinates. This makes it relatively easy (though not especially fast) to apply geo processing functions to Tabular data:

POSTED BY: Gareth Russell
3 Replies

The difference in size is due to the fact that, for small Tabular objects, we cache the original expression. Try this instead:

latlong = Table[{RandomReal[{-80, 80}], RandomReal[{-180, 180}]}, {1000}];
tab1 = ToTabular[latlong, "Rows", <|"ColumnKeys" -> {"lat", "long"}|>];
tab2 = ToTabular[Map[{GeoPosition[#]} &, latlong], "Rows", <|"ColumnKeys" -> {"geo"}, "CacheOriginalExpression" -> False|>];

In[24]:= N[ByteCount[tab2]/ByteCount[tab1]]
Out[24]= 1.10757

Interesting: thanks for that. Can I ask why? Is there some overhead to using the efficient data structures that makes Tabular slower for small datasets? And roughly where is the cutoff for "small"?

POSTED BY: Gareth Russell