
GeoPosition is slow compared to DateObject

Posted 5 months ago

Hello,

I have a Dataset with around 42k observation points from various meteorological stations. I run

Query[All,DateObject[{#year,#month,#day}]&]@ds

and it takes around 0.25 seconds.

Now I run

Query[All,GeoPosition[{#latitude,#longitude}]&]@ds

and it takes a stunning 20 seconds! What's wrong? Is it really that expensive to create a GeoPosition object? I don't see this mentioned in the docs.

Best,
Przemek

POSTED BY: Przemyslaw K.
4 Replies
Posted 5 months ago

Ah, OK, I see now: the latitude/longitude elements were Strings, because Import from CSV didn't convert them to numbers. Thank you for your help!
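
For anyone who runs into the same thing, here is a minimal sketch of converting the string columns before building the positions. The two sample rows are made up for illustration, and ToExpression is only one possible conversion; since it evaluates whatever the string contains, it is only appropriate for data you trust.

```wolfram
(* Hypothetical sample: coordinates that arrived as strings *)
ds = Dataset[{
    <|"latitude" -> "52.23", "longitude" -> "21.01"|>,
    <|"latitude" -> "50.06", "longitude" -> "19.94"|>
  }];

(* Check the type first: this shows String rather than Real *)
ds[1, "latitude"] // Head

(* Convert each coordinate to a number, then build the GeoPosition *)
ds[All, GeoPosition[ToExpression /@ {#latitude, #longitude}] &]
```

Checking Head on one element is a quick way to confirm whether the columns are numeric before timing anything.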

POSTED BY: Przemyslaw K.

Hi,

GeoPosition is indeed slower than DateObject, but not by as much as you are seeing. For example, here the difference is about a factor of five:

In[1]:= AbsoluteTiming[DateObject /@ RandomInteger[10, {42000, 3}];]
Out[1]= {0.096223, Null}

In[2]:= AbsoluteTiming[GeoPosition /@ RandomInteger[90, {42000, 2}];]
Out[2]= {0.461047, Null}

In a Dataset:

In[3]:= ds = Dataset[AssociationThread[{"Year", "Month", "Day", "Latitude", "Longitude"}, #] & /@ RandomInteger[10, {42000, 5}]];

In[4]:= AbsoluteTiming[Query[All, DateObject[{#Year, #Month, #Day}] &]@ds;]
Out[4]= {0.135063, Null}

In[5]:= AbsoluteTiming[Query[All, GeoPosition[{#Latitude, #Longitude}] &]@ds;]
Out[5]= {0.501642, Null}

In[6]:= AbsoluteTiming[Query[All, GeoPosition[N@ {#Latitude, #Longitude}] &]@ds;]
Out[6]= {0.196471, Null}

Note that working with reals brings the difference down to a factor of two.

Is there anything special in your data that could slow down the processing?

Jose.

Posted 5 months ago

Apparently I can use ParallelMap[GeoPosition,ds[All,{"latitude","longitude"}]//Normal], but that takes me out of the Dataset world. Why can't I parallelize without the Normal?

POSTED BY: Przemyslaw K.

Unfortunately, this is a known bug.

In[23]:= Head@ParallelMap[f, Dataset[Range[5]]]

During evaluation of In[23]:= ParallelCombine::nopar1: Map[f][Dataset [<<5>>]] cannot be parallelized; proceeding with sequential evaluation.

Out[23]= Dataset

There is currently an internal report about the issue, and I've added you to it so they know more people are affected. Hopefully it will get fixed; for now you'll apparently have to either use Normal or not parallelize, sorry...
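
A minimal sketch of the Normal workaround, assuming numeric "latitude" and "longitude" columns as in the original question; the random sample data and the names coords and positions are only for illustration. The result is wrapped in Dataset again at the end, so only the mapping step leaves the Dataset world.

```wolfram
(* Hypothetical sample data with numeric coordinates *)
ds = Dataset[
   AssociationThread[{"latitude", "longitude"}, #] & /@
    RandomReal[{-90, 90}, {1000, 2}]];

(* Extract plain {lat, lon} pairs as an ordinary list *)
coords = Normal[ds[All, {#latitude, #longitude} &]];

(* Map in parallel over the list, then wrap the result back up *)
positions = ParallelMap[GeoPosition, coords];
Dataset[positions]
```

For an operation this cheap per element, the overhead of distributing the work may outweigh the gain, so it is worth comparing against the sequential version with AbsoluteTiming.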

POSTED BY: Eric Parfitt