
Work efficiently with computable data?

Posted 8 years ago

This question is mostly about the "right way" to work with entities and computable data. I am not very experienced with this area of functionality.

The biggest problem I am facing right now is figuring out how to make certain functions run efficiently.

For example, the following will retrieve the 5000 nearest stars with enough additional information to plot them in a nice way. After several trials, I managed to write all of it as a single query, which seems to improve performance.

dataset = StarData[
   EntityClass["Star",
    {EntityProperty["Star", "DistanceFromEarth"] -> TakeSmallest[5000]}],
   {"Name", "Color", "HelioCoordinates", "AbsoluteMagnitude"},
   "Dataset"
   ];

Yet it still takes a very long time. As in: go get a coffee, come back, and it still hasn't finished. I am curious what the reason for this is. I am using a reasonably fast internet connection, and 5000 entries is actually very little data:

In[150]:= ByteCount[dataset]
Out[150]= 8931368

In[151]:= ByteCount@Compress[dataset]
Out[151]= 335488

With basic compression the whole thing is only about 330 kB. A file of that size usually takes under a second to download.

I am also not very sure about how to work efficiently with units, which are usually present in the results of such queries.

For example, let's get all the star coordinates:

list = Normal[dataset[Values, "HelioCoordinates"]];

To work with them, we must convert all quantities to the same unit (and maybe drop the unit for better performance). The naive way is extremely slow, even though there are only 5000 coordinate-triplets:

In[147]:= QuantityMagnitude[list, "ly"]; // AbsoluteTiming
Out[147]= {33.4842, Null}

Providing the unit in a standard form speeds things up 6-fold, but it's still extremely slow:

In[148]:= QuantityMagnitude[list, Quantity[1, "LightYears"]]; // AbsoluteTiming
Out[148]= {5.09046, Null}

With unitless numbers Mathematica easily handles arithmetic with millions of numbers in milliseconds.
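
One workaround that may be worth trying is to convert a single unit quantity to get a numeric factor, strip the units from the data, and do the rescaling as plain arithmetic. The sketch below is only an illustration under stated assumptions. It uses synthetic data in parsecs (`coords`, `units`, `factor` and `magnitudes` are made-up names, and the unit that `StarData` actually returns may differ), and it assumes that every quantity carries the same unit and that the conversion is purely multiplicative.

```mathematica
(* Synthetic stand-in for the real coordinate list: 5000 triplets in parsecs.
   Assumption: the actual data may use a different unit. *)
coords = Map[Quantity[#, "Parsecs"] &, RandomReal[{-100, 100}, {5000, 3}], {2}];

(* Check the assumption that a single unit is used throughout *)
units = Union[QuantityUnit /@ Flatten[coords]];

(* Convert one unit quantity only, to obtain the numeric conversion factor *)
factor = QuantityMagnitude[
   UnitConvert[Quantity[1., First[units]], "LightYears"]];

(* Strip the units without any conversion, then rescale numerically *)
magnitudes = factor*Map[QuantityMagnitude, coords, {2}];
```

If `units` turns out to contain more than one entry, the same idea would need one factor per distinct unit. Whether this is actually faster on the real data would have to be measured with `AbsoluteTiming`.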

Is there a way to improve performance with units?

Is there a way to retrieve more stars, say, 10000-50000, and work with them in a comfortable way?

Because of these problems, it often feels easier to just download a database from elsewhere and work with it directly instead of using the curated data.

POSTED BY: Szabolcs Horvát