
Computing with Entity classes involving large amounts of data

Posted 10 years ago

Continuing along the lines of the question here

but within Mathematica rather than Wolfram|Alpha... And using this as a prototype of a more general question....

How does one efficiently compute the list of stars within a particular distance from the sun? (This is a prototype of the more general question of how to work with curated Entity data when a large quantity of data has to be fetched and then processed.)

One can get a list of all stars using the following (there appear to be about 107000 of them, so don't execute this unless you are willing to wait):

EntityList[EntityClass["Star", "Star"]]

And one can get the stars' names along with their distances (and thus select those stars from the list that are within a desired maximum distance from the Earth) using

 EntityValue[EntityClass["Star", "Star"], {EntityProperty["Star", "Name"], 
   EntityProperty["Star", "DistanceFromEarth"]}]

But, again, one has to download all 107000 items and then perform the calculation.

So my question is: is there a syntax that can solve this problem (say, asking for the stars that are within 10 light-years of the Earth) without first having to download the data for all 107000 stars? That download takes an excruciatingly long time; in fact, it is not clear that the computation will ever complete properly.

POSTED BY: David Reiss
2 Replies

Thanks Sean. This was largely what I thought was the case, but I wanted to make sure that there wasn't something basic and conceptual that I was missing. I don't have a particularly good model in my head yet for how the entire Entity-related landscape works and, for example, which functions in the Wolfram Language can take entities as their arguments and return values. Some are documented, but some not yet, I think.

Since most of this data resides on Wolfram's cloud servers and is downloaded as needed when requested from within Mathematica, it is not clear to me whether, once downloaded, it then resides on my local disk for quick subsequent access or is ephemeral in some way. This, in turn, bears on how one writes programs that make use of cloud data and how fast those programs will run under different circumstances.

I also don't have a sense of how fast data is in fact downloaded from the cloud. An example is the request for the names of all stars in my question above. There are about 100,000 stars, so the data for all of their names is presumably no more than several tens of megabytes. Under other circumstances, one would naively expect that to download rather quickly. But the query spends a great deal of time downloading, and after waiting a dozen minutes I terminated it. So, this is an additional conceptual puzzle. Of course, I realize that these data do not reside on the cloud servers as individual files that are downloaded whole cloth.
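One rough way to probe the caching question is to time the same small query twice in a single session. This is only a sketch: it assumes that `Entity["Star", "Sirius"]` is a valid entity, and the timings depend on the network and on the version in use, so it suggests rather than proves how caching behaves.

```wolfram
(* Assumption: "Sirius" is a valid canonical name for a "Star" entity *)
star = Entity["Star", "Sirius"];

(* Time the identical query twice; AbsoluteTiming returns {seconds, result} *)
firstTime = First[AbsoluteTiming[EntityValue[star, "DistanceFromEarth"]]];
secondTime = First[AbsoluteTiming[EntityValue[star, "DistanceFromEarth"]]];

{firstTime, secondTime}
```

If the second timing is much smaller than the first, the value was probably served from some cache rather than fetched again. Repeating the test after restarting the kernel would indicate whether that cache outlives the session.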

Just some random thoughts as I begin to experiment with these functionalities....

Thanks again, David

POSTED BY: David Reiss

I think the short answer to your question is basically "No".

Filtering entities by a property is kind of hard. For a query to evaluate efficiently, the entities would have to be stored in a data structure designed for that kind of query. But to handle a generic query, what other choice is there but to go through every possible entity? The developers I've talked to are of course interested in making common or likely queries easier. Maybe we will see some new tools in future versions.
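For readers on a later version: the Wolfram Language has since gained implicitly defined entity classes, which let the filter be stated as part of the class itself. The following is a sketch under that assumption. It was not available when this thread was written, and nothing here establishes whether the "Star" filter is evaluated on the server or still requires fetching every distance.

```wolfram
(* Assumes a version that supports implicitly defined entity classes *)
nearby = EntityClass["Star",
   {"DistanceFromEarth" -> LessThan[Quantity[10, "LightYears"]]}];

(* Ask only for the properties of interest for the filtered class *)
EntityValue[nearby, {"Name", "DistanceFromEarth"}]
```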

For some entity types there are tools that make these kinds of queries easier. GeoEntities is an example of such a function. Entities that have positions on the Earth are likely to be queried by their distance or position in some way, so GeoEntities provides a tool for such queries. GeoEntities doesn't do stars, though. I bring it up as an example of a function that allows a more advanced query without doing an exhaustive search.
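As an illustration of that kind of query, here is a minimal sketch that asks for cities inside a disk on the Earth's surface. The center coordinates and the radius are arbitrary values chosen for the example, not anything taken from this thread.

```wolfram
(* A 100 km disk around an arbitrary point, given as {latitude, longitude} *)
region = GeoDisk[GeoPosition[{41.9, 12.5}], Quantity[100, "Kilometers"]];

(* City entities located inside that region *)
GeoEntities[region, "City"]
```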

"Star" entities, like many others have EntityClasses. These are groups of stars that might be useful in some cases:


One of the groups is the nearest 100 stars for example.
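A sketch of how these groups can be used to keep the download small: list the available classes, then request properties only for the class of interest. The class name "StarNearest100" below is an assumption; check the output of EntityClassList for the exact name in your version.

```wolfram
(* List the predefined entity classes for the "Star" type *)
EntityClassList["Star"]

(* Assumed class name; substitute the one shown in the list above *)
nearest = EntityClass["Star", "StarNearest100"];

(* Fetch data for roughly 100 stars instead of all of them, then filter locally *)
data = EntityValue[nearest, {"Name", "DistanceFromEarth"}];
Select[data, Last[#] < Quantity[10, "LightYears"] &]
```

This only helps when the stars of interest are known to lie inside one of the predefined groups, as they would for a 10 light-year cutoff if the nearest-100 class exists as assumed.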

For a brute-force search such as you are describing, I would use StarData as a function instead:

Select[#, Last[#] < Quantity[10, "LightYears"] &] &@
StarData[StarData[], {"Name", "DistanceFromEarth"}]

This doesn't seem to work just yet, but CloudEvaluate might be helpful, since the servers likely have quicker access to the star data:

 CloudEvaluate[
  Select[#, Last[#] < Quantity[10, "LightYears"] &] &@
   StarData[StarData[], {"Name", "DistanceFromEarth"}]]
POSTED BY: Sean Clarke