How to speed up entity queries?

Posted 8 years ago

Mathematica provides access to a huge amount of curated data. But most of this is so slow and so inconvenient to retrieve that it is literally next to useless.

Take this simple query as an example:

t = AbsoluteTime[];
Entity["Plant", "Species:GlycineMax"]["TaxonomyGraph"] // AbsoluteTiming
AbsoluteTime[] - t

This took 3.5 minutes (!!!) on my machine, despite AbsoluteTiming reporting merely 9 seconds. On a second try, after restart, it took 4.5 minutes.

This is a typical problem whenever I try to retrieve curated data. Even the 9 seconds would be much too slow for anything other than a one-time interactive query. Use in a program (loop) is out of the question.

Is there a fix for this kind of problem?

Given the great amount of effort Wolfram put into developing this functionality, why are these basic usability issues not being fixed? I am a bit puzzled because being "knowledge-based" is the main marketing point of the Wolfram Language.

Does anyone on this forum seriously use these functions? If yes, how can you manage the terrible performance?

POSTED BY: Szabolcs Horvát
4 Replies
Posted 3 years ago

This thread is old by now, so I don't know whether this existed when the post was made. I definitely lack the expertise of the other commenters. However, as of version 13.0 there is a useful function called

EntityPrefetch[] (*For example EntityPrefetch["Plant"] *)

In[1]:= t = AbsoluteTime[];
EntityPrefetch["Plant"] // AbsoluteTiming
AbsoluteTime[] - t

Out[2]= {4.69674, Success[
 "Prefetch", <|"MessageTemplate" -> "Prefetch successful.", 
   "Values" -> 26194950, "Type" -> "Plant"|>]}

Out[3]= 4.7544438

That mostly bypasses the issue. I ran a test and it took no more than 6 minutes to access 4000 entries, with the data collected correctly. Sorry I didn't document the whole process: once you have downloaded an entity type you can't time it again unless you delete the downloaded files, and I do not know where those files are located.
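A minimal sketch of that workflow, assuming a version that has EntityPrefetch and that the prefetch succeeds: download the type once, then ask for the values of many entities in a single EntityValue call rather than one call per entity. The variable names (result, plants, graphs) are only illustrative, and the one-element list stands in for the real set of entities.

```wolfram
(* One-time download of the "Plant" data; this may take a while. *)
result = EntityPrefetch["Plant"];

(* Illustrative list; in practice it would hold all entities of interest. *)
plants = {Entity["Plant", "Species:GlycineMax"]};

(* One batched request instead of a loop of single queries. *)
(* It only runs if the prefetch returned a Success object. *)
If[MatchQ[result, _Success],
  graphs = EntityValue[plants, "TaxonomyGraph"]]
```

How much time this saves is not something I measured separately, so treat it as a starting point rather than a benchmark.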

POSTED BY: Updating Name
Posted 8 years ago
POSTED BY: b3m2a1

I'm also curious about which step takes how much time. Is it setting up the connection? Is it the (large) size of the data? Is it the interpretation? Is it the database lookup?

2 minutes 50 on my machine, btw…

POSTED BY: Sander Huisman

Here: 8.49393 for the Entity[] call, absolute time difference 127.83453 (2 min 7 s), with Mathematica 10.4 on this machine, Windows 10 Professional 64-bit, update 1709. If you do the same thing again, you get 0.0114 vs. 10.7457 (computers are obviously intended to do it again).

But now, keep your socks on: if the notebook is closed, so that Mathematica exits too, and the notebook is then opened again, it once more shows 'Initializing Knowledge Base Connection' but returns in 4.68854 vs. 6.31252. Then the question is, what has been returned?

Usually the solution is to localize or cache the data needed, if that is possible. Only from time to time should one check whether the curators have changed something.
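One way to cache could look like the sketch below. It is only an illustration under assumptions: cachedValue is a hypothetical helper (not a built-in function) and the file name entity-cache.wl is made up. It memoizes each result in the session and uses Save and Get to keep the results between sessions.

```wolfram
(* cachedValue is a hypothetical helper, not a built-in function. *)
(* The first call for an entity and property queries the knowledge base; *)
(* later calls return the stored result without a new query. *)
cachedValue[ent_Entity, prop_String] :=
  cachedValue[ent, prop] = EntityValue[ent, prop]

(* Hypothetical file location for keeping the cache between sessions. *)
cacheFile = FileNameJoin[{$UserBaseDirectory, "entity-cache.wl"}];

(* Restore previously saved definitions, if there are any. *)
If[FileExistsQ[cacheFile], Get[cacheFile]];

g = cachedValue[Entity["Plant", "Species:GlycineMax"], "TaxonomyGraph"];

(* Save appends to an existing file, so remove the old copy first, *)
(* then write the general rule and all stored values back to disk. *)
If[FileExistsQ[cacheFile], DeleteFile[cacheFile]];
Save[cacheFile, cachedValue]
```

Note that a failed query (for example a Missing[...] result) would be stored as well. Clearing cachedValue and deleting the file forces a fresh download, which is also the way to pick up changes made by the curators.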

POSTED BY: Udo Krause