Perform queries remotely to reduce data size?

Posted 8 years ago

Hello,

I'd like to produce a list of the most and least common words in French. WordList gives me the words, and WordFrequencyData gives the frequency of each word in a list. But it is highly inefficient to download the whole word list to my computer, fetch the frequency of every word, sort by frequency, and then throw away almost all of the data except the top and bottom ten items.

Indeed, a naive approach times out every time:

words = WordList["CommonWords", Language -> "French"]
wordFreq = TakeLargest[WordFrequencyData[words, Language -> "French"], 10]
... WordFrequencyData::timeout: A network operation for WordFrequencyData timed out. Please try again later.

Dividing the queries into smaller partitions,

wordsPartitions = Partition[words, UpTo[100]]; (* UpTo keeps the final, shorter batch *)
Join @@ Map[WordFrequencyData[#, Language -> "French"] &, wordsPartitions] (* merge the per-batch associations *)

takes many hours, with many failures along the way, so retry logic is needed as well. Hence, back to the title: is there a better way? If this were a remote SQL database, I could select the words, order by frequency, and limit the result to 10, so very little data would have to travel over the API. (By the way, it isn't even much data in the first place; this is incredibly slow for a commercial product.)
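For illustration, here is a minimal sketch of the batching-with-retry idea described above. It does not move the sorting to the server: every frequency is still downloaded, and the sketch only makes the batched download tolerant of timeouts. The helper name fetchBatch, the batch size of 100, the limit of 5 attempts, and the exponential pause are all assumptions, not anything prescribed by WordFrequencyData.

```wolfram
(* Hypothetical helper: fetch the frequencies for one batch of words.
   Any message issued during the call (for example
   WordFrequencyData::timeout) counts as a failure and triggers a retry. *)
fetchBatch[batch_List, maxTries_Integer : 5] :=
 Module[{result = $Failed, tries = 0},
  While[result === $Failed && tries < maxTries,
   tries++;
   result = Quiet @ Check[
      WordFrequencyData[batch, Language -> "French"],
      $Failed];
   (* assumed back-off: wait 2, 4, 8, ... seconds before the next attempt *)
   If[result === $Failed, Pause[2^tries]]];
  result]

words = WordList["CommonWords", Language -> "French"];

(* UpTo[100] keeps the last, shorter batch instead of dropping it *)
batches = Partition[words, UpTo[100]];

results = fetchBatch /@ batches;

(* keep only batches that succeeded, merge them into one association,
   and drop words for which no frequency is available *)
freqs = DeleteMissing[Association[Select[results, AssociationQ]]];

mostCommon = TakeLargest[freqs, 10]
leastCommon = TakeSmallest[freqs, 10]
```

Batches that still fail after the last attempt are skipped rather than aborting the whole run, so it is worth comparing Length[freqs] with Length[words] before trusting the bottom ten.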

POSTED BY: Daniel Janzon
5 Replies
Posted 7 years ago

Hi Sjoerd,

Thanks for the lead. I will need to digest what you have provided, but it is incredibly helpful to have something that works and works efficiently.

Thanks. Scott

POSTED BY: Scott Stiffler
Posted 7 years ago

Daniel,

Thanks for your reply. The Wolfram Language does some very clever things so easily that I was shocked to find it so clumsy and slow with what I thought would be an elementary exercise.

I am not quite ready to give up on Wolfram yet. I have been tinkering around with logic to break the query down and recover from failures, but I haven't gotten there yet.

I see that someone has posted their solution, so I plan to dig into that.

Hopefully, I will "get" the style of this language and have some fun with it.

Thanks again. Scott

POSTED BY: Scott Stiffler
POSTED BY: Sjoerd Smit
Posted 7 years ago

Hello Scott, no, I was not able to solve it. I added some retry logic to recover from errors, and as far as I can remember it worked in the end and produced the desired list of the most and least common words, but I had to let it run overnight. Do let me know if you find out something new. Meanwhile I have given up on Mathematica; I can't use it for what I bought it for. When the data is not missing, retrieving it is too slow.

POSTED BY: Daniel Janzon
Posted 7 years ago

Hi Daniel,

Did you ever come up with a satisfactory solution to your problem? I am struggling with the same issue.

I have read Wolfram's introductory book on the language, worked through a lot of online tutorials, and made web searches, but I found myself completely stuck.

Please let me know.

Thanks. Scott

POSTED BY: Scott Stiffler