Get an updated value for WordFrequencyData?

Hello community,

I have two questions regarding WordFrequencyData[]:

I noticed that the most recent date available for this function is 2008 (12 years ago), even in the new version 12.1. I understand that the data comes from the "Google Books English n-gram public dataset".

I'm still trying to understand how this command (WordFrequencyData) works, so I may be missing something. Example:

WordFrequencyData["computer", "TimeSeries", {1900, Now}]
DateListPlot[%]

Now
Today
DateValue["Year"]
WordFrequencyData["computer", "TimeSeries", {1900, Today}]
WordFrequencyData["computer", "TimeSeries", {1900, DateValue["Year"]}]

My questions are:

1) Is there any estimate of when this data will be updated?

2) Is there any workaround for this? Perhaps using WebSearch[] in some way?

Thank you very much.

POSTED BY: Claudio Chaib
3 Replies

I cannot help much further, except to note that:

  1. The original dataset contains nothing newer than 2008 (the paper is from 2010).

  2. Mathematica's function WordFrequencyData uses only this dataset. I suggest you read the documentation on WordFrequency, WordData, and related functions.
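
The 2008 cutoff can be checked directly rather than read off a plot. A minimal sketch, assuming WordFrequencyData returns a TimeSeries object for a single word (as the DateListPlot call in the question suggests) and that a connection to the Wolfram data servers is available:

```wolfram
(* Request everything from 1900 up to the present *)
ts = WordFrequencyData["computer", "TimeSeries", {1900, Now}];

(* Ask the time series for the dates it actually covers *)
{ts["FirstDate"], ts["LastDate"]}
```

If the last date comes back as 2008 regardless of the end date requested, the limit is in the underlying data and not in how the date range was specified.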

best

yehuda

Hi Yehuda,

Thank you for taking the time to answer me. I ran your code and tried a few things here, but I couldn't make any progress. Unfortunately, I have no idea how to use it with WordFrequencyData, or whether it should be used with another command. Sorry if I still don't understand this type of command or what to do with it. I humbly ask: could you help me a little more by showing how to use the result of this code in WordFrequencyData or a similar resource?

I will be extremely grateful. Thank you very much.

POSTED BY: Claudio Chaib

There is additional functionality that allows you to analyze the raw text data. As it is split into 99 fragments (numbered 00 to 98), you may use the following to download them as a first step and continue from there:

Table[
  URLDownloadSubmit[
    (* shard n of the syntactic n-gram "nodes" files, numbered 00 to 98 *)
    "http://commondatastorage.googleapis.com/books/syntactic-ngrams/eng/nodes." <>
      IntegerString[n, 10, 2] <> "-of-99.gz",
    (* save as ~/Downloads/00.gz, ~/Downloads/01.gz, ... *)
    "~/Downloads/" <> IntegerString[n, 10, 2] <> ".gz",
    HandlerFunctions -> <|"TaskFinished" -> Print|>],
  {n, 0, 98}]
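
As a possible next step, each line of a decompressed shard can be split into fields. The sketch below assumes the tab-separated layout described in the dataset's README (head word, syntactic n-gram, total count, then year,count pairs). parseNodeLine is a hypothetical helper name, and the sample line is made up for illustration, not real data.

```wolfram
(* Hypothetical helper: parse one line of a decompressed "nodes" shard.
   Assumed layout: head <TAB> ngram <TAB> total <TAB> year,count <TAB> year,count ... *)
parseNodeLine[line_String] := Module[{fields, yearPairs},
  fields = StringSplit[line, "\t"];
  (* every field after the third is assumed to be a "year,count" pair *)
  yearPairs = Map[FromDigits, StringSplit[Drop[fields, 3], ","], {2}];
  <|
    "Head" -> fields[[1]],
    "NGram" -> fields[[2]],
    "Total" -> FromDigits[fields[[3]]],
    "ByYear" -> Association[Rule @@@ yearPairs]
  |>]

(* Made-up sample line, for illustration only *)
sample = "computer\tcomputer/NN/ROOT/0\t8\t2007,3\t2008,5";
parsed = parseNodeLine[sample];
parsed["ByYear"]
```

Comparing Total[parsed["ByYear"]] with parsed["Total"] gives a quick consistency check on a parsed line, assuming the layout above holds.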

best
