
Get an updated value for WordFrequencyData?

Hello community,

I have two questions regarding WordFrequencyData[]:

I noticed that the latest year available for this function is 2008 (12 years ago), even in the new version 12.1. I understand that the data comes from the "Google Books English n-gram public dataset".

I'm still trying to understand how this function (WordFrequencyData) works, so I may be missing something. Example:

WordFrequencyData["computer", "TimeSeries", {1900, Now}]
DateListPlot[%]

Now
Today
DateValue["Year"]
WordFrequencyData["computer", "TimeSeries", {1900, Today}]
WordFrequencyData["computer", "TimeSeries", {1900, DateValue["Year"]}]
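The upper bound of the range (Now, Today or the current year) does not seem to matter, because the series stops where the underlying data stops. One way to check this directly is to look at the dates stored in the returned object. A minimal sketch, assuming the call returns a TimeSeries (as in the DateListPlot example) rather than a Missing value:

```mathematica
(* Ask for more years than the data contains *)
ts = WordFrequencyData["computer", "TimeSeries", {1900, Now}];

(* A TimeSeries evaluates to a TemporalData object; "Dates" gives its time stamps *)
dates = ts["Dates"];

(* First and last year actually present, next to the current year *)
{DateValue[First[dates], "Year"], DateValue[Last[dates], "Year"], DateValue["Year"]}
```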

My questions are:

1) Is there any estimate of when this data will be updated?

2) Is there any workaround? Perhaps using WebSearch[] somehow?

Thank you very much.

POSTED BY: Claudio Chaib
3 Replies

You can also analyze the raw data yourself. It is split into 99 compressed fragments, so as a first step you can download them with the following and continue from there:

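 (* Download the 99 "nodes" fragments (00 to 98) asynchronously into ~/Downloads *)
 Table[
  URLDownloadSubmit[
   "http://commondatastorage.googleapis.com/books/syntactic-ngrams/eng/nodes." <> IntegerString[n, 10, 2] <> "-of-99.gz",
   FileNameJoin[{$HomeDirectory, "Downloads", IntegerString[n, 10, 2] <> ".gz"}],
   (* print the event data each time a download task finishes *)
   HandlerFunctions -> <|"TaskFinished" -> Print|>],
  {n, 0, 98}]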
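One way to continue from there, as a minimal sketch rather than a tested pipeline. It assumes each line of a fragment follows the layout described in the dataset's README: tab-separated head word, syntactic n-gram, total count, then one "year,count" field per year. toYearRule, parseNgramLine, yearlyCounts and countsToTimeSeries are hypothetical helpers, not built-in functions. The result is raw counts, whereas WordFrequencyData returns relative frequencies, so the two are not directly comparable without dividing by the size of the corpus in each year.

```mathematica
(* Hypothetical helper: "1999,42" -> (1999 -> 42) *)
toYearRule[s_String] := Rule @@ (FromDigits /@ StringSplit[s, ","])

(* Hypothetical helper. ASSUMES the tab-separated README layout:
   head_word, syntactic-ngram, total_count, then one "year,count" field per year *)
parseNgramLine[line_String] := Module[{fields = StringSplit[line, "\t"]},
  <|"HeadWord" -> First[fields],
    "Total" -> FromDigits[fields[[3]]],
    "Counts" -> Association[toYearRule /@ Drop[fields, 3]]|>]

(* Sum the year -> count associations of every line whose head word is `word`;
   the cheap prefix test runs first so only matching lines get parsed *)
yearlyCounts[lines : {___String}, word_String] :=
 Merge[
  Lookup[parseNgramLine /@ Select[lines, StringStartsQ[#, word <> "\t"] &], "Counts"],
  Total]

(* Turn year -> count into a TimeSeries that DateListPlot accepts *)
countsToTimeSeries[counts_Association] :=
 TimeSeries[KeyValueMap[{DateObject[{#1}, "Year"], #2} &, KeySort[counts]]]
```

To use it on a downloaded fragment, decompress the file first (for example with gunzip) and read its lines with ReadList[file, String]. Each fragment is large, so trying it on a small sample of lines first is advisable. DateListPlot[countsToTimeSeries[yearlyCounts[lines, "computer"]]] then plots the yearly counts for that word.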

best

I can't help much further, except to note that:

  1. The original dataset does not extend beyond 2008 (the paper describing it is from 2010).

  2. Mathematica's WordFrequencyData uses only this dataset. I suggest reading the documentation for WordFrequency, WordData, and related functions.
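WordFrequencyData only looks words up in that fixed corpus, whereas WordFrequency computes the frequency of a word in any text you pass to it, so it can be applied to newer text you collect yourself (which touches on question 2). A minimal sketch; the sentence below is only a placeholder for such text, and how you gather it (WebSearch, importing web pages, etc.) is left open:

```mathematica
(* Placeholder text: stands in for newer text you collect yourself *)
text = "The computer on my desk is faster than the computer I had in 2008.";

(* Frequency of one word within that text, as a fraction of all its words *)
WordFrequency[text, "computer"]

(* For comparison: the frequency from the fixed corpus behind WordFrequencyData *)
WordFrequencyData["computer"]
```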

best

yehuda

Hi Yehuda,

Thank you for taking the time to answer. I ran your code and tried a few things, but I couldn't make any progress. Unfortunately, I don't know how to use the downloaded files with WordFrequencyData or with any other function. Sorry if I'm still missing something basic about this kind of code. Could you help me a little more and show how to use the result of this code with WordFrequencyData or a similar resource?

I would be extremely grateful. Thank you very much.

POSTED BY: Claudio Chaib