WOLFRAM COMMUNITY

GROUPS:

Mathematica Wolfram|Alpha Wolfram Development Platform

How does one extract "Word Frequency History"?

Itay Livni

Posted 12 years ago

Hi. Again, a warning: beginner. I would like to accomplish two goals:

1. Get a clear example of how to get the Word Frequency History for a particular list of words (the words come from a list) over a range of dates. The output would be the data. I do not want the pod from Wolfram|Alpha, just the data for analysis. I have gone through the "People and History" page, where one is directed to the WordData function.

2. Is it possible to create a function that generates x words with a positive slope (according to frequency) and another with a negative slope? Basically, a measure of relevancy.

Lastly, it was not particularly clear to me (after some research) where the word frequency data comes from. The definitions, I know, come from WordNet.

Thank you in advance.

2 Replies

Itay Livni

Posted 12 years ago

Thanks Kyle, this is a great answer, although I was naively hoping to keep everything in Mathematica :)

"Word Frequencies in Written and Spoken English: Based on the British National Corpus. Pearson ESL, 2001"

This is interesting because the Wolfram|Alpha data goes to 2007 or so and can be downloaded.

"Alternatively, you can also download Google's ngrams datasets to gain information about word frequency..."

The ngram datasets are something I looked at earlier. However, I could not verify the data against Wolfram|Alpha's, and some quick sanity tests did not pass (not for this forum).

I was very much led astray by a crumb in the People & History Reference Guide.

Very helpful!

Kyle Keane, Massachusetts Institute of Technology (MIT)

Posted 12 years ago

The first question you asked turns out to be more complicated than I think you expected. It appears that no word frequency data is available through Mathematica. Evaluate `WordData[Properties]` in Mathematica to see the list of properties that WordData gives access to. So there is no simple explanation that someone can give you; below I offer a suggestion about how I would go about accomplishing this task.

Here is how to figure out where the information comes from in Wolfram|Alpha. At the bottom left of a Wolfram|Alpha results page there is a link called "sources". If you click it, you will see another link called "word data". If you click that, you will see the following citation:

Leech, G., P. Rayson, and A. Wilson. Word Frequencies in Written and Spoken English: Based on the British National Corpus. Pearson ESL, 2001.

Following that citation brings you to a page that contains the datasets presumably used to present the information in Wolfram|Alpha. You can download those datasets and import them into Mathematica using `Import[]`, but it seems that they are not built into the Wolfram Language at this time. Alternatively, you can download Google's ngrams datasets to gain information about word frequency.

Once you have one of these datasets imported into Mathematica, you can certainly write a function to search for words with increasing usage and decreasing usage. This is the more complex task of picking trends out of a noisy dataset (for example, with a moving average or a linear regression), but Mathematica is great for this type of work. See the documentation on Statistical Data Analysis for a good starting point. You can also use plotting and model-fitting functions such as `DateListPlot[]` and `LinearModelFit[]` to figure out whether a word frequency is trending up or down.

Hope this helps :)
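To make the trend idea a bit more concrete, here is a minimal sketch of the linear-regression approach. It is not a definitive implementation: the `freq` association is made-up illustration data standing in for whatever you import, and the names `trendSlope`, `risingWords` and `fallingWords` are hypothetical helpers, not built-in functions. It also assumes a reasonably recent version of the Wolfram Language, since it relies on associations, `TakeLargest` and `UpTo`.

```mathematica
(* Made-up illustration data: word -> list of {year, relative frequency}.
   In practice this would be built from a dataset loaded with Import[]. *)
freq = <|
   "internet" -> {{1990, 0.1}, {1995, 0.9}, {2000, 3.2}, {2005, 4.1}},
   "telegraph" -> {{1990, 1.2}, {1995, 0.9}, {2000, 0.7}, {2005, 0.4}},
   "table" -> {{1990, 2.0}, {1995, 2.1}, {2000, 1.9}, {2005, 2.0}}
   |>;

(* Slope of the least-squares line through the {year, frequency} pairs.
   "BestFitParameters" returns {intercept, slope} for a linear model in x. *)
trendSlope[series_] := Module[{fit, x},
   fit = LinearModelFit[series, x, x];
   Last[fit["BestFitParameters"]]
   ];

(* Association of word -> fitted slope *)
slopes = Map[trendSlope, freq];

(* Up to n words with the steepest positive / negative slopes *)
risingWords[n_] := Keys[TakeLargest[Select[slopes, Positive], UpTo[n]]];
fallingWords[n_] := Keys[TakeSmallest[Select[slopes, Negative], UpTo[n]]];
```

With this toy data, `risingWords[1]` picks out "internet" and `fallingWords[1]` picks out "telegraph". On real, noisy data you would probably want to smooth each series first (for instance with `MovingAverage`) or require the slope to exceed some threshold before calling a word rising or falling, because a nearly flat series like "table" still gets a small nonzero slope.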
