Mathematica for Natural Language Processing?

Posted 11 years ago

Has anyone used Mathematica for Natural Language Processing?

Although Wolfram|Alpha and Mathematica support natural language processing, I didn't find any package similar to the Natural Language Toolkit (NLTK) or the Stanford Natural Language Processing software.

POSTED BY: Luis Mendes
6 Replies
Posted 11 years ago
As Todd Rowland said, the ideas of n-grams and Markov chains are easy to implement in Mathematica; see .
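The n-gram part of that idea fits in a few lines. The sketch below is written in Python as a language-neutral illustration, not in Mathematica. The tokenizer, which keeps only lowercase ASCII letters and apostrophes, is an assumption, and the function names are hypothetical rather than any built-in API. It splits a text into words and counts every run of n consecutive words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each word n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


if __name__ == "__main__":
    sample = "The cat sat on the mat, and the cat slept."
    # Print the three most frequent word bigrams with their counts.
    for gram, count in ngram_counts(sample, 2).most_common(3):
        print(" ".join(gram), count)
```

The bigram counts produced by `ngram_counts(text, 2)` are also the raw material for a first-order Markov chain over words.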
POSTED BY: Anton Antonov
Posted 11 years ago
Thank you for the examples and tips!
POSTED BY: Luis Mendes
As Todd and Hector already mentioned, there are many built-in tools that can help you with language processing. For example, you can combine Markov chains, graphs and networks, statistics, and even notions of entropy to analyze the text of Alice in Wonderland and build a program that generates text with similar statistical properties. Read about it in the Wolfram Blog post Centennial of Markov Chains, where you can download the complete notebook of that research. By the way, the analysis in that article is a similar but extended version of Andrey Markov's original paper, presented to the Imperial Academy of Sciences in St. Petersburg, on Pushkin's legendary Russian novel in verse “Eugene Onegin”. Here is a list of some articles on the subject that are worth reading for their originality and popularity online:
Also note the large section on the subject at the Wolfram Demonstrations Project: Linguistics.
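Markov's Eugene Onegin study, mentioned above, sorted letters into vowels and consonants and measured how often each class followed the other. The sketch below illustrates that two-state idea in Python. It is not the code from the Centennial of Markov Chains notebook. It estimates the transition probabilities from a text and computes the fitted chain's entropy rate in bits per letter, so you can compare it with the entropy of the vowel/consonant frequencies alone. Treating only the English letters a, e, i, o, u as vowels is an assumption, since Markov worked with Russian text, and the function names are hypothetical.

```python
import math
import string
from collections import Counter

# Assumption: English vowels. Markov classified Russian letters, so this
# split is only an analogue of his setup.
VOWELS = set("aeiou")
STATES = ("V", "C")


def vc_sequence(text):
    """Map each ASCII letter to 'V' (vowel) or 'C' (consonant); drop everything else."""
    return ["V" if ch in VOWELS else "C"
            for ch in text.lower() if ch in string.ascii_lowercase]


def transition_matrix(states):
    """Estimate P(next state | current state) from consecutive pairs."""
    pair_counts = Counter(zip(states, states[1:]))
    from_counts = Counter(states[:-1])
    return {(a, b): pair_counts[(a, b)] / from_counts[a]
            for a in from_counts for b in STATES}


def entropy_rate(states):
    """Entropy rate of the fitted chain in bits per letter:
    -sum_a w_a sum_b P(b|a) log2 P(b|a), with w_a the empirical share of state a."""
    P = transition_matrix(states)
    from_counts = Counter(states[:-1])
    total = sum(from_counts.values())
    h = 0.0
    for a, n_a in from_counts.items():
        for b in STATES:
            p = P[(a, b)]
            if p > 0:
                h -= (n_a / total) * p * math.log2(p)
    return h


def class_entropy(states):
    """Entropy of the vowel/consonant frequencies alone (no memory), in bits."""
    n = len(states)
    return -sum(c / n * math.log2(c / n) for c in Counter(states).values())


if __name__ == "__main__":
    text = "Alice was beginning to get very tired of sitting by her sister on the bank"
    seq = vc_sequence(text)
    for (a, b), p in sorted(transition_matrix(seq).items()):
        print(f"P({b} | {a}) = {p:.3f}")
    print(f"H(class) = {class_entropy(seq):.3f} bits, "
          f"entropy rate = {entropy_rate(seq):.3f} bits/letter")
```

When the entropy rate is noticeably lower than `class_entropy`, the next letter class depends on the current one. That dependence is the effect Markov set out to demonstrate.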

[Images: typical networks appearing in linguistic analysis]

POSTED BY: Vitaliy Kaurov
As Todd points out, WordData is a great place to start. There is also DictionaryLookup, which is simpler but provides word lists in about 30 languages. I personally worked on implementing both functions for Mathematica, and I wrote a blog post showing some of the things one can do with them. Feel free to visit:

Wolfram|Alpha also has some basic capabilities for spell checking and language detection, which you can access from Mathematica and through the Wolfram|Alpha API.
I have also written a few Wolfram Demonstrations that you may find useful to reuse:

Some new functionality may come in the future, so stay tuned!
POSTED BY: Hector Zenil
I don't know of any open-source projects like the Stanford project, but you can get pretty far with just WordData, which is built into Mathematica, and the ideas of n-grams and Markov chains, which are easy to implement. ExampleData also has a few texts that are immediately available. See the linguistic guide in the documentation for ideas: Linguistic Data
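Combining n-grams and Markov chains gives a tiny text generator: a first-order (bigram) word chain sampled by a random walk. The sketch below is written in Python for illustration, not Mathematica, and the function names are hypothetical. In practice the corpus could be one of the ExampleData texts. Here a short inline sentence stands in for it.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words that follow it in the corpus.
    Duplicates are kept, so sampling from a list follows bigram frequencies."""
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain


def generate(chain, start, length, seed=None):
    """Random walk over the chain: up to `length` words beginning with `start`,
    stopping early if the current word never had a successor."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the cat".split()
    chain = build_chain(corpus)
    print(" ".join(generate(chain, "the", 12, seed=1)))
```

Keeping duplicate successors in plain lists is the simplest way to sample proportionally to bigram counts. For a large corpus, storing counts (for example, the `Counter` output of an n-gram counter) and sampling with weights would use less memory.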
POSTED BY: Todd Rowland