

How good is Mathematica with Natural Language processing?

I've used NLTK a little bit, and I normally use Python to extract linguistic data. This is only my second day using Mathematica, and I'm wondering whether the following things are actually possible:

  1. I know there are thousands of languages included in M. English, it seems, offers a wide range of tools for linguistic analysis. How about other languages? Let's say I'd want to separate the syllables of Spanish... would that be doable here?

  2. What would you suggest I read as far as language processing goes...? Do people normally use M for that? Are there packages for that?


3 Replies

In the last eight years I have used Mathematica quite a lot for Natural Language Processing and text mining. Here are a couple of links that describe such activities:

[1] "Statistical thesaurus from NPR podcasts"

[2] "Natural language processing with functional parsers"

Both blog posts have links to Mathematica packages and guides for doing NLP.

The approaches in those links are more-or-less language agnostic. I have used them to make search engines that combine (i) English, Spanish, and French, and (ii) English and Malay.

You might find this discussion interesting: "Convergence of synonym networks".

As for your question, "Let's say I'd want to separate the syllables of Spanish... would that be doable here?": I have not done syllable separation for Spanish; I have only used appropriate stemmers.

POSTED BY: Anton Antonov

Thanks a lot, Anton. I'll take a look.

You'll find Mathematica/Wolfram Language and NLTK/Python to be very different. NLTK is a collection of methods that you'd likely read about in a college course on natural language processing. The Wolfram Language doesn't (at least yet) have built-in functions for most of these, but in many cases it's easy to roll your own if you understand the underlying algorithm and can program in the Wolfram Language.
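To make "roll your own" concrete for the Spanish syllable question, here is a minimal sketch of a rule-based splitter. It is written in Python only because that is what the original question mentions; it is not a Mathematica or NLTK function. The names `syllabify`, `find_nuclei`, `tokenize`, and `ONSET_CLUSTERS` are hypothetical, and the rules are a simplified assumption about Spanish orthography (diphthongs need an unaccented i/u/ü, "ch"/"ll"/"rr" count as one consonant, a short list of consonant pairs may open a syllable). Cases such as a silent "h" between vowels or prefix boundaries are not handled.

```python
# Illustrative sketch only: simplified orthographic rules, not a library API.
VOWELS = set("aeiouáéíóúü")
WEAK = set("iuü")  # unaccented close vowels; they can join a diphthong
DIGRAPHS = {"ch", "ll", "rr"}  # each is treated as a single consonant
# Consonant pairs assumed to open a syllable together (assumed, not exhaustive)
ONSET_CLUSTERS = {"pr", "br", "tr", "dr", "cr", "gr", "fr",
                  "pl", "bl", "cl", "gl", "fl"}


def tokenize(word):
    """Split a word into letters, keeping ch/ll/rr as one unit."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        units.append(word[i:i + step])
        i += step
    return units


def find_nuclei(units):
    """Return (start, end) index pairs of vowel groups that share a syllable."""
    nuclei, i = [], 0
    while i < len(units):
        if units[i] not in VOWELS:
            i += 1
            continue
        has_strong = units[i] not in WEAK
        j = i + 1
        while j < len(units) and units[j] in VOWELS:
            cur_strong = units[j] not in WEAK
            touches_weak = units[j - 1] in WEAK or units[j] in WEAK
            # Hiatus: two strong vowels in a row, or a second strong vowel.
            if not touches_weak or (cur_strong and has_strong):
                break
            has_strong = has_strong or cur_strong
            j += 1
        nuclei.append((i, j))
        i = j
    return nuclei


def syllabify(word):
    """Split one Spanish word into a list of syllables."""
    units = tokenize(word.lower())
    nuclei = find_nuclei(units)
    if not nuclei:
        return ["".join(units)]
    starts = [0]
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        n_cons = next_start - prev_end  # consonants between two nuclei
        pair = "".join(units[max(prev_end, next_start - 2):next_start])
        if n_cons >= 2 and pair in ONSET_CLUSTERS:
            starts.append(next_start - 2)  # cluster opens the next syllable
        elif n_cons >= 1:
            starts.append(next_start - 1)  # last consonant opens it
        else:
            starts.append(next_start)      # hiatus: split between vowels
    ends = starts[1:] + [len(units)]
    return ["".join(units[s:e]) for s, e in zip(starts, ends)]


print(syllabify("construcción"))  # ['cons', 'truc', 'ción']
```

The same three steps (group letters, find vowel nuclei, assign the consonants in between) are plain string processing, so they should translate to Wolfram Language string functions without needing a dedicated package.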

Mathematica 10 has a lot of functionality that is useful for very practical NLP and semantic reasoning. It's full of things that were used to build Wolfram|Alpha. Take, for example, Interpreter. Hopefully we'll get to release even more useful tools based on this in future releases.

POSTED BY: Sean Clarke