Digital Humanities, Topic Modelling, Sequence Alignment

Hi,

I recently started working in a part of the (Swedish) academic world called "digital humanities" (DH). I am interested in finding out to what extent Mathematica can be used for working with text in this area. It seems to me that the combination of Internet and database connectivity, string manipulation, algorithmic power, visualization, interactivity, and quick development could make Mathematica quite useful.

There are two methods that are considered hot in DH that I would like to try to implement. The first is called "Topic Modelling" and is based on an algorithm called Latent Dirichlet Allocation, described in Blei, Ng & Jordan, 2003, "Latent Dirichlet Allocation". I know some mathematics, but this is a little too complicated for me, even though I found the article very well written and readable.

The second is a method for finding similar passages in a large set of texts. An implementation of this can be found here. I hope that the built-in functionality for sequence alignment can be used as a basis for an implementation in Mathematica. If so, I suppose that this could be done relatively easily.
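To make the idea of alignment-based passage matching concrete, here is a minimal sketch of local sequence alignment (the Smith-Waterman scheme) applied to word tokens rather than characters. It is written in Python only as a language-neutral illustration; the function name `local_alignment` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example, not taken from the implementation mentioned above or from Mathematica's built-in functionality.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two token lists.

    Returns (score, aligned_a, aligned_b). Gaps in either aligned
    list are represented by None. Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming table; scores never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


text_a = "the quick brown fox jumps over the lazy dog".split()
text_b = "yesterday a quick brown fox leapt over a fence".split()
print(local_alignment(text_a, text_b))
```

Run on the two example sentences, the sketch picks out the shared stretch "quick brown fox ... over" and tolerates the single differing word in the middle. For a large set of texts, comparing every pair this way would be slow, so some pre-filtering of candidate passages would probably be needed first.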

Perhaps someone here knows about work already done in Mathematica related to these methods? Or has other suggestions?

Kind Regards, Sverker Lundin

POSTED BY: Sverker Lundin
7 Replies

I should add that if you have developed implementations of ARTFL's tools in Mathematica (or know of someone who has) I'd love to hear about it!

POSTED BY: Arno Bosse
POSTED BY: Sverker Lundin
POSTED BY: Arno Bosse

This is exactly what I needed. Thank you very much!

POSTED BY: Jeffrey Lapides

Hi Sverker:

Topic modeling is not nearly as complicated as you might think. Here is a good introduction: http://www.youtube.com/watch?v=4p9MSJy761Y.

I have used it extensively to analyze research portfolios of the US Department of Agriculture. Some of the work was done with Mathematica, some outside with Mallet. Get in touch if you would like to know more.
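As a rough illustration of how compact the core of topic modelling can be, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling. Note that this is a different inference method from the variational approach in the Blei, Ng & Jordan paper; it is used here only because it is short. The code is Python for neutrality, and the function name `lda_gibbs` and the default values for `alpha`, `beta`, and `iterations` are assumptions made for the example, not anyone's production settings.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where doc_topic[d][k] and topic_word[k][w] are smoothed probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]              # topic counts per document
    n_kw = [[0] * num_words for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                               # token totals per topic
    z = []                                               # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Resample its topic in proportion to the conditional weights.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                    / (n_k[t] + num_words * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + num_words * beta)
         for v in range(num_words)]
        for t in range(num_topics)
    ]
    return doc_topic, topic_word, vocab
```

Calling `lda_gibbs(docs, num_topics=2)` on a list of tokenised documents returns one topic distribution per document and one word distribution per topic. Real corpora would also need stop-word removal and far more iterations than a toy run.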

Jeff Lapides jrlapides2@verizon.net

POSTED BY: Jeffrey Lapides

For the second, you might consider Latent Semantic Analysis (LSA) approaches. One reference I found that shows something along these lines with Mathematica is a 2011 dissertation by Saurav Karmaker.

http://scholarworks.gsu.edu/cgi/viewcontent.cgi?article=1062&context=cs_diss

There are probably other references out there as well. Also, there may be good methods that could make use of the built-in Nearest function.
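As a minimal sketch of the nearest-neighbour idea, the following ranks passages by cosine similarity of their word-count vectors. This is a plain vector-space lookup and leaves out the dimensionality-reduction step (the SVD) that makes LSA what it is, so treat it as a simplified stand-in. It is written in Python for neutrality; the names `cosine_similarity` and `nearest_passages` are hypothetical, and the whitespace tokenisation is an assumption made to keep the example short.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def nearest_passages(query, passages, n=1):
    """Return the n passages most similar to query as (similarity, passage)."""
    q = Counter(query.lower().split())
    scored = [
        (cosine_similarity(q, Counter(p.lower().split())), p)
        for p in passages
    ]
    # Sort on similarity only, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]


passages = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat lay on a mat",
]
for similarity, passage in nearest_passages("the cat on the mat", passages, n=2):
    print(round(similarity, 3), passage)
```

In a fuller version the count vectors would first be weighted (for example with tf-idf) and projected onto a small number of singular vectors before the nearest-neighbour search.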

POSTED BY: Daniel Lichtblau