Hi,
I have recently started working in a part of the (Swedish) academic world called "digital humanities" (DH). I am interested in finding out to what extent Mathematica can be used for text analysis in this area. It seems to me that the combination of Internet and database connectivity, string manipulation, algorithmic power, visualization, interactivity, and rapid development could make Mathematica quite useful.
There are two methods that are considered hot in DH that I would like to try to implement. The first is called "Topic Modelling" and is based on an algorithm called Latent Dirichlet Allocation (LDA), described in Blei, Ng & Jordan (2003), "Latent Dirichlet Allocation". I know some mathematics, but this is a little too complicated for me, even though I found the article very well written and readable.
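For reference, this is how I read the generative model that the paper assumes for each document (k topics, a vocabulary of V words, and β a k × V matrix whose row i is the word distribution of topic i). It is a summary, not a full statement of the model or of how it is fitted:

$$
\begin{aligned}
N &\sim \operatorname{Poisson}(\xi), \\
\theta &\sim \operatorname{Dir}(\alpha), \\
z_n \mid \theta &\sim \operatorname{Multinomial}(\theta), \qquad n = 1, \dots, N, \\
w_n \mid z_n, \beta &\sim \operatorname{Multinomial}(\beta_{z_n}),
\end{aligned}
$$

which gives the joint distribution

$$
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta).
$$

The hard part, as I understand it, is inverting this: estimating the topics β and the per-document mixtures θ from the observed words alone.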
The second is a method for finding similar passages in a large set of texts. An implementation of this can be found here. I hope that Mathematica's built-in functionality for sequence alignment can be used as a basis for an implementation. If so, I suppose that this could be done relatively easily.
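As a rough illustration of that idea, here is a minimal sketch that runs the built-in SequenceAlignment on word lists instead of characters and keeps only the longer shared runs. The helper names words and sharedPassages, and the default minimum run length of 4 words, are hypothetical choices of mine, not part of any existing package, and the sketch compares two whole texts rather than scanning a large corpus:

```mathematica
(* Hypothetical helper: lowercase a text and split it into a list of words,
   treating any run of non-word characters as a separator *)
words[text_String] := StringSplit[ToLowerCase[text], Except[WordCharacter] ..]

(* Hypothetical helper: align two texts word by word and return the runs of
   at least minLen consecutive words that SequenceAlignment reports as common *)
sharedPassages[t1_String, t2_String, minLen_Integer : 4] :=
 Select[
  SequenceAlignment[words[t1], words[t2]],
  (* common segments are flat lists of words; differing segments are
     pairs {wordsOnlyInT1, wordsOnlyInT2}, so VectorQ filters them out *)
  VectorQ[#, StringQ] && Length[#] >= minLen &
 ]
```

Usage would look like `sharedPassages["It was the best of times, it was the worst of times.", "He said it was the best of times for the whole city."]`. Working on words rather than characters means a shared passage is reported as a run of identical words, which seems closer to what the passage-matching method is after; for a large collection one would presumably first split texts into overlapping chunks and only align promising pairs.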
Does anyone here know of work already done in Mathematica related to these methods, or have other suggestions?
Kind Regards, Sverker Lundin