Dear All,
Here is a little analysis of the most frequent words in the Bible (since that was an example mentioned in the original post):
bibleTxt = Import["http://www.gutenberg.org/cache/epub/10/pg10.txt"];
WordCloud[DeleteStopwords[TextWords[bibleTxt]]]
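WordCloud sizes each word by how often it occurs. If you want the numbers behind the picture, a small sketch along these lines tabulates the most frequent words directly. Lower-casing the words first is my own addition (the command above does not do it); it merges variants such as "Lord" and "LORD".

```mathematica
(* Same Project Gutenberg import as above *)
bibleTxt = Import["http://www.gutenberg.org/cache/epub/10/pg10.txt"];
(* Lower-case all words, then drop stopwords such as "the" and "and" *)
words = DeleteStopwords[ToLowerCase[TextWords[bibleTxt]]];
(* Count every word; Sort orders an Association by its values,
   Reverse makes that descending, Take keeps the ten largest *)
top = Take[Reverse[Sort[Counts[words]]], 10]
```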
Regarding the comment above about problems with PDF files: I am working on a project where I need to analyse millions of PDF files that were scanned all over the world, and there are indeed many that Mathematica cannot open. I found that converting them on the command line to PostScript and then back to PDF usually works very well. I mostly work on Linux machines, where this is no problem, but it also works on Windows if you use, for example, Cygwin. With this procedure the PDF problems virtually vanish.
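The command-line round trip can also be driven from within Mathematica. This is only a sketch under assumptions: the post does not say which converter is used, so I assume Ghostscript's pdf2ps and ps2pdf wrapper scripts are installed and on the PATH (as is typical on Linux), and repairPDF and succeededQ are hypothetical helper names.

```mathematica
(* Hypothetical helpers; assume Ghostscript's pdf2ps and ps2pdf are on the PATH *)
(* True only if RunProcess returned an Association with exit code 0 *)
succeededQ[res_] := AssociationQ[res] && res["ExitCode"] === 0;

(* Repair a PDF by converting it to PostScript and back to PDF *)
repairPDF[in_String, out_String] :=
 Module[{ps = FileNameJoin[{$TemporaryDirectory, CreateUUID[] <> ".ps"}], ok},
  (* Step 1: PDF -> PostScript *)
  ok = succeededQ[RunProcess[{"pdf2ps", in, ps}]];
  (* Step 2: PostScript -> PDF; And stops early, so this runs only if step 1 worked *)
  ok = ok && succeededQ[RunProcess[{"ps2pdf", ps, out}]];
  (* Remove the intermediate PostScript file if it was created *)
  If[FileExistsQ[ps], DeleteFile[ps]];
  If[ok, out, $Failed]
 ]
```

Usage, with made-up file names: fixed = repairPDF["scan.pdf", "scan-fixed.pdf"]; If[fixed =!= $Failed, Import[fixed, "Plaintext"]]. If a program is missing, RunProcess returns $Failed, which succeededQ treats as a failure.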
Once the PDFs are fixed, there is no problem. For example, with the permission of the AMS, the author of a good book on ODEs makes a PDF of it available for personal use. Essentially the same code as above (with the "Plaintext" element added to Import) works for the analysis of this PDF file:
odestxt = Import["http://www.mat.univie.ac.at/~gerald/ftp/book-ode/ode.pdf", "Plaintext"];
WordCloud[DeleteStopwords[TextWords[odestxt]]]
Of course, you might want to do some additional preprocessing of the text files, but I think this shows how extremely well Mathematica copes with different file formats.
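As one example of such preprocessing, a sketch like the following lower-cases the words, removes stopwords, and drops very short tokens and tokens containing digits (verse numbers in the Bible, formula fragments in the ODE book). The helper name cleanWords and the three-character cut-off are arbitrary choices of mine, not part of the code above.

```mathematica
(* Hypothetical helper: basic clean-up of a text before building a word cloud *)
cleanWords[text_String] :=
 Select[
  (* lower-case first so stopwords are matched regardless of capitalisation *)
  DeleteStopwords[ToLowerCase[TextWords[text]]],
  (* keep words with at least 3 characters and no digits *)
  StringLength[#] >= 3 && StringFreeQ[#, DigitCharacter] &
 ]
```

It is then used exactly like the raw word list, e.g. WordCloud[cleanWords[odestxt]].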
Cheers,
Marco
PS: I strongly recommend downloading the PDF of the ODE book for personal use. As I said, it is a good book.
PPS: Note that the full command to download the Bible, analyse the text and make the word cloud easily fits into a tweet:
WordCloud[DeleteStopwords[TextWords[Import["http://www.gutenberg.org/cache/epub/10/pg10.txt"]]],IgnoreCase->True]
It is only about 113 characters long, so you could tweet it to Wolfram's Tweet-a-Program.
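The length is easy to check in Mathematica itself: store the command as a string (with the inner quotes escaped) and let StringLength count the characters. Tweets were limited to 140 characters when Tweet-a-Program started, so there is room to spare.

```mathematica
(* The tweetable command from above as a string; each \" is a single quote character *)
cmd = "WordCloud[DeleteStopwords[TextWords[Import[\"http://www.gutenberg.org/cache/epub/10/pg10.txt\"]]],IgnoreCase->True]";
StringLength[cmd]  (* 113 *)
```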