I don't know of any open-source projects like the Stanford one, but you can get pretty far with just WordData, which is built into Mathematica, combined with n-grams and Markov chains, both of which are easy to implement. ExampleData also has a few texts that are immediately available. See the linguistic guide in the documentation for ideas:
Linguistic Data
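
To make the n-gram and Markov chain idea concrete, here is a minimal sketch. It is written in Python rather than Mathematica, so it uses no Mathematica API. The names `build_chain` and `generate` and the toy sentence are hypothetical, chosen only for illustration. It is one simple way to do it, not a reference implementation.

```python
import random
from collections import defaultdict


def build_chain(words, n=2):
    """Map each (n-1)-word context to the list of words that follow it.

    Hypothetical helper for illustration; n is the n-gram size (n >= 2).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    chain = defaultdict(list)
    # Slide a window of n words over the text: the first n-1 words are the
    # context, the last word is an observed follower of that context.
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        chain[context].append(words[i + n - 1])
    return dict(chain)


def generate(chain, start, length, seed=None):
    """Random walk over the chain, starting from a context tuple `start`.

    `start` must have the same length as the contexts in `chain` (n-1 words).
    Stops early if the current context was never seen with a follower.
    """
    rng = random.Random(seed)
    out = list(start)
    k = len(start)
    while len(out) < length:
        followers = chain.get(tuple(out[-k:]))
        if not followers:
            break
        # Repeated followers are kept in the list, so frequent continuations
        # are proportionally more likely to be picked.
        out.append(rng.choice(followers))
    return out


# Toy input (an assumption for the example, not real corpus data).
sample = "the cat sat on the mat and the cat ran".split()
chain = build_chain(sample, n=2)
print(" ".join(generate(chain, ("the",), 8, seed=0)))
```

With real input, the word list would come from a longer text, for example one of the ExampleData texts, split into words, and a larger `n` gives output that follows the source more closely.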