Hi Anton, thanks for this considerable effort. It's something I've been thinking about and experimenting with lately as well. As far as validation goes, is it possible to give more rigorous assurances that the generated text matches not only the 1-gram frequency distribution but also whichever n-gram distribution is used to determine the transition probabilities?
The context I'm thinking of is the Metropolis-Hastings algorithm. Obviously in this case reading text forwards is not the same as reading it backwards, so the symmetric proof logic designed for a "Gaussian-drawn step size" does not readily apply. It seemed to me that, reading forwards only, the following naive answer would be okay:
$$A(x'|x) = P(x'|x) / g(x'|x)$$
with acceptance $A$, proposal $g$, and conditional $P$ (as on Wikipedia). For optimization's sake, we wouldn't want to calculate the reverse conditional $P(x|x')$ if we can get away without it. One potential issue is that a rejected proposal repeats the current word, and we can't end up with a sentence like "the cat cat cat jumped jumped over the the fence", but repeats may be necessary to achieve exact statistics.
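To make the question concrete, here is a minimal sketch of the forward-only rule on a toy bigram model. Everything in it is an assumption for illustration: the vocabulary, the table `P`, the uniform proposal `g`, and the envelope constant `M` that keeps the ratio $P/g$ at or below 1. It also redraws on rejection instead of repeating the current word, which makes it ordinary rejection sampling from $P(x'|x)$ rather than Metropolis-Hastings. It then estimates the bigram transition frequencies from the output so they can be compared with `P`.

```python
import random
from collections import Counter

# Hypothetical toy bigram model: P[x][y] is the target conditional P(y | x).
P = {
    "the": {"cat": 0.6, "fence": 0.4},
    "cat": {"jumped": 0.7, "the": 0.3},
    "jumped": {"the": 0.5, "cat": 0.5},
    "fence": {"the": 1.0},
}
VOCAB = list(P)


def g(x_new, x):
    """Proposal g(x'|x): uniform over the vocabulary (an assumption)."""
    return 1.0 / len(VOCAB)


# Envelope constant M >= max P(x'|x) / g(x'|x), so the acceptance stays <= 1.
M = max(p / g(y, x) for x, row in P.items() for y, p in row.items())


def next_word(x, rng):
    """Redraw until a proposal is accepted; nothing is repeated on rejection."""
    while True:
        x_new = rng.choice(VOCAB)
        # Forward-only acceptance: P(x'|x) / (M * g(x'|x)), no reverse term.
        accept = P[x].get(x_new, 0.0) / (M * g(x_new, x))
        if rng.random() < accept:
            return x_new


def generate(start, n, rng):
    """Generate n words, each drawn conditionally on the previous one."""
    words = [start]
    for _ in range(n - 1):
        words.append(next_word(words[-1], rng))
    return words


def empirical_transitions(words):
    """Estimate P(y | x) from the bigram counts of a generated sequence."""
    pair_counts = Counter(zip(words, words[1:]))
    first_counts = Counter(words[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    words = generate("the", 20000, rng)
    estimates = empirical_transitions(words)
    for (x, y), freq in sorted(estimates.items()):
        print(f"P({y}|{x}): target={P[x][y]:.2f} empirical={freq:.3f}")
```

Under these assumptions each accepted word is an exact draw from $P(x'|x)$, so no repeated words are needed, but the cost moves into the redraw loop, which accepts roughly one proposal in `M`. Whether a true Metropolis-Hastings variant can avoid the reverse conditional is a separate question that this sketch does not answer.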
I looked briefly on Google Scholar and didn't find anything relevant. Maybe I don't have the right search terms?