
# Using Prefix trees for Markov chain text generation

Posted 8 months ago
4 Replies
Posted 5 months ago
 Here is the corresponding paclet:
Posted 8 months ago
 Hi Anton, thanks for this considerable effort. It's something I've been thinking about and experimenting with lately as well. As far as validation goes, is it possible to give more rigorous assurances of hitting not only the 1-gram frequency distribution but also whichever n-gram distribution is used for determining transition probabilities?

The context I'm thinking of is the Metropolis-Hastings algorithm. Obviously, in this case reading text forwards is not the same as reading it backwards, so the symmetric proof logic designed for a "Gaussian-drawn step size" does not readily apply. It seemed to me that, reading forwards only, the following naive answer would be okay:

$$A(x'|x) = P(x'|x) / g(x'|x)$$

with acceptance $A$, proposal $g$, and conditional $P$ (as on Wikipedia). For optimization's sake, we wouldn't want to calculate the reverse conditional $P(x|x')$ if we can get away without it. One potential issue is that, when rejecting some proposals, we can't end up with a sentence like "the cat cat cat jumped jumped over the the fence", but repeats may be necessary to achieve exact statistics. I looked briefly on Google Scholar and didn't find anything relevant. Maybe I don't have the right search terms?
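One possible reading of the ratio above, offered only as a sketch: if it is rescaled by a constant $M \ge \max_{x'} P(x'|x)/g(x'|x)$ so that it never exceeds 1, and rejected proposals are simply discarded, the step becomes ordinary rejection sampling rather than Metropolis-Hastings. In that case each accepted word is an exact draw from $P(\cdot|x)$, with no repeated words and no reverse conditional. The bound `M`, the choice of the 1-gram distribution as proposal `g`, the bigram conditional as `P`, and all function names below are assumptions made for illustration. The sketch is in Python, not the Wolfram Language used in the original post.

```python
import random
from collections import Counter, defaultdict

def bigram_conditionals(words):
    """Estimate P(next | current) from adjacent word pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for cur, nxts in counts.items()
    }

def unigram_proposal(words):
    """1-gram frequency distribution, used here as the proposal g (an assumption)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sample_next(current, P, g, rng):
    """Draw the next word from P(. | current) by accept/reject against g."""
    target = P[current]
    # Hypothetical bound M >= max P/g keeps the acceptance probability <= 1.
    M = max(p / g[w] for w, p in target.items())
    vocab = list(g)
    weights = [g[w] for w in vocab]
    while True:
        proposal = rng.choices(vocab, weights=weights)[0]
        accept = target.get(proposal, 0.0) / (M * g[proposal])
        if rng.random() < accept:
            return proposal  # rejected proposals are discarded, never repeated

corpus = "the cat jumped over the fence and the cat sat on the fence".split()
P = bigram_conditionals(corpus)
g = unigram_proposal(corpus)
rng = random.Random(0)
draws = Counter(sample_next("the", P, g, rng) for _ in range(10000))
```

In this toy corpus "the" is followed equally often by "cat" and "fence", so the empirical frequencies in `draws` should be close to one half each. Comparing such empirical n-gram frequencies with the corpus frequencies is one simple way to check the transition statistics.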
Posted 8 months ago
 First, thank you for your comment! Prefix trees (tries) can be used to build both forward and backward phrases. For example, I used multiple 3-gram and 4-gram tries with frequencies in order to predict the most likely correct word in a contextual spellchecker. (Using the package "MonadicPhraseCompletion.m".) I am not sure whether I am interpreting the validation question correctly; hopefully this answer is relevant. My tries implementation allows getting sub-tries. For example, a trie based on 4-grams can be queried with a 2-gram. The phrases of the resulting sub-trie can then be extracted together with their probabilities. Finally, a top-k kind of test can be used to evaluate whether these phrases overlap with a certain presumed-correct set of phrases.
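The query-extract-compare workflow described above could be sketched roughly as follows. This is a minimal Python illustration built on nested dictionaries, not the author's tries package or its API. All function names (`make_trie`, `sub_trie`, `phrases_with_probabilities`, `top_k_overlap`) and the toy corpus are hypothetical.

```python
def ngrams(words, n):
    """All contiguous n-word tuples of a word list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def make_trie(phrases):
    """Build a nested-dict trie; each node stores a visit count and its children."""
    root = {"count": 0, "children": {}}
    for phrase in phrases:
        root["count"] += 1
        node = root
        for word in phrase:
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def sub_trie(trie, prefix):
    """Return the node reached by following prefix, or None if it is absent."""
    node = trie
    for word in prefix:
        node = node["children"].get(word)
        if node is None:
            return None
    return node

def phrases_with_probabilities(node):
    """List (phrase, probability) for every path from node down to a leaf."""
    results = []

    def walk(current, path, prob):
        if not current["children"]:
            results.append((tuple(path), prob))
            return
        for word, child in current["children"].items():
            # Conditional probability of the child given its parent node.
            walk(child, path + [word], prob * child["count"] / current["count"])

    walk(node, [], 1.0)
    return sorted(results, key=lambda item: -item[1])

def top_k_overlap(ranked, expected, k):
    """Fraction of the k most probable phrases that appear in the expected set."""
    top = [phrase for phrase, _ in ranked[:k]]
    return sum(phrase in expected for phrase in top) / max(len(top), 1)

words = "the cat sat on the mat and the cat sat on the sofa".split()
trie = make_trie(ngrams(words, 4))            # trie based on 4-grams
node = sub_trie(trie, ("sat", "on"))          # queried with a 2-gram
ranked = phrases_with_probabilities(node)     # completions with probabilities
score = top_k_overlap(ranked, {("the", "mat")}, k=2)
```

Here the 2-gram "sat on" has two completions, "the mat" and "the sofa", each with conditional probability 0.5, and the top-2 overlap with the one-phrase reference set is 0.5.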
Posted 8 months ago
 -- you have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!