# Markov chain n-gram models

Posted 10 years ago
Here are links to two blog posts of mine that discuss applying n-gram models to:

1. text generation: http://mathematicaforprediction.wordpress.com/2014/01/25/markov-chains-n-gram-model-implementation/
2. genome data classification: http://mathematicaforprediction.wordpress.com/2014/01/27/classification-of-genome-data-with-n-gram-models/

The second post discusses using a modified Receiver Operating Characteristic (ROC) to select the best n for the n-gram models for different combinations of gene pairs.

[Figure: example ROC plots]

I plan to update this post with the Mathematica code I programmed and used. That code can also be found in the MathematicaForPrediction repository at GitHub: https://github.com/antononcube/MathematicaForPrediction

The full article describing the genome data classification algorithm and experiments can be downloaded from: https://github.com/antononcube/MathematicaForPrediction/blob/master/Documentation/Classification%20of%20genome%20data%20with%20n-gram%20models.pdf
Posted 10 years ago
Here is the application of the n-gram model to text generation, using the full text of the play "Hamlet" as training data:

```mathematica
text = ExampleData[{"Text", "Hamlet"}];
genTexts = {#,
     NGramMarkovChainText[text, #,
      StringSplit[text][[1020 ;; 1020 + # - 1]], 200,
      WordSeparators -> {" ", "\n"}]} & /@ Range[2, 5]
```

The resulting table shows that the 5-gram generated text makes more sense than the 2-gram one. All 4 randomly generated texts start from the same place in the play.

(A more detailed discussion is given in my "Markov chains n-gram model implementation" blog post.)
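For readers who want to see the idea without loading the package, here is a minimal sketch of a word-level n-gram Markov chain text generator. It is not the `NGramMarkovChainText` implementation: `ngramModel` and `generateWords` are hypothetical names introduced only for illustration, and I am assuming here that "n" means the number of preceding words used as the chain state, which may differ from the package's convention.

```mathematica
(* A sketch only: ngramModel and generateWords are hypothetical names,
   not functions from the MathematicaForPrediction package. *)

(* Map every run of n consecutive words to the list of words observed
   right after it. Repeats are kept, so sampling follows the observed
   frequencies. *)
ngramModel[words_List, n_Integer?Positive] :=
  GroupBy[Partition[words, n + 1, 1], Most -> Last];

(* Extend the seed one word at a time, drawing each new word from those
   that followed the current n-word state in the training text. *)
generateWords[model_Association, seed_List, count_Integer] :=
  Module[{state = seed, out = seed, next},
   Do[
    (* stop early if this state was never seen with a successor *)
    If[! KeyExistsQ[model, state], Break[]];
    next = RandomChoice[model[state]];
    AppendTo[out, next];
    state = Append[Rest[state], next],
    {count}];
   out];

words = StringSplit[ExampleData[{"Text", "Hamlet"}]];
n = 3;
model = ngramModel[words, n];
StringRiffle[generateWords[model, words[[1020 ;; 1020 + n - 1]], 200], " "]
```

With a larger `n` each state has fewer distinct successors, so the output tends to reproduce longer stretches of the play verbatim; with a smaller `n` it wanders more freely but is less coherent.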
Posted 10 years ago
 Thank you, Jon, that is nice to hear!
Posted 10 years ago
 Thanks for the articles you have been publishing. They have been very helpful and illuminating.