# Markov chain n-gram models

Here are links to two blog posts of mine discussing the application of n-gram models to:

1. text generation: http://mathematicaforprediction.wordpress.com/2014/01/25/markov-chains-n-gram-model-implementation/
2. genome data classification: http://mathematicaforprediction.wordpress.com/2014/01/27/classification-of-genome-data-with-n-gram-models/

The second post discusses using a modified Receiver Operating Characteristic (ROC) to select the best n of the n-gram models for different combinations of gene pairs.

[Figure: example ROC plots]

I plan to update this post with the Mathematica code I wrote and used. That code can also be found in the MathematicaForPrediction repository at GitHub: https://github.com/antononcube/MathematicaForPrediction

The full article describing the genome data classification algorithm and experiments can be downloaded from this link: https://github.com/antononcube/MathematicaForPrediction/blob/master/Documentation/Classification%20of%20genome%20data%20with%20n-gram%20models.pdf
Here is the application of the n-gram model to text generation, using the full text of the play "Hamlet" as training data:

    text = ExampleData[{"Text", "Hamlet"}];
    genTexts = {#,
        NGramMarkovChainText[text, #,
         StringSplit[text][[1020 ;; 1020 + # - 1]], 200,
         WordSeparators -> {" ", "\n"}]} & /@ Range[2, 5]

It can be seen in the resulting table that the 5-gram generated text makes more sense than the 2-gram one. All 4 randomly generated texts start from the same place in the play. (A more detailed discussion is given in my "Markov chains n-gram model implementation" blog post.)
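To make the idea behind the generation step concrete, here is a minimal sketch of a word-level n-gram Markov chain. It is not the `NGramMarkovChainText` implementation from the package. The names `nGramTransitions`, `generateWords`, and `order` are hypothetical, and the sketch assumes that an n-gram model predicts each word from the previous n - 1 words, which may differ from the convention used in the call above.

```mathematica
(* Illustrative sketch only: nGramTransitions and generateWords are
   hypothetical names, not functions from MathematicaForPrediction. *)

(* Map each (n-1)-word context to the list of words observed right after it.
   Repeated followers are kept, so sampling is proportional to frequency. *)
nGramTransitions[words_List, n_Integer /; n >= 2] :=
  GroupBy[Partition[words, n, 1], Most -> Last];

(* Random walk: pick a follower of the current context, then slide the context. *)
generateWords[model_Association, seed_List, len_Integer] :=
  Module[{state = seed, out = seed, next},
   Do[
    If[! KeyExistsQ[model, state], Break[]]; (* unseen context: stop early *)
    next = RandomChoice[model[state]];
    AppendTo[out, next];
    state = Append[Rest[state], next],
    {len}];
   out];

(* Usage: a 3-gram model seeded with two words from the same place in the play. *)
words = StringSplit[ExampleData[{"Text", "Hamlet"}]];
order = 3;
model = nGramTransitions[words, order];
StringRiffle[generateWords[model, words[[1020 ;; 1020 + order - 2]], 50], " "]
```

Larger values of `order` make each context rarer, so the walk follows the source text more closely. That is one plausible reason the higher-order output reads more coherently.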
