
Markov chain n-gram models

Posted 11 years ago
Here are links to two of my blog posts on applications of n-gram models:

1. Text generation
http://mathematicaforprediction.wordpress.com/2014/01/25/markov-chains-n-gram-model-implementation/

2. Genome data classification
http://mathematicaforprediction.wordpress.com/2014/01/27/classification-of-genome-data-with-n-gram-models/
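One common way to use n-gram models for classification is to train one model per class and assign a new sequence to the class under whose model it is most likely. The following Wolfram Language sketch illustrates that idea. It assumes DNA strings over the letters A, C, G, T and uses add-one smoothing, and the names trainNGram, nGramLogLikelihood, and classifyNGram are hypothetical. It is not necessarily the algorithm described in the linked post and article.

```mathematica
(* Minimal sketch, assuming DNA sequences over {A, C, G, T}; all names are
   hypothetical and this is not the algorithm from the linked article. *)
alphabet = {"A", "C", "G", "T"};

(* Count the overlapping n-grams (substrings of length n) of a training sequence *)
trainNGram[seq_String, n_Integer] := Counts[StringPartition[seq, n, 1]]

(* Log-likelihood of seq under a trained model: each n-gram contributes
   Log P(last letter | first n-1 letters), with add-one (Laplace) smoothing *)
nGramLogLikelihood[counts_Association, seq_String, n_Integer] :=
  Total[Map[
    Function[gram,
      With[{prefix = StringDrop[gram, -1]},
        Log[N[(Lookup[counts, gram, 0] + 1)/
          (Total[Lookup[counts, Map[prefix <> # &, alphabet], 0]] +
            Length[alphabet])]]]],
    StringPartition[seq, n, 1]]]

(* Assign seq to the class whose model gives the highest log-likelihood *)
classifyNGram[models_Association, seq_String, n_Integer] :=
  Module[{scores = Map[nGramLogLikelihood[#, seq, n] &, models]},
    Keys[scores][[First[Ordering[Values[scores], -1]]]]]
```

For instance, with models = <|"AC-rich" -> trainNGram["ACACACACACACAC", 2], "GT-rich" -> trainNGram["GTGTGTGTGTGTGT", 2]|>, the call classifyNGram[models, "ACACAC", 2] returns "AC-rich".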

The second post discusses using a modified Receiver Operating Characteristic (ROC) to select, for different combinations of gene pairs, the best n for the n-gram models.
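For reference, a standard ROC point (false positive rate, true positive rate) at a given score threshold can be computed as below. This sketches only the ordinary ROC construction; the modified ROC used in the post is described in the blog post and the article, and the names rocPoint and rocCurve are hypothetical. Under this assumption, one would compute a curve from the classifier scores for each candidate n and prefer the n whose curve lies closest to the top-left corner (FPR 0, TPR 1).

```mathematica
(* Standard ROC point, not the modified ROC from the blog post (hypothetical names).
   scores: classifier scores, higher meaning "more likely positive";
   labels: True for actual positives, False for actual negatives;
   assumes both classes are present in labels. *)
rocPoint[scores_List, labels_List, t_?NumericQ] :=
  Module[{pairs = Transpose[{Map[# >= t &, scores], labels}]},
    {Count[pairs, {True, False}]/Count[labels, False],  (* false positive rate *)
     Count[pairs, {True, True}]/Count[labels, True]}]    (* true positive rate *)

(* One ROC point per threshold; plotting them (e.g. with ListLinePlot) gives the curve *)
rocCurve[scores_List, labels_List, thresholds_List] :=
  Map[rocPoint[scores, labels, #] &, thresholds]
```

For example, with scores {0.9, 0.8, 0.4, 0.3} and labels {True, False, True, False}, the threshold 0.5 gives the point {1/2, 1/2}.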
Here is an example of ROC plots:

I plan to update this post with the Mathematica code I programmed and used. That code can also be found at https://github.com/antononcube/MathematicaForPrediction (MathematicaForPrediction at GitHub). The full-blown article describing the genome data classification algorithm and experiments can be downloaded from this link: https://github.com/antononcube/MathematicaForPrediction/blob/master/Documentation/Classification%20of%20genome%20data%20with%20n-gram%20models.pdf
POSTED BY: Anton Antonov
3 Replies
Posted 11 years ago
Thanks for the articles you have been publishing. They have been very helpful and illuminating.
POSTED BY: Jon Rogers
Thank you, Jon, that is nice to hear!
POSTED BY: Anton Antonov
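Here is an application of the n-gram model to text generation, using the full text of the play "Hamlet" as training data:
(* NGramMarkovChainText is not built in; it comes from the author's code (see the MathematicaForPrediction link in the first post) *)
text = ExampleData[{"Text", "Hamlet"}];
words = StringSplit[text];
(* For n = 2, ..., 5: seed the generator with the n consecutive words starting at word 1020 *)
genTexts = {#,
    NGramMarkovChainText[text, #, words[[1020 ;; 1020 + # - 1]], 200,
     WordSeparators -> {" ", "\n"}]} & /@ Range[2, 5]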


As can be seen in the table, the text generated with the 5-gram model makes more sense than the one generated with the 2-gram model. All four randomly generated texts start from the same place in the play.
(A more detailed discussion is given in my "Markov chains n-gram model implementation" blog post.)
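The general idea behind n-gram text generation can be sketched in a few lines of Wolfram Language. This is a minimal sketch, not the implementation of NGramMarkovChainText: the names nGramModel and nGramGenerate are hypothetical, the seed is assumed to have n-1 words, and each next word is drawn uniformly from the words that followed the current (n-1)-word prefix in the training text, so more frequent continuations are chosen more often.

```mathematica
(* Minimal sketch: nGramModel and nGramGenerate are hypothetical helpers,
   not the NGramMarkovChainText implementation from MathematicaForPrediction. *)

(* Map each (n-1)-word prefix to the list of words that follow it in the
   training word list; repeated followers appear repeatedly in that list. *)
nGramModel[words_List, n_Integer /; n >= 2] :=
  GroupBy[Partition[words, n, 1], Most -> Last]

(* Extend the seed (which should have n-1 words) to at most len words.
   Each next word is drawn uniformly from the followers of the current
   prefix, i.e. with its empirical conditional probability; generation
   stops early when the prefix has no recorded follower. *)
nGramGenerate[model_Association, seed_List, len_Integer] :=
  Module[{out = seed, k = Length[seed], next},
    Do[
      next = model[Take[out, -k]];
      If[MissingQ[next], Break[]];
      AppendTo[out, RandomChoice[next]],
      {len - k}];
    out]
```

With words = StringSplit[ExampleData[{"Text", "Hamlet"}]], the call StringRiffle[nGramGenerate[nGramModel[words, 3], words[[1020 ;; 1021]], 50], " "] would give a 50-word text from a 3-gram model, seeded at the same place in the play as above. A larger n means a longer prefix with fewer recorded followers, so the output tends to reproduce longer stretches of the original, which is consistent with the 5-gram text reading better than the 2-gram one.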
POSTED BY: Anton Antonov