[Notebook] Genome analysis and the SARS-CoV-2

Posted 1 year ago
3 Replies
12 Total Likes

MODERATOR NOTE: coronavirus resources & updates:

I want to show a few ways in which Mathematica can be used to analyze gene sequences. The application here is, of course, the recent novel coronavirus 2019-nCoV, but the methods are generally applicable.

3 Replies

Congratulations! This post is now featured in our Staff Pick column as distinguished by a badge on your profile of a Featured Contributor!

Thank you for the demonstration Daniel!


I posted a notebook that uses the resource function PhylogeneticTreePlot to create a dendrogram of several hundred full sequences of SARS-CoV-2 genomes.

This uses, among other things, the Wolfram Data Repository item containing said sequences.

As another related note, an article has recently appeared that uses similar methods for genome sequence classification.

Gurjit S. Randhawa, Maximillian P. M. Soltysiak, Hadi El Roz, Camila P. E. de Souza, Kathleen A. Hill, Lila Kari. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study. PLoS ONE 15(4):e0232391, April 24, 2020, 24 pages. doi: 10.1371/journal.pone.0232391

Here is a direct link to the paper:

I will also note here, as I do in today's Community post, that I first learned about the Chaos Game Representation from a 2016 talk given by Lila Kari at the University of Western Ontario. Clearly it was a great talk from my perspective: I've been using the Chaos Game Representation ever since, in projects on genome identification and even authorship identification.
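For readers who have not met the Chaos Game Representation before, here is a minimal sketch of the idea. It is written in Python purely for illustration and is not the code from the notebook. The corner assignment for A, C, G, T and the helper names `cgr_points` and `cgr_counts` are assumptions made for this example. Each base moves the current point halfway toward its corner of the unit square, so binning the points on a 2^k by 2^k grid gives a k-mer frequency image of the sequence.

```python
from typing import List, Tuple

# Assumed corner layout (a common convention; other layouts work equally well).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return the chaos-game trajectory of a DNA string in the unit square."""
    x, y = 0.5, 0.5  # start at the center
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_counts(seq: str, k: int) -> List[List[int]]:
    """Bin the trajectory on a 2^k x 2^k grid; each cell collects one k-mer."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for i, (x, y) in enumerate(cgr_points(seq)):
        if i + 1 < k:
            continue  # the first k-1 points still depend on the start point
        row = min(int(y * n), n - 1)
        col = min(int(x * n), n - 1)
        grid[row][col] += 1
    return grid
```

For example, `cgr_counts(genome, 7)` would give a 128 by 128 array of 7-mer counts that can be flattened into a feature vector or shown as an image, where `genome` is whatever sequence string you have loaded.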
