[Notebook] Genome analysis and the SARS-CoV-2

Posted 1 year ago

I want to show a few ways in which Mathematica can be used to do various types of analysis on gene sequences. The application will, of course, be to the recent novel coronavirus 2019-nCoV, but the methods are generally applicable.

Thank you for the demonstration, Daniel!


I posted a notebook that uses the resource function PhylogeneticTreePlot to create a dendrogram of several hundred full sequences of SARS-CoV-2 genomes.

This uses, among other things, the Wolfram Data Repository item containing said sequences.

As another related note, an article has recently appeared that uses similar methods for genome sequence classification.

Gurjit S. Randhawa, Maximillian P. M. Soltysiak, Hadi El Roz, Camila P. E. de Souza, Kathleen A. Hill, Lila Kari. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study. PLoS ONE 15(4):e0232391, April 24, 2020, 24 pages. doi: 10.1371/journal.pone.0232391

Here is a direct link to the paper:

I will also note here, as I do in today's Community post, that I first learned about the Chaos Game Representation from a 2016 talk given by Lila Kari at the University of Western Ontario. Clearly it was a great talk, from my perspective: I've been using the Chaos Game Representation ever since, in projects on genome identification and even authorship identification.
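
For readers who have not met the Chaos Game Representation before, here is a minimal sketch of the general idea. It is written in Python and is not the Mathematica code from the notebook. The assignment of nucleotides to corners of the unit square is an assumption (conventions differ between papers), and the names `cgr_points` and `cgr_counts` are made up for this illustration. Each base moves the current point halfway toward its corner. Binning the resulting points on a 2^k by 2^k grid then gives a count for each length-k subsequence, as long as the sequence has no ambiguous symbols.

```python
# Corner assignment is an assumption; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the chaos-game trajectory of a nucleotide sequence."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Skip ambiguous symbols such as N (a simplification).
            continue
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_counts(sequence, k):
    """Bin the trajectory into a 2^k x 2^k grid of counts (k >= 1)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # Drop the first k-1 points so every counted point reflects a full k-mer.
    for x, y in cgr_points(sequence)[k - 1:]:
        col = min(int(x * size), size - 1)  # clamp guards against rounding to 1.0
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

Flattening the grid from `cgr_counts` gives a fixed-length numeric vector for a sequence of any length, which is what makes this kind of representation convenient as input to clustering or classification.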

