MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus . Wolfram notebook is attached at the end of this post.
By today, there have been more than two million confirmed cases, and the total number of cases could be significantly higher. During this time, the virus has replicated many times, and given the high mutation rates of viruses (although, interestingly, the mutation rate of coronaviruses is lower than that of other plus-strand RNA viruses), it has managed to diversify into different populations, which studies are starting to discover. While there are many studies and reports about the infection rate, I was curious about how many times the virus has replicated and how the population dynamics of SARS CoV-2 work at the level of virus particles.
Virus population dynamics is a complex field of study, so before I got into that I tried to find an answer to the simplest question. How many times has the virus replicated? Or, in other words, how many "ancestors" are between a currently active virus particle and the first virus which could be called "SARS CoV-2". One could imagine this question as something analogous to the question "How many generations are between me and the first Homo sapiens?". Of course, it is important to note that such an analogy has to be taken with a grain of salt, as the population dynamics of viruses and humans are very different.
So, to find the answer to my question, I did the following:
First, given the complex population dynamics of the SARS CoV-2 virus, I had to make a few assumptions in my attempts to answer these questions. Some of them are mentioned where they are relevant, and the more general assumptions are the following:
- There is one single ancestor for all SARS CoV-2 viruses. In reality, there is most likely a group of very closely related virus particles that made the jump to humans from whichever animal they were in, and because these coronaviruses undergo recombination (switching pieces of their genome with other analogous sequences), a current virus may have different ancestors for different pieces of its genome. Because recombination occurs between virus genomes inside an infected cell, it is most likely to occur between the same or similar strains of the virus, and since the first SARS CoV-2 viruses are closely related, I think it is acceptable to base the following calculations on this assumption.
- There is one replication cycle inside an infected cell. In reality, virus replication modes vary from a "stamping machine" or linear model, where a single template is used to produce all the virus genomes inside a cell, to a binary model, where each newly synthesized strand also becomes a template. RNA viruses, such as SARS CoV-2, have a replication mode closer to the linear model, while double-stranded DNA viruses tend to have a binary replication mode. Because the SARS CoV-2 genome is single-stranded, two strands are synthesized in a replication cycle: first a negative strand is created, and then that strand is used as the template for the positive RNA strands that serve as the genome for the next generation of virus particles.
Parameters and their values:
Time to the most recent common ancestor (tMRCA): According to Lai et al. (2020), the origin of SARS CoV-2 is around 18 November 2019, and thus tMRCA is the difference between that date and today.
tMRCA = DateObject["Today"] - DateObject["18 November 2019", "Day"]
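Because `DateObject["Today"]` changes every time the notebook is evaluated, the numbers quoted in this post drift over time. The following is a minimal sketch for pinning the end date. The names `referenceDate`, `originDate`, `tMRCAFixed` and `tMRCADays` are hypothetical, and the date of 17 April 2020 is an assumption: it is the end date that reproduces the 41 to 1366 range quoted in this post.

```wolfram
(* Assumption: 17 April 2020 is used only because it reproduces the figures quoted in the post *)
referenceDate = DateObject[{2020, 4, 17}, "Day"];
originDate = DateObject[{2019, 11, 18}, "Day"];

(* Whole days elapsed between the estimated origin and the reference date *)
tMRCAFixed = DateDifference[originDate, referenceDate, "Day"]

(* Plain number of days, convenient for later arithmetic *)
tMRCADays = QuantityMagnitude[tMRCAFixed]
```

With these two dates, `tMRCADays` evaluates to 151.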
Infected generation time: The time from when an individual is infected to when that individual infects another. This value is taken from a study by Park et al. (2020), where the authors make a pooled estimate from various studies. The main goal of that study is to estimate the basic reproductive number, so they do not give an exact value for their estimate of generation time. From a graph they show, I took the mean to be 8.5 days. Note that this parameter includes the time a virus spends in the environment before it infects the next individual.
infectedGenerationTime = Quantity[8.5, "Days"]
Replication cycle time: The time from a virus attaching to a cell to the moment the virus particles produced inside that cell are released and attach to another cell (within the same host). Studies with different coronaviruses show that, instead of the lysis event typical for bacteriophages, coronavirus-infected cells release virus particles over an extended period of time. Ng et al. (2003) show that for a different strain of SARS coronavirus, within a cell culture, extracellular virus particles can be seen in a few cells 5 hours after infection, and by 24 to 30 hours after infection "crystalline arrays of extracellular virus particles were seen commonly at the cell surface". I assume that the mean time for a replication cycle is halfway between 5 hours and 30 hours (which is 17.5 hours). Another assumption is that the time between the release of a virus particle and its attachment to another cell is negligible.
replicationCycleTime = Quantity[17.5, "Hours"]
Time between infections: There is an interval of time during which the virus has been released by an infected individual and has not yet infected another, and during this time the virus is not replicating. Most studies I have read focus on either the stability of viruses on surfaces or in the air, or on the reproduction rate of infections, and I have not yet found an article that estimates the time between the release of virus particles and the infection of an individual by those particles (any suggestion is welcome). A study by Duan et al. (2003) showed that for a different SARS coronavirus (strain CoV-P9), "The survival abilities on the surfaces of eight different materials and in water were quite comparable, revealing reduction of infectivity after 72 to 96 h exposure." While virus particles can survive a few days in the environment and remain infectious, I suppose most infections occur shortly after contact with, or being close to, an infected individual, so I am assuming a time between infections of 8 hours.
timeBetweenInfections = Quantity[8, "Hours"]
Estimating the number of replication cycles of SARS CoV-2
After giving values to our parameters, estimating the number of replication cycles is straightforward. First we have to calculate the number of virus replication cycles per infection, which excludes the time a virus spends in the environment and is therefore not replicating.
generationsPerInfection = (infectedGenerationTime -
timeBetweenInfections)/replicationCycleTime
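The same quotient can be checked by hand without `Quantity`, which makes the unit conversion explicit. This is only a sketch; the variable names ending in `Hours` and `Check` are mine and are not part of the notebook.

```wolfram
(* Hypothetical names; values copied from the parameters above and converted to hours *)
infectedGenerationHours = 8.5*24;      (* 8.5 days = 204 hours *)
timeBetweenInfectionsHours = 8;
replicationCycleHours = 17.5;

(* Hours per infection cycle during which the virus is inside a host and replicating *)
replicatingHours = infectedGenerationHours - timeBetweenInfectionsHours;

(* Replication cycles that fit into that window: 196/17.5 *)
generationsPerInfectionCheck = replicatingHours/replicationCycleHours
```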
This gives a result of 11.2 virus generations per cycle of infection. Now we can estimate the number of infection cycles that have, on average, occurred for a "lineage" of infections.
infectionCycles = tMRCA/infectedGenerationTime
To make sure this estimate is within reasonable bounds, we can raise the estimated range of R₀ (2 to 2.5) to the power of this value, which gives a rough estimate of the number of cases.
Interval[{2, 2.5}]^infectionCycles
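As a rough sketch of what this check computes, the two endpoints can be evaluated separately with plain numbers. The value of 151 days is an assumption (an evaluation date of 17 April 2020), and the names below are hypothetical. Note that R₀ raised to the number of infection cycles counts only the most recent generation of infections, so this is an order-of-magnitude check rather than a cumulative case count.

```wolfram
(* Assumption: 151 days since 18 November 2019, i.e. an evaluation date of 17 April 2020 *)
daysSinceOrigin = 151;
infectionCyclesCheck = daysSinceOrigin/8.5;   (* about 17.8 infection cycles *)

(* Cases in the latest infection generation if every case infects R0 others *)
lowerCases = 2^infectionCyclesCheck;
upperCases = 2.5^infectionCyclesCheck;

(* Show both endpoints with two significant digits *)
ScientificForm[{lowerCases, upperCases}, 2]
```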
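This range of ~2×10⁵ to ~1.2×10⁷ contains both the number of confirmed cases, which is ~2×10⁶, and many plausible higher estimates for the number of real cases. Multiplying the number of infection cycles by the number of replication cycles per infection then gives the total number of replication cycles: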
virusReplicationCycles = infectionCycles*generationsPerInfection
The answer seems to be that there have been, on average, ~200 replication cycles between a currently active virus (as of the day this was posted) and the first SARS CoV-2 virus. This calculation involved many assumptions and estimates, so to get an idea of how precise this estimate is, I also calculated the same value using the range of possible values for each parameter. Using this, I found that the number of replication cycles for the SARS CoV-2 virus could be between 41 and 1366. We can be more confident in this range than in the point estimate, as it accounts for the extremes of the intervals for each parameter. On the other hand, this interval was built from a mix of well-established ranges (for example, the tMRCA) and guesses, so it should not be taken as a confidence interval on the final value in any formal sense.
Estimating bounds on the number of replication cycles of SARS CoV-2
This section walks through the calculation by which I found the interval. The interval for the tMRCA was taken from the 95% highest posterior density interval, which is 10 September 2019 to 28 December 2019.
tMRCAInterval =
Abs[DateObject["Today"] -
DateInterval[{DateObject["10 September 2019", "Day"],
DateObject["28 December 2019", "Day"]}]]
The interval for the infected generation time was taken from the same graph as the average, and is 7.5 to 9.7 days:
infectedGenerationTimeInterval =
Quantity[Interval[{7.5, 9.7}], "Days"]
As mentioned in the study by Ng et al. (2003), viruses are released from an infected cell starting from 5 hours post infection, and by 30 hours most virus particles appear to have been released from the infected cell. Using this as an interval gives:
replicationCycleTimeInterval = Quantity[Interval[{5, 30}], "Hours"]
And as surfaces with virus particles start to lose infectivity by 72 hours, we can use the interval 0-72 hours for this estimate.
timeBetweenInfectionsInterval = Quantity[Interval[{0, 72}], "Hours"]
Using these intervals to calculate the range of possible values for the number of replication cycles we have:
Interval for virus replication cycles per infection
generationsPerInfectionInterval = (infectedGenerationTimeInterval -
timeBetweenInfectionsInterval)/replicationCycleTimeInterval
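To make explicit how the extremes combine in this step, here is a small sketch using plain `Interval` objects with every value converted to hours. The names are hypothetical and not part of the notebook.

```wolfram
(* All values in hours; names are hypothetical *)
generationHours = Interval[{7.5*24, 9.7*24}];   (* 180 to 232.8 hours *)
betweenHours = Interval[{0, 72}];
cycleHours = Interval[{5, 30}];

(* Subtraction pairs the smallest minuend with the largest subtrahend and vice versa *)
replicatingHoursInterval = generationHours - betweenHours;

(* Division pairs the smallest numerator with the largest denominator and vice versa *)
generationsCheck = replicatingHoursInterval/cycleHours
```

The lower end therefore comes from 108 hours of replication divided by a 30-hour cycle, and the upper end from 232.8 hours divided by a 5-hour cycle, which is roughly 3.6 to 46.6 replication cycles per infection.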
Interval for the number of infection cycles since the first COVID-19 case.
infectionCyclesInterval = tMRCAInterval/infectedGenerationTimeInterval
This gives a final interval for the total number of virus replication cycles of 41 to 1366.
virusReplicationCyclesInterval =
infectionCyclesInterval*generationsPerInfectionInterval
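The whole interval calculation can be reproduced with plain numbers as a sanity check. This is a sketch under one loud assumption: an evaluation date of 17 April 2020, which puts the tMRCA interval at 111 to 220 days. All names ending in `Check` or `Interval` below are hypothetical.

```wolfram
(* Assumption: evaluation date of 17 April 2020, so the tMRCA interval is 111 to 220 days *)
tMRCADaysInterval = Interval[{111, 220}];
generationDaysInterval = Interval[{7.5, 9.7}];

(* Number of infection cycles since the common ancestor *)
infectionCyclesCheck = tMRCADaysInterval/generationDaysInterval;

(* Replication cycles per infection, in hours: (generation time - time between infections)/cycle time *)
generationsCheck = (24*generationDaysInterval - Interval[{0, 72}])/Interval[{5, 30}];

(* Total replication cycles, rounded to whole cycles *)
totalCyclesCheck = infectionCyclesCheck*generationsCheck;
Round[{Min[totalCyclesCheck], Max[totalCyclesCheck]}]
```

Interval arithmetic treats the two occurrences of the infected generation time as independent, so the extremes pair a 9.7-day generation time in one factor with a 7.5-day generation time in the other. This can only widen the result, so the range should be read as a conservative bound.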
What's next?
Knowing how many cycles of replication have occurred since the first SARS CoV-2 virus might give us insight into the dynamics of virus populations, but much more is needed to get the bigger picture. While writing this I was momentarily distracted by the fascinating world of virus population dynamics and evolution, and started a sketch of a demonstration of how virus populations grow incredibly fast inside a cell, then suffer a drastic population bottleneck (the size of which depends on the virus-host interaction), then grow again in many other cells, and then go through another bottleneck when a fraction of the viruses in a host is released to infect another. But before going into the next topic, I would like to know what the Wolfram Community thinks about this analysis. My sources were gathered by searching the web, and I could have missed an important article, so I would appreciate any suggestions for sources that would help reduce the uncertainty in this calculation.
Sources
- Duan, S. M., Zhao, X. S., Wen, R. F., Huang, J. J., Pi, G. H., Zhang, S. X., ... & Dong, X. P. (2003). Stability of SARS coronavirus in human specimens and environment and its sensitivity to heating and UV irradiation. Biomedical and environmental sciences: BES, 16(3), 246-255.
- Lai, A., Bergna, A., Acciarri, C., Galli, M., & Zehender, G. (2020). Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2. Journal of medical virology.
- Ng, M. L., Tan, S. H., See, E. E., Ooi, E. E., & Ling, A. E. (2003). Proliferative growth of SARS coronavirus in Vero E6 cells. Journal of General Virology, 84(12), 3291-3303.
- Park, S. W., Bolker, B. M., Champredon, D., Earn, D. J., Li, M., Weitz, J. S., ... & Dushoff, J. (2020). Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty: framework and applications to the novel coronavirus (SARS-CoV-2) outbreak. medRxiv.