Group Abstract Group Abstract

Message Boards Message Boards

[NB] Predicting Coronavirus Epidemic in United States

Posted 5 years ago

MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus


POSTED BY: Robert Rimmer
35 Replies
Posted 5 years ago

Robert,

I have downloaded the DPLM.ZIP from the ftp. Thanks

Alan

POSTED BY: Alan Mok
Posted 5 years ago

Robert,

I am interested in lognormal double Pareto distribution to be used for Stock trading. Can you share the associated Mathematica package.

Alan

POSTED BY: Alan Mok
Posted 5 years ago

Alan,

The zip file contains the package. If you use Install... in the file menu to load it, some limited documentation will be available. The file ReedPaper.nb uses the package to check all the math in the Reed paper noted above. DPLNTest.nb shows some of the basic functions, and calculates LMoments.

I couldn't get the zip file to load as an attachment, so I put it here:

Let me know when you have it and I will remove it from the link.

It was written nine years ago for version 8. I did test it with the two notebooks above and it still seems to work. Let me know if you have any questions

Attachments:
POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Robert Rimmer
Posted 5 years ago

There is another interesting distribution with two power tails, the double Pareto lognormal distribution. I have it implemented in Mathematica if you would like to fool around with it. The paper describing it is attached.

Attachments:
POSTED BY: Robert Rimmer
Posted 5 years ago

Robert, I found the paper you forwarded interesting, at first blush. I'll take the time to explore it well. Thank you! Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago

POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Brian Whatcott

I was specifically thinking of this article/post. Some interesting ideas there, but you'll have to run this on some fairly high-powered cluster to be able to study networks and parameter sets large enough to start providing guidance for the real word.

POSTED BY: Dietmar Rempfer

For the fun of it, here is a notebook looking at a fit using an exponential startup with a linear tail:

POSTED BY: Dietmar Rempfer

What is somewhat interesting about this fit is that it's been fairly stable for more than a month now (use the slider in the dynamic plots to see how the fit has been changing over time as more data has become available). I do wonder if we can find a plausible model that would generate this kind of distribution. I vaguely remember from many years back obtaining similar behavior when I was working on hyperbolic one-D transport-reaction equations that we had obtained for fixed-bed absorption/desorption processes.

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Clearly the data are showing a different structure late in the epidemic than was seen in the early data. I was looking yesterday at Massachusetts' website where they have the case rates per 100000 for each town. The case rates vary wildly with very high rates for low income areas where there are large immigrant populations and English isn't spoken in many homes. Today MA just made its raw data available. MA Data I have just started to look at it, but it is detailed enough that it might provide some insight into how the data unfolded, I am thinking right now that since most of the control measures required dissemination of information, different communities processed the information at different rates, with the low income areas lagging very badly in some states. If that is true we may have a mixture of models, but we may yet see an exponential tail if quarantine measures eventually succeed.

I lived in MA for forty years so I am familiar with the communities. I now live in AZ where the counties are very large, but the two counties which are mostly Indian Reservation have per capita case rates seven times higher than the rest of the state. NY and NJ maybe just starting to approach the tail behavior, I think it will be exponential, but rates are still varying daily when I fit the last ten days of new case results.

POSTED BY: Robert Rimmer

Yes, of course, that linear behavior we currently see cannot go on forever: At the very least, once we're getting closer to 100% of the population infected things will have to asymptote.

However, at this point, even taking into account that the number of people infected may be an order of magnitude higher than what the number of known (=tested) positives imply -- currently at about 1.4M, so ten times that number makes 14M -- we currently have only a few percent of the population infected.

The picture I have in my mind, from the remark in my previous post, is one of an "infection front" spatially moving through the population at constant speed, which may produce the linear growth in case numbers we are seeing now. This may go on like this for another couple of months. On the other hand, the good news is that the behavior is linear, thus the number of new cases per day is constant, and if things remain that way they will remain manageable.

POSTED BY: Dietmar Rempfer
Posted 5 years ago

POSTED BY: Robert Rimmer

As far as critiquing models is concerned, clearly, capturing a complex spatially distributed process in a simple localized model is only ever going to produce a rough approximation. Those logistic fits, specifically, are motivated by a description of epidemics on the basis of a localized model (we used to call these "lumped parameter models") which result in a fairly simple (set of) ordinary differential equation(s), and, in their simplest case, the logistic function as a solution. All of these, very much including those SIR/SEIR/… models, suffer from the assumption of spatially uniform parameters.

In contrast, in particular in the US right now, we have widely varying population densities, chaotic testing, and wildly varying levels of compliance with social distancing rules. While I would therefore expect individual hot spots to be more easily described with localized models, that's near-impossible for the US as a whole.

I wish people would spend more effort on much more realistic approaches such as Wolfram's agent-based networks. Yes, those are enormously computation-intensive (and data-hungry), but models of that nature could at least clarify some crucial questions that are poorly understood at this point. Take just the question of the efficacy of social distancing, which is clearly hugely important: Current approaches take an enormous economic toll, and at some point the question needs to be asked (well, it's being asked right now) if and how these rules should be relaxed. Yet, to be perfectly frank, there is basically no real understanding, none at all, of what the effects of those social distancing rules w.r.t. cases and deaths could be. Sure, people babble about how "social distancing works", and it probably does, to some degree, but there is no quantitative understanding. Agent-based networks could provide some insight, but that will require a massive effort. Are there any groups who are doing such simulations?

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Dietmar, forgive me for pigeon-holing this response as an "Argumentum Ad Vericundiam"; specifically, the physics teacher approach. This joins "aerodynamic lift as impulse/momentum of air molecules with due attention to inhomogeneities from micro gusts" and "aerodynamic drag as a function of velocity" [in this case because the calculus offers reasonable solutions, where "drag as a function of velocity squared" needs one to fall back on iterative methods for decent sizes and shapes.] Which is a slightly derogatory way of suggesting that there are emergent properties in building blocks where chaotic conditions are subsumed. No insult intended....

Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago

Dietmar, I believe you hit several nails on the head with your critique of logistical models. They necessarily provide the same time constant for the asymptotic approach as for the exponential increase. Moreover, when the daily growth rate becomes linear, the differential of the logistic function puts the middle of the straight line growth as the peak of the daily rate. I am pleased by an improved model function that David Bowman at Georgetown proposed. At the cost of a fourth, fitting parameter, the exponential growth time constant is uncoupled from the time constant of the asymptotic approach. DEATHS = k1 / (1 + aexp(ab*(starting day number - present day number)))^(1/c) It too suffers - this time from its virtue: the trailing exponential can easily follow a short term hollow - so that I find it advantageous to train the non-linear fitting with a fictitious value for a datum several days in the future, which best matches the recent early data points of the slowing daily numbers - so that the fit does not hunt unnecessarily to find the least squares fit of the linear data. Sadly the only data plot where this is presently useful is US Deaths, which appear to be turning back: other plots I maintain are currently best served by straight line models, which provide a direct value for daily rates.

Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago

Day #48 of my data was rather like that: a step increase when NY State and several others decided to include non-hospital cases and fatalities in their totals. I excluded two prior days data - which is itself a kind of massaging of the input - an original sin of sorts.

POSTED BY: Brian Whatcott
Posted 5 years ago

POSTED BY: Robert Rimmer
Posted 5 years ago

About that step change in Deaths: I notice Worldometers updated US DEATHS about an hour ago - local time, actually 04:17 GMT tomorrow! Here's what that one and only decaying plot looks like now:US DEATHS - REVISED   April 21 Their usual update happens around noon. Looks like that double exponential may go back to its former linear growth....

Brian W

POSTED BY: Brian Whatcott

I just updated the notebook I posted with the latest data. This does not look good. Looks like the United States is turning the corner, the wrong way…

POSTED BY: Dietmar Rempfer
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago

Agreed. Most of the epidemiologic models are focused on the dynamics of the disease and recovery rates. Effective quarantine, however, can shut down an epidemic very rapidly, and the epidemiologic models also require transmission assumptions that are not known for a new disease.

I do suspect the S. Korea finale was unusual, and that we should be able to predict the rate of decline from the data pretty soon, then modelling with the tail behavior of statistical distributions might prove useful. If the decline is faster than exponential, then the gamma distribution offers the range of tail behaviors between exponential and normal distributions, if it is slower, then a Pareto distribution could model it. If there are recurrent new small outbreaks, which might be what happened in S. Korea, then no mathematical model will succeed.

POSTED BY: Robert Rimmer

On your last remark, yes, that is the issue with trying to predict the future: It's really hard to do in advance; it's much easier to "predict" (=explain) the future once it's not the future anymore, but that's not as helpful in our current situation. This is also the difficulty with those SIR/SEIR/etc. "first-principle" models. They are really pretty, and work alright once all the parameters are known with sufficient accuracy. Unfortunately it turns out that we typically have a good grasp of those parameters only after the epidemic is over.

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Dietmar,

I am not sure what to make of the dynamic the moving average introduces, but mid March most of the datasets show a jump in cases for one day, but that wouldn't affect deaths. I think probably the cyclic phenomenon is introduced by using a linear moving average on data that is more or less growing exponentially. I think if you want to do smoothing that is consistent with the data, you should try an ExponentialMovingAverage, I have also looked at the parameters as the data accumulates. The L parameter in general keeps growing, while k decreases. My best answer is that the early data doesn't have any particular shape to it, rather it is chaotic -- you could fit most anything to it. There is probably good reason for this. Public health officials were playing catch up; the spread of the virus was much wider than was realized. So initially there was a mix of unrestrained viral spread and minimal quarantine. So we are seeing a situation that doesn't fit any model. Eventually the quarantine effort becomes more effective, but it may not become perfect and behave as a leaky quarantine. In the end the South Korea data, showed a persistent growth not at all consistent with the logistic model. Notebook attached.

Another interesting way to look at models is that the logistic model is the same as multiplying L times any unimodal statistical distribution. I am also attaching a notebook showing what fitting with that idea shows. Basically if we want to predict when it will end, The tail behavior of the distribution is what is important, and it is the right tail that will be important, because it doesn't really matter if timing screwed up the initial data accumulation. The ultimate conclusion I think is that you cannot make predictions unless you know in advance the shape of the entire curve. If the tail behavior is exponential, then the logistic distribution should eventually succeed, but it wasn't in S. Korea.

One thing that probably is predictable is that once there is a clear decline in case rates, it should eventually end, and we may be able to detect the shape of the decline. Right now we are probably still too close to the peak in the U.S.

The attached notebooks are not annotated, but you clearly will be able to understand them.

POSTED BY: Robert Rimmer

Robert, please take a look at the notebook below, which generously copied from yours, analysing both positives and number of deaths. Two new ideas are:

  • Introduction of a seven-day moving average, which seem to reveal some interesting behavior, and
  • The addition of a manipulate environment that allows one to easily see how the projections change with the availability of new data. I wonder if someone has any thoughts on the somewhat unexpected behavior of the data.

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Dietmar, I believe you hit several nails on the head with your critique of logistical models. They necessarily provide the same time constant for the asymptotic approach as for the exponential increase. Moreover, when the daily growth rate becomes linear, the differential of the logistic function puts the middle of the straight line growth as the peak of the daily rate. I am pleased by an improved model function that David Bowman at Georgetown proposed. At the cost of a fourth, fitting parameter, the exponential growth time constant is uncoupled from the time constant of the asymptotic approach. DEATHS = k1 / (1 + aexp(ab*(starting day number - present day number)))^(1/c) It too suffers - this time from its virtue: the trailing exponential can easily follow a short term hollow - so that I find it advantageous to train the non-linear fitting with a fictitious value for a datum several days in the future, which best matches the recent early data points of the slowing daily numbers - so that the fit does not hunt unnecessarily to find the least squares fit of the linear data. Sadly the only data plot where this is presently useful is US Deaths, which appear to be turning back: other plots I maintain are currently best served by straight line models, which provide a direct value for daily rates.

Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago

POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Robert Rimmer
Posted 5 years ago

Oh dear. This model seems to be using RATE information as its source - which leads to the usual problems with noisy inputs.

POSTED BY: Brian Whatcott

enter image description here -- you have earned Featured Contributor Badge enter image description here

Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and Your Profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board.

This post has been listed in the main resource-hub COVID-19 thread: https://wolfr.am/coronavirus in the section Computational Publications. Please feel free to add your own comment on that discussion pointing to this post ( https://community.wolfram.com/groups/-/m/t/1906954 ) so many more interested readers will become aware of your excellent work. Thank you for your effort!

POSTED BY: EDITORIAL BOARD
Posted 5 years ago

POSTED BY: Robert Rimmer
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard