[NB] Predicting Coronavirus Epidemic in United States

Posted 5 years ago
POSTED BY: Robert Rimmer
35 Replies
Posted 5 years ago

Robert,

I have downloaded the DPLM.ZIP from the ftp. Thanks

Alan

POSTED BY: Alan Mok
Posted 5 years ago
Attachments:
POSTED BY: Robert Rimmer
Posted 5 years ago

Robert,

I am interested in using the lognormal double Pareto distribution for stock trading. Can you share the associated Mathematica package?

Alan

POSTED BY: Alan Mok
Posted 5 years ago

POSTED BY: Robert Rimmer

Yes, of course, that linear behavior we currently see cannot go on forever: At the very least, once we're getting closer to 100% of the population infected things will have to asymptote.

However, at this point, even taking into account that the number of people infected may be an order of magnitude higher than what the number of known (=tested) positives implies -- currently at about 1.4M, so ten times that number makes 14M -- we currently have only a few percent of the population infected.

The picture I have in my mind, from the remark in my previous post, is one of an "infection front" spatially moving through the population at constant speed, which may produce the linear growth in case numbers we are seeing now. This may go on like this for another couple of months. On the other hand, the good news is that the behavior is linear, thus the number of new cases per day is constant, and if things remain that way they will remain manageable.

POSTED BY: Dietmar Rempfer
Posted 5 years ago
POSTED BY: Robert Rimmer

What is somewhat interesting about this fit is that it's been fairly stable for more than a month now (use the slider in the dynamic plots to see how the fit has been changing over time as more data has become available). I do wonder if we can find a plausible model that would generate this kind of distribution. I vaguely remember from many years back obtaining similar behavior when I was working on hyperbolic one-D transport-reaction equations that we had obtained for fixed-bed absorption/desorption processes.

POSTED BY: Dietmar Rempfer

For the fun of it, here is a notebook looking at a fit using an exponential startup with a linear tail:
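The notebook itself is not reproduced in this text. As a stand-in sketch only, one way to write down "exponential startup with a linear tail" is a curve that grows exponentially up to a switch-over day and then continues as a straight line with the same value and slope at that day. The functional form, the name `exp_then_linear`, and all parameter values below are assumptions for illustration, not the model used in the notebook.

```python
import math

def exp_then_linear(t, a, r, t0):
    """Cumulative count: a*exp(r*t) up to day t0, then a straight line.

    The line starts at the value and slope the exponential has at t0,
    so the curve has no jump or kink at the switch-over (an assumption).
    """
    if t <= t0:
        return a * math.exp(r * t)
    level = a * math.exp(r * t0)   # value reached at the switch-over
    slope = r * level              # slope of the exponential at t0
    return level + slope * (t - t0)

# Hypothetical parameters, chosen only for illustration.
a, r, t0 = 100.0, 0.2, 30.0

# Daily new cases: growing during the startup, constant in the linear tail.
daily_new = [exp_then_linear(t + 1, a, r, t0) - exp_then_linear(t, a, r, t0)
             for t in range(60)]
```

In the tail every entry of `daily_new` equals the slope at the switch-over, which is the "constant number of new cases per day" behavior discussed earlier in the thread.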

POSTED BY: Dietmar Rempfer

I was specifically thinking of this article/post. Some interesting ideas there, but you'll have to run this on some fairly high-powered cluster to be able to study networks and parameter sets large enough to start providing guidance for the real world.

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Brian,

If you use Mathematica I wrote a package for the lognormal double Pareto distribution. It was written for version 8, but I checked it this afternoon and it still works. I used it to go through every equation in that paper in a notebook. Let me know if you would like those files.

I was originally interested in it for stock market volatility measures, but those are modeled just as well by the generalized extreme value distribution, which has one fewer parameter.

Bob

POSTED BY: Robert Rimmer
Posted 5 years ago

Robert, I found the paper you forwarded interesting, at first blush. I'll take the time to explore it well. Thank you! Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago
Attachments:
POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago

POSTED BY: Robert Rimmer
POSTED BY: Dietmar Rempfer
Posted 5 years ago

About that step change in Deaths: I notice Worldometers updated US DEATHS about an hour ago - local time, actually 04:17 GMT tomorrow! Here's what that one and only decaying plot looks like now:

[Figure: US DEATHS - REVISED April 21]

Their usual update happens around noon. Looks like that double exponential may go back to its former linear growth....

Brian W

POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago

Day #48 of my data was rather like that: a step increase when NY State and several others decided to include non-hospital cases and fatalities in their totals. I excluded the two prior days' data, which is itself a kind of massaging of the input, an original sin of sorts.

POSTED BY: Brian Whatcott
Posted 5 years ago

Brian,

Just clicked on the link; they are now reporting 45318 deaths for 4/22. I don't know how often they update, but it eventually may be showing up there as well. Are you analyzing with Mathematica?

We are getting to an interesting point in the epidemic, which should decide the fate of the logistic model. The logistic model requires an exponential decline, which is easiest to show in the log plot of the daily differences.

[Figure: LogPlotDailyDifferences]

In the picture the orange dots are the daily differences of the raw data and the curve is the calculated daily difference from the logistic fit to the cumulative data as a continuous curve which is very close to the first derivative curve. If the points don't follow something close to a log linear decline, I don't think the logistic model holds, but with data anomalies it may be difficult to track. I am finding that the death data is lagging the case data by 7 to 8 days. In the end the death data may be more reliable, especially if people start reporting antibody tests as positive cases.
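The log-linear decline can be checked numerically. The sketch below is not the Mathematica code behind the plot; it assumes a plain three-parameter logistic curve with made-up parameters `k`, `r`, and `t0`, takes daily differences of the cumulative curve, and measures the slope of their logarithm well past the inflection point. Under the logistic model that slope should be close to `-r`.

```python
import math

def logistic(t, k, r, t0):
    """Cumulative cases under a logistic curve with plateau k."""
    return k / (1.0 + math.exp(-r * (t - t0)))

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical parameters, for illustration only.
k, r, t0 = 1_000_000.0, 0.15, 40.0

days = list(range(80, 121))  # well past the inflection at t0
log_diffs = [math.log(logistic(t, k, r, t0) - logistic(t - 1, k, r, t0))
             for t in days]

# Slope of log(daily difference) versus day; close to -r in the tail.
tail_slope = slope(days, log_diffs)
```

With real data the same slope would be estimated from the logged daily differences of the reported counts; a tail slope that drifts toward zero instead of holding steady would be one sign that the decline is slower than the logistic model requires.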

Bob

POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Robert Rimmer
Posted 5 years ago

US Data going the wrong way?

[Figure: Cumulative US Deaths by day]

POSTED BY: Brian Whatcott
Posted 5 years ago
POSTED BY: Brian Whatcott

I just updated the notebook I posted with the latest data. This does not look good. Looks like the United States is turning the corner, the wrong way…

POSTED BY: Dietmar Rempfer
Posted 5 years ago

Agreed. Most of the epidemiologic models are focused on the dynamics of the disease and recovery rates. Effective quarantine, however, can shut down an epidemic very rapidly, and the epidemiologic models also require transmission assumptions that are not known for a new disease.

I do suspect the S. Korea finale was unusual, and that we should be able to predict the rate of decline from the data pretty soon. Modelling with the tail behavior of statistical distributions might then prove useful. If the decline is faster than exponential, the gamma distribution offers a range of tail behaviors between the exponential and normal distributions; if it is slower, a Pareto distribution could model it. If there are recurrent new small outbreaks, which might be what happened in S. Korea, then no mathematical model will succeed.
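For reference, these are the standard log-densities of the four families mentioned, up to an additive constant $c$. The symbols ($\lambda$, $k$, $\theta$, $\mu$, $\sigma$, $\alpha$) are the usual textbook parameters, not values fitted anywhere in this thread.

```latex
\begin{align*}
\text{Exponential:} \quad & \log f(t) = c - \lambda t \\
\text{Gamma:}       \quad & \log f(t) = c + (k-1)\log t - \frac{t}{\theta} \\
\text{Normal:}      \quad & \log f(t) = c - \frac{(t-\mu)^2}{2\sigma^2} \\
\text{Pareto:}      \quad & \log f(t) = c - (\alpha+1)\log t
\end{align*}
```

As a rough diagnostic, an exponential tail is a straight line on a log plot against $t$, a Pareto tail is a straight line on a log-log plot, and a normal tail curves downward quadratically on a log plot.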

POSTED BY: Robert Rimmer

On your last remark, yes, that is the issue with trying to predict the future: It's really hard to do in advance; it's much easier to "predict" (=explain) the future once it's not the future anymore, but that's not as helpful in our current situation. This is also the difficulty with those SIR/SEIR/etc. "first-principle" models. They are really pretty, and work alright once all the parameters are known with sufficient accuracy. Unfortunately it turns out that we typically have a good grasp of those parameters only after the epidemic is over.
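To make the parameter sensitivity concrete, here is a minimal sketch of the basic SIR equations integrated with a forward-Euler step. It is not taken from any model discussed in this thread; the function name `simulate_sir`, the population size, and the rates `beta` and `gamma` are hypothetical values chosen only to show how a modest change in one parameter changes the outcome.

```python
def simulate_sir(beta, gamma, n=1_000_000.0, i0=10.0, days=300, dt=0.1):
    """Forward-Euler integration of the basic SIR equations.

    beta is the transmission rate and gamma the recovery rate, both per day.
    Returns the final (S, I, R) compartment sizes.
    """
    s, i, r = n - i0, i0, 0.0
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

# Two hypothetical transmission rates that differ by 20%.
low = simulate_sir(beta=0.25, gamma=0.10)
high = simulate_sir(beta=0.30, gamma=0.10)

# Difference in the total number ever infected between the two runs.
extra_infected = high[2] - low[2]
```

Early in an outbreak the data constrain `beta` and `gamma` only loosely, so two parameter sets that fit the first few weeks about equally well can still imply noticeably different totals.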

POSTED BY: Dietmar Rempfer
Posted 5 years ago
POSTED BY: Dietmar Rempfer
Posted 5 years ago

POSTED BY: Robert Rimmer
Posted 5 years ago
POSTED BY: Brian Whatcott
Posted 5 years ago

No, the fits are done with the cumulative case data, then the fit is used to predict what the daily rate changes would be according to the fit of the cumulative case data up to that point. Growth rate changes are what will be noticed in the news. The fits raise the interesting question that the early data may not be very good as the rates derived from the daily log differences of the reported cases do not fit any pattern. Here are the plots for New Jersey including today's data.
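The difference between rates taken from a smooth fitted curve and rates taken from raw reports can be sketched as follows. The series here are made up: `fitted` is a pure exponential standing in for a fitted cumulative curve, and `noise` is a set of invented reporting errors of a few cases per day. Neither comes from the New Jersey data.

```python
import math

def daily_log_differences(cumulative):
    """Return log(C[t]) - log(C[t-1]) for each consecutive pair of days."""
    return [math.log(b) - math.log(a)
            for a, b in zip(cumulative, cumulative[1:])]

# Smooth curve: exponential growth at 10% per day (hypothetical).
fitted = [100.0 * math.exp(0.10 * t) for t in range(10)]

# The same curve rounded to whole cases and perturbed by made-up errors.
noise = [0, 4, -4, 5, -5, 4, -5, 5, -4, 3]
reported = [round(c) + e for c, e in zip(fitted, noise)]

fit_rates = daily_log_differences(fitted)    # constant daily rate
raw_rates = daily_log_differences(reported)  # scattered daily rates
```

When the counts are small, errors of a few cases are enough to make the raw daily rates jump around, while the rates derived from the smooth curve stay constant. That is one reason early daily rates can look patternless even when the cumulative fit is reasonable.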

[Figure: NJ4-14-20]

Using the default fitting, which uses Norm to minimize the fit residuals, the early data are not adding much to the fit, as shown by the last log plot. Either the dynamics are changing after the peak or the early data are not very good. The plots of the early daily rates show a chaotic pattern, suggesting the problem may be with the early data. Applying the default fitting method to the cumulative data limits the influence of the early data points, so hopefully the predictive value gets better with more data. The data from South Korea did look a little peculiar at the end, although the late case increases were not huge even though they failed to fit the model.

POSTED BY: Robert Rimmer
Posted 5 years ago

Oh dear. This model seems to be using RATE information as its source - which leads to the usual problems with noisy inputs.

POSTED BY: Brian Whatcott

You have earned a Featured Contributor Badge.

Your exceptional post has been selected for our editorial column Staff Picks ( http://wolfr.am/StaffPicks ). Your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board.

This post has been listed in the main resource-hub COVID-19 thread: https://wolfr.am/coronavirus in the section Computational Publications. Please feel free to add your own comment on that discussion pointing to this post ( https://community.wolfram.com/groups/-/m/t/1906954 ) so many more interested readers will become aware of your excellent work. Thank you for your effort!

POSTED BY: EDITORIAL BOARD
Posted 5 years ago

POSTED BY: Robert Rimmer