MODERATOR NOTE: coronavirus resources & updates: https://wolfr.am/coronavirus
I don't believe that using accumulated totals in a nonlinear regression is a standard epidemiological technique. The observations are not independent because the values are accumulated, whereas the regression model you use assumes independent errors. Ignoring this lack of independence usually results in confidence bands that are narrower than they should be.
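To make the dependence concrete, here is a small worked calculation. It is a sketch under simplifying assumptions that are mine, not the original poster's: daily new counts $y_i = \mu_i + \varepsilon_i$ with independent errors $\varepsilon_i$ of constant variance $\sigma^2$. The cumulative total at day $t$ is $Y_t = \sum_{i \le t} y_i$, with error $E_t = \sum_{i \le t} \varepsilon_i$.

$$
\operatorname{Cov}(E_s, E_t) = \sigma^2 \min(s, t),
\qquad
\operatorname{Corr}(E_s, E_t) = \sqrt{\frac{\min(s, t)}{\max(s, t)}}
$$

Under these assumptions the errors on days 29 and 30 have correlation $\sqrt{29/30} \approx 0.98$, so consecutive cumulative observations carry far less independent information than a regression assuming independent errors would credit them with.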
Dear Robert Rimmer,
Thank you very much for working on this issue and sharing your insights. I would like to offer some sociological background that may support the conjecture stated in your essay: "I suspect that the flat growth up to day 29 shows a small group of infected and exposed people successfully quarantined. Suddenly on day 30 there is a new outbreak, probably from an unknown source who wasn’t tracked with the initial group." The Korean government successfully detected, traced, and quarantined the initial COVID19 carriers, but the authorities failed to detect the movements of members of a Christian religious group who had traveled to the Wuhan area for missionary work and who later attended a megachurch ceremony in Daegu, a city in the southeastern part of Korea. COVID19 began spreading explosively from them on day 30. That is what you see on your graph.
Dear Robert,
Thanks for sharing this new estimate. I will look into it carefully. I think there is one more important consideration that may complicate early estimates and predictions: the Korean government (i.e., the CDC) has decided to test as many potentially infected patients as possible (even those experiencing only minor flu-like symptoms), has attempted to identify the origin and source of contagion, has tracked all the places the COVID19 carriers passed through and when, and has made all of this information publicly available. Precisely because the CDC is increasing the number of tests, the number of confirmed cases has been increasing explosively. I am not sure whether the Chinese government adopted similar measures from the beginning, and therefore whether we can safely rely on their released data. I am quite sure, though, that in Japan the situation is very different. The Japanese government has not been testing even when it is necessary and has largely abandoned patients without giving them proper medical treatment. I am personally afraid that a similarly stupid thing will happen in the U.S. I guess my point is that the number of confirmed cases is not yet reliable and in some countries is largely underestimated. Perhaps it is too early to run any model?
Thank you Robert. I live in Milan, so I'm kind of directly involved in this outbreak, trying to figure out the numbers on my own, but without your competence. I find this very helpful.
Stefano Bertolucci
So far the epidemic has not ended anywhere. This is a problem of population and of the resources to support that population. Humans are the substrate in which the virus population propagates. When a person becomes infected, if he can be isolated from the rest of the population then the virus cannot spread. Unfortunately there is a period of time (the incubation period) during which a person is infected but not symptomatic and can spread the virus, so the standard epidemiological technique is to identify the contacts and try to isolate them as well. If quarantine measures succeed then the curve will flatten. If the measures don't succeed then potentially a large population is at risk. So far the numbers from China, if they can be trusted, seem to be showing deceleration of growth, and over the past week the fit to the curve from the Chinese data continues to show that deceleration, with a limiting population in the 80,000 to 90,000 range. Note that the logistic model always has a positive growth rate; the limit is reached only at infinity. In real life the cases are integers, so if the last infected person is isolated and does not spread the infection, it stops. Unfortunately there are other possibilities: some people or animals may become carriers, in which case a lower level of infection may continue indefinitely. Hopefully most people who survive will also develop immunity so that they won't be re-infected.
The model could also be wrong, but so far all epidemics have either ended or continue at low levels that do not threaten the whole population.
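For readers who want the logistic model mentioned above written out, here is the standard form. The symbols are notation chosen for this sketch: $k$ is the growth rate, $L$ the limiting number of cases, and $t_0$ the time at which half of $L$ is reached.

$$
\frac{dx}{dt} = k\,x\left(1 - \frac{x}{L}\right),
\qquad
x(t) = \frac{L}{1 + e^{-k (t - t_0)}}
$$

$$
\frac{dx}{dt} = \frac{k L\, e^{-k (t - t_0)}}{\left(1 + e^{-k (t - t_0)}\right)^{2}} > 0
\quad \text{for all finite } t \text{ when } k, L > 0,
\qquad
\lim_{t \to \infty} x(t) = L
$$

This is why the continuous model never quite stops growing, while a real outbreak counted in whole cases can.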
You have earned a Featured Contributor Badge.
Your exceptional post has been selected for our editorial column Staff Picks http://wolfr.am/StaffPicks and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you, keep it coming, and consider contributing to The Notebook Archive!
Why do you assume a logistic growth curve that flattens out? None of the data points presented are in the area where the estimated curves flatten out.
Thanks! That makes much more sense. (And I do realize that, despite the serial correlation, the parameter estimates from NonlinearModelFit may not be too bad; it is their standard errors that are more likely to be underestimated when the serial correlation is ignored.)
Jim
Each country's response will be different, but I expect the data from democracies should be more transparent. The interesting thing about cumulative data is that to some extent it will be self-correcting. Cases discovered with a lag will eventually be added, so as the epidemic proceeds the early data points can be discarded to get a more reliable fit. Here is a fit based on the differential equation that may help with your tracking:
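Clear[logisticDEFit];
(* data: list of {t, cumulative cases}. The symbols k, L, x and t0 are
   global and must not have values assigned when this is called. *)
logisticDEFit[data_, graphic_: True] :=
  Module[{dataFn, dataDE, dataDEPlot, deModel, deNLM},
   (* interpolate the cumulative counts so a derivative can be taken *)
   dataFn = Interpolation[data];
   (* pairs {cases, rate of change of cases} at each observed time *)
   dataDE = {dataFn[#], dataFn'[#]} & /@ data[[All, 1]];
   dataDEPlot = ListPlot[dataDE, PlotStyle -> Darker@Red,
     GridLines -> Automatic, PlotRange -> All];
   (* right-hand side of the logistic differential equation *)
   deModel = k x (1 - x/L);
   deNLM = NonlinearModelFit[dataDE, deModel, {k, L}, x];
   If[graphic,
    Print@Show[
      Plot[deNLM[x], {x, 0, L /. deNLM@"BestFitParameters"},
       GridLines -> Automatic, Frame -> True,
       FrameLabel -> {"Cases", "LogisticFunction'[t]"},
       PlotRange -> All,
       PlotLabel -> "Logistic Differential Equation"],
      dataDEPlot]];
   (* return k, L and the time t0 at which the data reach L/2 *)
   Flatten@{deNLM@"BestFitParameters",
     FindRoot[dataFn[t0] == L/2 /. deNLM["BestFitParameters"],
      {t0, data[[-1]][[1]]}]}
   ];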
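As a reading aid for the function above, this is the relationship it fits, written with the same symbols as the code (`k`, `L`, `x`, `t0`). It is a sketch of the idea as I read it from the code, not an additional result. The fit is done in the plane of cases $x$ against their rate of change $x'$, where the logistic equation is a parabola:

$$
x' = k\,x\left(1 - \frac{x}{L}\right) = k\,x - \frac{k}{L}\,x^{2}
$$

$$
x' = 0 \ \text{at} \ x = 0 \ \text{and} \ x = L,
\qquad
\max x' = \frac{k L}{4} \ \text{at} \ x = \frac{L}{2},
\qquad
x(t_0) = \frac{L}{2}
$$

The right-hand zero of the fitted parabola is the estimated final case count $L$, and the `FindRoot` step locates the time $t_0$ at which the interpolated data reach $L/2$, which is where growth peaks. If the observed points all lie on the rising side of the parabola, $L$ is an extrapolation and should be treated with corresponding caution.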