Message Boards Message Boards


An SEIR like model that fits the coronavirus infection data

Posted 3 months ago
52 Replies
28 Total Likes

MODERATOR NOTE: coronavirus resources & updates:

May 20: I have updated the table of contents ("INCLUDED IN THIS POST" below)

NEW, May 10: Daily new cases forecast trends obtained from optimally fitted "TRUE" models (to learn about this, read the next paragraph and go to the response posted today at the bottom of the post ... it will be up in a few minutes). A notebook will be provided with some guidance as to how to obtain optimal fits almost automatically when I have time to finish putting it together. WARNING: There is a lot of details involved in what I am doing which gives rise to mistakes, especially when something new comes around. I just corrected some blatant ones. Hopefully they dwindle out to zero with time.

On April 21, 26, and a importantly on May 3 and 4, I have tried to improve and restructure this presentation and make a note of various conceptual issues. I have also added "TRUE" (or truer) models of the outbreak (where the number of cases is matched to the R curve - read ahead). I realize that as it was before today (and maybe still), it was poorly and hastily written. I will continue to make improvements as I have time. If nothing else, please read this paragraph. Hopefully it is easier to read now and clearer. I specify where to find material in various sections, now close to the beginning and not in too many places. NOTE and DISCLAIMER: we are using a formalism to model data ... this is not exactly the same as having a model of the outbreak itself ... only an approximation that allows us to have some understanding the dynamics of the outbreak and make some forecasts with reasonable accuracy. Modulo the explanations of compartmental models ahead and in the linked post, for the purposes of modeling the outbreak, the only data we have is the number of detected cases. This corresponds to the R curve in the outbreak: each detected infection is an individual that de facto gets quarantined and removed from the infective process. We have no other data available. We don't have data that tells us when an individual becomes exposed or infectious. However, we are using a compartmental model differently: we are considering the R curve to be those individuals that have recovered from the infection or died. And the I curve as the number of detected cases minus those that have recovered and died. The point is, these definitions of our compartments are sound in the sense that they are disjoint (that is, true compartments); and the dynamics of how individuals move from one of our so defined compartment to another can be described with the equations of the SEIR and SIR models. And so, we have a model of the data which we are able to collect. To make this clear, we present in this section, along with other content outlined below, two models for the outbreak in Italy (for which the data is very good), a "TRUE" model of the outbreak, in which the data is matched to the R curve (removed individuals), and OUR VERSION of the model of the data as we have endeavored to look at it for the most part in this post. Without further ado:


For an explanation of SEIR (and SIR) models see Robert Nachbar's post:

The equations of the slightly modified SEIR model are given ahead. It is possible to model the data by assuming a low enough susceptibility. I explain in the discussion with Robert Nachbar below why the main effect of containment measures is to lower the effective number of susceptible individuals when it is imposed (relatively speaking, at the beginning). Using this idea, it is also possible to model what could happen if you lift restrictions too early by letting the susceptibility increase (see picture) and you then reintroduced them (this is not a forecast, just a possible scenario). As a CAVEAT, these models are models using the DETECTED number of cases, not the TRUE number of cases, AT THE MOMENT THEY ARE DETECTED, not at the moment they are exposed or become infectious. Also, our compartments do not correspond to the compartments of a "true" model of the outbreak. We are taking the I compartment to be the number of detected cases minus the number of recovered and fatal cases, the sum of which is the R compartment. Our ASSUMPTION is that we can model the I compartment moving to the R compartment as individuals moving from being infected to being recovered or dead ... so intuitively we are tacitly assuming that our model gives us a picture of the outbreak with a delay, reflected in the data as it becomes available.

Regardless of these considerations, our model allows us to understand how the disease evolves in time as it pertains to the data we have at hand. At least, we were able, in the Chinese model, to predict an end of outbreak time well in advance (the evidence is in Rimmer's response to this post where a similar prediction is made based on our model).


1) SEIR models for data from China, Italy, US (alternating with a "TRUE" SIR model), a SIR model for Finland, and a "TRUE" SIR model of the outbreak in Italy (see opening remarks); the model for Spain which shows what seems to be happening as a result of lifting some stringent restrictions on a given date is now only in the Europe section (2)

2) In a response below, models for Spain, UK, France, Germany, and Austria,

3) in a separate response, three models for Finland (SEIR/SIR) as well as a model as Norway, and a document of daily tallies per million inhabitants for several countries.

4) in a separate response, a brief discussion of SIR models (with models for China, Italy, and two more models for Finland) in a separate response. Also there, a document of cases/tests ratios for various countries in the SIR models section. This section includes a picture of positivity rates for several countries.

5) in a separate response, a notebook in a response towards the end of the post. This section includes a pdf document with dialy new cases for many countries

6) Daily new cases forecast trends obtained from optimally fitted "TRUE" models in the latest response (May 10). Read more in the new section (up shortly). Notebook will be provided.

SIR MODELS (see SIR section for equations)

At the end of the post in a new response I discuss a simpler SIR-like model for the Chinese, Italian, and Finnish data which is practically as good - the equations are there. It has two advantages over the SEIR model: a) the classic SIR model has analytic solutions, so straightforward (somewhat) computational optimization can be carried out to estimate the parameters - although our equations are not the classical ones as they have a delay; b) it yields for the data we are trying to model values of R0 that are congruent to the observed ones, 5.43 for China - compared to 5.7 obtained in the just published study led by Steven Sanche and Lin Yeng-Ting, Los Alamos N. L. (arXiv:2002.03268) in Emerging Infectious Diseases, V26, Num 7 - (a note about this for the SEIR model below) without further ado (more on this ahead); and c) (UPDATE 3) if we look at the susceptibility curves (S in the diagrams), we see that they do not necessarily reach 0. If they are asymptotic to a positive value, that means there is a herd immunity effect - the value of the asymptote being the number of people who remain susceptible under containment that will not get infected; moreover, we can see that the susceptibility curve is very close to its asymptote near the peak of the infections curve (I in the diagrams), so that targeted testing is warranted as an effective measure of containment at that stage. That section contains a model of China (final) and a model for Italy (that is not updated), and a two models for Finland, one using the JHU data instead of the Finnish authorities data, and another one using another recovery schedule using an estimate based on the scant recovery data for Finland.

A SIR model for Finland and the SEIR model alternate every so often here; the SIR model uses THL (Finnish health authorities) data. Both models are available in the Finland section, and the model for Germany in its section is, alternatingly, either a SEIR or a SIR model. We have removed, in the Finland, an SIR model that shows what it looks like to reach a plateau or steady state, rather than a peak; the equations for this are necessarily different than the simple SIR equations given in the SIR section, In the SIR section there are also other models for Finland, one using JHU and the other using a different recovery schedule.

The newer models are adjusted quite frequently, especially with respect to the number of susceptible individuals, as they continue to grow. They tend to stabilize about three weeks after control measures have been in place. After the I curve peaks, it is possible to begin to get an idea of how long the outbreak will last.


Now the equations (for the SIR model, see the SIR section at the end of the post and after most of the discussions).

s'(t) = -Beta * s(t) * i(t) / p,

e'(t) = Beta * s(t) * i(t) / p - Sigma * e(t),

i'(t) = Sigma * e(t - m) - Gamma * i(t - n),

r'(t) = Gamma * i(t - n)

The function s(t) is the number of susceptible people (the people that can get exposed to the pathogen) at time t. e(t) is the number of people that have been exposed to the pathogen and can become infected; i(t) is the number of people who are infected; r(t) is the number of people who have become resistant to the pathogen: they have recovered and developed immunity or died. Now the parameters.

beta is usually considered to be the rate of infection or "force of infection"; sigma is the usually the rate at which an exposed individual becomes infective; gamma is usually the removal rate. We introduced m and n, shift or delay parameters to line up the model curves with the data.

Here, we are operating with a delay. In our model, an individual is in the I compartment when it gets detected (a case of infection) and we continue to consider it infective until it gets "removed" when it has recovered or passed (not when it gets caught). In reality (in a true model of the outbreak, as in the second example for Italy in the pictures), individuals become infective before they get caught, and they get removed when they get detected. If we assume some kind of uniform delay in the process, we can try to fit the model to the data as we have compartmentalized it (cases and recovered+deceased). Thus we get a description of the dynamics of the outbreak as described by the data we can collect. IN ANY CASE, OUR MODELS ARE MODELS OF THE DATA ... the SEIR (SIR) formalism works well, and they have predictive value. The parameter values are in the titles of the pictures for each country. In general we assume e(0)=i(0)=1 unless stated otherwise in the model label. Also, s(0)=p, and r(0)=0.

R0 in our SEIR-like and SIR-like MODELS:

In the SEIR models, the basic reproduction number (R0) is constant and it depends on the parameters of the equations below. If we do the usual calculation (roughly beta/gamma in the equations below), R0 in our models is about an order of magnitude larger than the estimated-observed R0. There is an intuitive explanation for that. If we were to model the DETECTED number of cases using the BELIEVED or TRUE number of susceptible individuals, thought to be an order of magnitude higher than the detected ones, then we would need to scale down beta by an order of magnitude to get our results, among other things. That would give us the R0 that is being measured (my understanding is that R0 was estimated on DETECTED number of cases - but if this is wrong, then my explanation for the disparity is not correct). The main effect of lockdown is to lower the number of people that can be exposed to the pathogen when it is imposed, roughly at the beginning of the outbreak (see reply to Rober Nachbar's response for a thought experiment that explains this). Recall, the basic reproduction number (R0) is constant.

The R0 numbers obtained in the SIR models discussed in a separate section are congruent with the values that are proposed in the research litereature (more about that in the SIR section).


Some countries do not provide any or most data pertaining to recoveries. We have estimated this data, sometimes extrapolating from available data, sometimes using an estimating function based on average rates from countries that do provide the data, etc. It would take too long to discuss what we have done in each case where recovery data seems to be missing or partial. We explain the Finnish case.

The THL (Finnish health authorities) data for the Finnish model comes from the Finnish Department of Health and Welfare (THL acronym in Finnish). There is a delay in the release of the data of 1-2 days. however, the recovery data comes from Johns Hopkins University and occasional reports from the Finnish authorities. According to the medical chief of staff of the infections diseases clinic at the Helsinki and Uusimaa hospital district, it was "important to define what people mean when they talk about recovery", and that "eventually it would be important to compile statistics to better understand the disease" and "was taking the numbers with a grain of salt" noting that "the criteria undrelying the data are not always clear and they are not always the same in each country". He also said that "tracking recovered patients was not a top priority". (quotes source is Yle news, the state run news agency). We have serialized the occasional recovery data according to how cases might have arisen in time to obtain a recovery rate function. We verify the accuracy of this function every time a new datum becomes available. We use this function also to estimate Norway recoveries.

The US model now uses an alternative recovery schedule based on an average of the recovery schedules of countries which are providing these data, as the US recovery data seems lower than it ought to be. See my comment in the day's update (April 15). We also use an estimate for UK data which is not available. Some countries have changed the way they count in the middle of the process, and we have adjusted for this (or not) as we see fit - again, it would take too long to discuss this. For the most part, we use the data that is available and take it from there.


In the Finland section, where there is space, I include a pdf document with a smoothened version (4 day moving average) of the daily tallies per million inhabitants for several countries in Europe, as well as USA and South Korea. One can appreciate that contrary to what some European authorities are suggesting, hardly any European country is ready to move forward, according to this parameter, especially if we compare it to the role model country for exit strategy, South Korea. In the SIR section, where there is space, there is a picture of the current positivity rates (number of cases/number of tetsts conducted so far). It is a useful diagnostic of where a country stands in the process.

May 23-26: Updating. We will wait until the end of May to fit a new "TRUE" model for the US. A semiautomatic, almost optimal fit for the Finland model has been obtained. We will try to fit models automatically from now one, slowly but surely. An automatically fitted, almost optimal SIR model for Italy is now posted.

May 19-22: updating. On May 20 the table of contents above ("INCLUDED IN THIS POST" section) was updated.

May 18: there is no data for Finland May 17 yet.

May16-17: The Italy "TRUE" models has been fitted again this weekend; next weekend we do the U.S.

May 10-15. Updating. The US and Italy "TRUE" models are now optimally fitted. The Italy model is fitted to 4 May when restrictions were lifted. A notebook to do this will be provided later. From these models one can derive the daily new cases forecast trends in the new section at the end of the post (for more info, read there, it will be up shortly). It takes about 2-4 hours of compute time to make some of these fits. These models will not be fitted again as restrictions are slowly lifting ... which changes the forecast, hopefully in a noticeable way (or hopefully not, from the state of things point of view). The Italy SEIR model has been replaced by a SIR model. Earlier on May 10 I had posted the wrong file for the US model ... it is now correct. And apologies, had the wrong label on the Italy "TRUE" model, now corrected (hopefully) ...

May 8-9: I will leave the "TRUE" model for the US now. It is perhaps the most reliable picture of what lies ahead. One of my usual SEIR models forecasts a higher (4.4 million) susceptible population, but that number does not square with the "TRUE" model, although soon I will do an automatic and optimal fit of it, which might push this number up. Over the weekend, a new model for Italy will be forthcoming and "TRUE" models will start to be produced in a fully automated way (I will later post a notebook with the code that does the optimization; it is written withing the simplicity of built in Mathematica functionality, which means it is somewhat slow and NMinimize needs help.

May 6-7: Updating. Soon we will have to update our standard model for Italy. The "TRUE" model looks very reliable now, enough to make long term forecasts and provide a picture as to what to expect in the longer run.

May 5: The US SEIR models will now alternate with a "TRUE" SIR model (see first paragraph of text above and subsequently for explanation). Also, there is an SEIR model for Italy and now a "TRUE" SIR model as well.

April 30 - May 4: updating, US model alternating every so often

April 29: updating. Today I put back the US model with the actual recovery data that is provided. The two models will alternate. One of our alternative Finnish models squares with the latest THL recovery estimates, so we are showing that model instead of the model we had yesterday. This picture will be updated again at 2 PM EEST.

April 28: Tomorrow I will start alternating the US model with the model obtained from the recovery data that is provided. I found a source of daily increments for the Spanish data. The Italian model has been stable now for weeks, since before the peak of the I curve. Their official data is quite good comparably speaking.

April 27: updating. I will not update Spain after today until I get hold of the data from local authorities if I can. The JHU data is inconsistent both in number of cases and in recoveries. It seems the historical series is being updated retrospectively, but according to the Spanish authorities, it is not yet ready. The temporary lump sums provided temporarily make for very poor data. When it becomes ready, I will continue to update this model. If I can obtain reliable information from press reports, I will update my data thus by hand.

April 25-26: updating. Today, April 26, the recovery data from Spain is highly anomalous, for the second time (in the past, counting method changed). Unless this datum is corrected, from now on I will use an estimate based on a recovery rate function that can be computed from the data up to yesterday, or constant adjustment as of today based on today's estimate. Using this function, we obtain today's picture.

April 24: updating. It seems the model for Spain might require a steeper rise up again.

April 23: updating. I have posted yet a new model for Finland which is probably more accurate. It is hard to say, as the entire time series changes each day due to delays in testing reports. The date in the Spanish model is now correct. I seem to have made, unfortunately, a correct forecast of the consequences of going back to work too soon!

April 22: updating. Spain went back to work ten days ago. We see new growth and forecast it will continue so ... may we be wrong.

April 21: updating. I changed the text above to improve it and hopefully make it more readable, and highlight important issues. I changed the standard SEIR Finland model for an SIR model that to me, seems more realistic, given the daily tally trends. The problem with Finnish data is that the entire time series gets corrected every day, not just the last day. While this makes for accuracy, it makes modeling difficult. I will alternate with the usual SEIR model.

April 19+20: updating. There is yet another model for Finland using another estimate for the recovery schedule. At the end of the SIR section there is a picture of the current positivity rates (number of cases/total number of tests). This should be a useful diagnostic. I will start keeping a history of these data from now on (I only have a history for Finland).

April 18: updating. There is an additional model for Finland in the SIR section using JHU data instead of THL data

April 17: updating. This section now has the SIR model for Finland, we believe it is a more accurate model for the time being, and based on a just published estimate of recoveries. Our extrapolating function seems to be working quite well and we have adjusted it to reflect this last change.

April 16: updating. The German model in its section is now an SIR model. In the Finland section there is a SIR model in which a plateau, rather than a peak, is reached

April 15: updating. Today I will show an alternative model for the USA that uses an average recovery rate obtained from other countries rather than the reported data, which seem low (understandably so, it is not a priority to test people who have tested positive and are recovering at home). The daily tally in the US has slowed down somewhat, which would lend credibility to the model, which shows the infection curve getting close to a peak. Also, in the previous model, the number of susceptible individuals was probably too high. I will compute an estimated peak date tomorrow based on this model. I will continue to track the old model, but it doesn't fit here. I am thinking of adding another response to the post with a number of models which don't fit here, but I haven't made up my mind about it yet.

April 13-14: Today is the last day the China model will be updated (April 13). April 14: I am adding a SIR model of Italy in the response with the SIR model for China. There is also a SIR model for Finland in the Finland section. It is possible to compute an effective R that is time dependent (but that won't be in the post, although I will make a notebook available in that section at a later time with this). I will add SIR models for other countries as well. I am working on an optimization program for the SIR model to further automate the determination of parameters - if I ever complete this it will also be in the SIR notebook eventually. On another note, I plan to add a section with models for other Scandinavian countries as soon as I have time.

April 12: updating. Today I am adding a response with a section which discusses the simpler SIR-like model (which I managed to make work almost just as well as the SEIR model, although it is somewhat more difficult to get it to work). The SIR-like model has the advantage that analytical solutions are known for SIR models which might be modified for our specific instance of the model, and in the case of our investigations, it yields an adequate value for R0 without the need for any further explanations.

April 11: updating. The daily tally pdf document in the Finland section is now per million inhabitants. Again note the disparity between European countries wishing to pursue an exit strategy at the moment, and South Korea, the role model country. I have added a picture which explores a scenario in which Spain lifts restrictions (as it has announced) today. We are able to model (with some mathematical ingenuity) the effect of this on the S curve, and subsequent effect on the number of infections. We hope this does not happen, but it might.

April 10; updating. I added some explanations in the text and a picture that illustrates what could happen when restrictions are lifted too early and then reintroduced - this is not a forecast, just a plausible scenario. I moved the notebook to a new response at the end of the post.

April 9: updating. In the Finland section there is a pdf document with the smooth version of the daily tallies for several Euro countries, USA, and South Korea. The Italy model seems very stable now.

April 8. Updating. Today I added, in the main part of the text above, an "intuitive" note about the basic reproduction number (R0) in these models and why they are about an order of magnitude larger than the measured rates.

April 6-7. Updating throughout the day. France and UK models temporarily suspended due to missing or inconsistent data, until more data is available

April 5: Updating throughout the day. There is a new model for Austria in the Europe section. It's I curve has peaked, and it will be the first country in Europe without an outbreak, at the end of this month, most likely.

April 4: Updating. The daily tallies for Finland, with the corrected time series, has been published now (3 PM EET) and the graph updated. The model was adjusted a tiny bit. In the model for Italy there is now a possible end of outbreak date, should the data stay on the curve. It is calculated using a threshold, which I do not explain here.

April 3: Updating. As expected, there is no recovery data for Finland so we are extrapolating from previous data.

April 2: Updating. As of April 1 I will use the official data from the Finnish Department of Health and Welfare (THL) for the Finnish model. The model is almost the same as before. There was a lack of recovery data, but yesterdays the recovery data was released as a lump sum the number of recoveries. We have stretched this datum into data according to how cases might have arisen in time to obtain a model. The recovery period appears to be 17 days or so (see more detailed explanation above). We are modifying the Finland section to reflect this new reality and will explore other models. Italy seems to be reaching the peak of its I curve at around the time I estimated it would, maybe a week later.

March 30-31: Updated

March 24-27 AM EET: All models updated; PM EET: Europe models (some are at the bottom of discussion) models updated. I UPDATED and CORRECTED the NOTEBOOK.

March 23 AM EET: China and US pictures updated, as well as the reply to the post at the bottom with the five major European countries other than Italy, and Finland. Italy picture updated in the pm. The hopefully good news is that all major European countries saw a decline in the number of daily new cases yesterday. In the PM, the numbers for Italy have declined for two days in a row. If the trend continues, the I curve will peak in about 11 days. The model has changed slightly.

March 21-22 AM EET: China and US pictures updated. Large Europe countries other than Italy will be updated in the morning EET. They are found in the PDF document. In the PM the Italy model is updated. It looks like the daily new cases might have peaked yesterday. The new picture has a theoretical new daily cases peak marked as a blue dot, after which we can expect 13 days before the actual peak of the I curve. Then comes the tail, with the end of the outbreak 1 to four months (or possibly more) later. I am hoping that indeed new daily cases peaked yesterday but only time can tell. At the end of the discussion, in a reply, there is now a model for Finland.

March 20, PM EET: China, USA updated. I have updated the notebook with the China model. The Italy model has changed substantially and is now much tighter.

March 19, PM EET: China, USA, and Italy models updated. The Italy model parameters have changed again to fit the data up to this point as there is no indication of the growth of the I curve slowing down. When I first posted this, in one of the discussions, we made a forecast that the first day with no new infections (not imported) would be March 23. Apparently this has happened already yesterday, March 19.

March 18, 8:00 PM EET: China model picture updated. I corrected a mistake in the label of the picture (p=81 K). US model added. Italy model updated.

March 17: Updated. In about a week there should be enough data for other countries. The China model has now been stable for quite a while, the Italy model I still change a little bit now and then. It is worth recalling that what makes this model successful is the low number for the effective susceptibility, something that I explained with a thought experiment earlier. Containment efforts essentially remove people from the infection path, lowering the number of susceptible individuals. Please read the discussions.

March 16 - NB: I have updated the China picture and the Italy model has stabilized, and is updated and unchanged. If the containment measure in Italy as working as it is hoped for, the I curve should be peaking in about a week or so. The curve is somewhat more spread out than for China, so that the recovery to current China levels would take about 80 days rather than 40 which it has in China. The overall number of cases should be of about the same size as well, a bit larger. We only hope for the best.

March 15: updated. It is almost a safe bet to say that the situation in Italy will end up being worse than in China. It seems other western countries are headed in the same direction, unfortunately, due to various sorts of circumstances, in each country its own, including the country where I live, Finland. When I have enough data, I will move out the alternative model picture and put up the picture of a new country.

March 14: updated.

March 13: I have tried to match the tail of the I curve and the R curve in the China model, since the data is necessarily more reliable there. I will try to include models for other countries in the notebook as I am able to ... maybe in a week. I cannot post all the pictures here, I think I can post up to 5 pictures, but I will include the models in a notebook and update it periodically, albeit, not daily.

March 12: I have made an important change in the China model. I have removed the correction factor for counting method change to agree with the number of current reported active cases. What this means is that during the climb phase of the I curve, there was some under reporting of cases; and for a few days, there was also over reporting. The blue curve now agrees with the red data in this respect. We will probably never know what exactly happened, but I believe the model tells the story. We also get a very good fit for the recovery curve. The new parameters are in the title label. The timeline now starts on Jan 11. The effective susceptible population, which is what we know now, stands at 81 K. I have updated the equations picture and the situation in Italy as well.

March 9-11: updated

March 8: I managed to access the internet. The China model has new parameters to adjust for what will possibly be a somewhat lower final total number of cases. The Italy model has the same parameters as the original China model, but I expect this to evolve rapidly. The parameters are now indicated in the picture title.

March 7: I am on travel ... I will update this post in a week. Briefly, the model for italy remains to be the blue model but with p=25000. I apologize I cannot update the pictures, I only have good internet access through my mobile device

March 6: I update the picture for China and for Italy. There is also (in the same picture) a probably better alternative model for Italy.

March 5: made a correction in the notebook (rendering) and updated the picture. I have added a picture for the situation in Italy. The model is the same, except for p=15000 and for shorter horizontal shifts ... the illness does not seem as prolonged. How this will turn out to be in the end is anybody's guess right now. I estimated the size for p based on the size of the population of the cordoned area. I have removed an old picture. I will update the notebook with the Italy (and other models) later.

March 4: picture updated, minor corrections. March 2 forecasts are now superseded by a virtual end of the outbreak by March 23 with some possible further cases later on. I have attached a notebook as a file, but I could not figure out how to login to post it - I am new here.

March 3: I have managed to remove all time dependencies from the parameters and get the same good fit. I am updating the picture and publishing the equations (see the picture at the end of the post with the equations and parameter values) of what is a somewhat simplified SEIR model which I have used. From it one can calculate R0 in the usual way, and it is much higher than what I reported before. I do not what to make of it. The effective susceptible population is now 82000. I developed this yesterday and it was good to see that no modifications were needed to fit today's data. Unless there are further changes that need explaining, I will refrain from posting for a few days or a couple of weeks, except for updating the picture. Recall that what makes this possible is noting that the effect of containment and lockdown is to effectively remove a large sector of the population from susceptibility ... see my the first post and my earlier replies to questions asked in the discussion. When I have time, I will clean my notebook and post it will all the data (it is not that much).

Today I am also adding a figure that illustrates the effect of lowering sigma to .03 (sigma is the rate at which the exposed becomes infective). It flattens the curve with a delay ... I don't know what measures can bring this down, but that is what is needed.

March 2: I am adding a second updated graph for comparison with the original, and will update that graph on a weekly or biweekly basis. I will now leave the model untouched for a week. In the latest graph, the initial population is 88000, the removal rate is now .3, and the delay is to onset of infection is 10% longer and slightly different. The calculated R0 is now 8. The model now predicts close to 85000 cases by the end of March, and close to 87500 towards the end of April. I have added the E curve in Cyan.

Feb 29: It is the end of February and there is one minor final correction that makes the fit still better (after this I will let this rest for a while). The green dots which are to be matched against the R (resistant) curve is now the sum of recoveries and fatalities. There is a new picture. The susceptible population is now 90000 and all the rest is the same.

If the model has any predictive power, I would expect to see about 84700 registered cases by the end of March, and about 88700 by the end of April. I will return to this towards the end of March, or earlier if this needs to be modified further.

Also, running a similar model without data at the moment for my community, the metropolitan area of Helsinki, if we assume only city and country isolation, we can expect an outbreak that lasts about 9 months if the virus is a year round virus (that is, not a "winter" virus). Likewise, an unchecked outbreak for the whole world would last about 15 months as of now if the virus is year round. About half of the population would be infected at one point.

Feb 28: I am updating this again because I have managed to get a much improved model by accounting for a delay in the onset of recovery as well as a delay in the onset of infection; moreover, I use my parameters uniformly throughout the equations, (some weeks down the road, when the data is "complete" I will publish the final equations of the model; I am hoping that the model as it stands now will stand the test of time). The infection rate is now 2.9, the susceptible population is 95000, the rate at which an exposed person becomes infected is now .057, and the removal rate is .25. Calculating R0 in the usual way, we get around 9, which is believable. I leave yesterday's text below, but I am changing the picture.

Feb 27: I am updating and correcting this post: a simple SEIR model with some modifications (like a delay factor for the onset of recovery) fits the available data of the coronavirus infection in China. The infection rate is 2.7, the removal rate is .1, the rate at which an exposed person becomes infective is .055, and the total susceptible po






52 Replies

Very nice!

Please publish your notebook. I'm sure others would like to see the ODEs.


Thanks for adding the additional detail about the model, it helps one understand the results better.

It looks like you are using the data from Hubei, based on the large number of confirmed cases. However, when I downloaded the JHU data from github ( last evening, I got similar but noticeably different values. What is your data source, and what preprocessing did you do? What was the adjustment that you made?

What assumptions did you make when choosing a susceptible population size of 180,000? The population of Hubei is very much larger:

In[22]:= WolframAlpha["population Hubei China", "Result"]

Out[22]= Quantity[58160000, "People"]

It will have a dramatic effect on the dynamics, especially if one uses mass action incidence for the force of infection as opposed to standard incidence. Which did you use? Models for the SARS epidemic used mass action incidence.

I have found these data quite challenging to model. My current model has 7 compartments and includes quarantine. It's still very preliminary, and I hope to have it ready next week, but here is a peek at its structure (as a Petri net);

enter image description here


Let me try to explain how I tried to model these data and try to address some of your questions, not necessarily in the order they appear. First, I decided to work with the simplest model possible, to keep the number of parameters as low as possible. Otherwise, estimating the parameters gets quite difficult. It seemed to me that the basic SEIR model should work, if only we modified it to reflect the situation at hand. Then I made some thought experiments to try to make these modifications and parameter estimations. Let me explain a couple of these with an example.

The first thing I tried to understand is the effect of the containment measures enacted on the parameters of the basic SEIR model. I illustrate what happens with an simple example. Suppose you have ten families with four members each and that you all of a sudden detect three infected individuals, each in a different family. Now suppose you then immediately put each family in a different house, and you prevent them from having any contact with each other. The effect of this is essentially to remove 28 individuals from the susceptible group, which would otherwise be 40. Only 12 individuals, those families with an infected member, are now susceptible. Understanding this told me that I might have a chance of modeling the data if I lowered the susceptible population to an adequate number. I then used further information about the other possible value of the parameters, including information about maximum observed incubation periods. And then it was a matter of some fine tuning to get the right fit for the I curve. I observed that even if the actual data was only a sample of the real number of infections, the shape of the curve should be the same with the real numbers. You just multiply by a constant factor. So say that, as suggested, there might have been ten times (or twenty) times the number of cases as reported, then the susceptible population would have been 1800000 or 3600000 and likewise, you would have ten times more infections, recoveries, etc. The key is getting the shape of the curve right.

In the model, I tried to get the I curve right, and then, modify the equation for R with a root factor to account for a delay of the onset of recovery that fits the recovery data (it takes a while for people to recover).

Now about the data. I started early on, and I only had data that I could copy by hand ... and then I just kept doing it that way, among other reasons, because it forces me to see what is going on, rather than just letting the computer read it. From the 13th of February on, I subtract a constant of about 14000 to the JHU data, because of the change of counting method that lasted only a few days and the leap in the numbers. This is the simplest quick and dirty fix that keeps the data in line with the trend and nullifies the leap in the count which occurred on that particular day.

I am quite confident that the model is basically working well. As the days go by, I will continue to check it and adjust it.

If you think about the most difficult issue regarding action to contain the spread, given that you can't test extensively, is to decide when you have a large enough number of detected cases to warrant some containment action. It is a difficult decision to make, and the better data and more extensive testing you have, the better prepared you are to make a wise decision. It would be very useful to have a very good estimate of what the real number of cases is given a number of detected cases that you might have as a sample. We don't really know. Some literature has suggested that it might be 20 times the number of detected cases.

It might be that in the new foci of infection, the model might have to be modified to fit the circumstances of the situation. I am following several foci. I did model the cruise ship situation, and the with almost the same parameters I managed to obtain numbers that were quite very close to the actual numbers while that outbreak lasted. When I have time I will revisit that model and see if the improvements in the model here can give even better results. I need to consider all days, I don't have all the days.

I hope this answers some of your questions. I hope I get around to cleaning the notebook and providing all the details ... but in a nutshell, it is the simplest possible SEIR model you can think of, almost.

Your points about the susceptible population size are well taken. Unlike modeling chemical reactions in a beaker (the math is the same), one cannot assume instantaneous and uniform mixing in epidemiological models. The population size does have an effect depending on the type in incidence used for the force of infection, so one cannot always simply scale the model for larger or smaller populations.

I found an article in The Guardian about the change in counting methods in Hubei. Real data are invariable messy. Understanding exactly what the data are is crucial to modeling them accurately. For example, does "recovered" mean no longer infectious in the epidemiological sense, or does it mean asymptomatic in the medical sense? And, of course, there is always the question of unreported cases to deal with in these retrospective analyses.

I'm quite certain that one will have to model each focus or cluster of infections individually. The derived parameters should be comparable, but are most likely not transferable.

You are of course right about all of what you say. My goal was to try to find a phenomenological reason with implications about the values of the parameters that would make it possible for me to model the data, and I succeeded in that sense. I agree with you in that one has to be cautious about the scaling. But for the data that we have, we have some kind of model. I will update this data daily and see if it continues to fit the model. I expect it will and what the model predicts is in line with the expectations of some epidemiologists.

Right, real data is messy ... you have to do the best you can with it. I think that what I did somewhat works.

About recovery, it should mean "not infectious" given the tests that the patients need to go through to be discharged. However, there are cases documented when discharged individuals test positive again later on, leading to the suspicion that the disease might be biphasic in some instances.

And correct, each focus will probably need its own, if similar, parameters. I am tracking several of the new foci, and we will see what happens with time. Unfortunately, it is harder to get data broken up into clusters of interest. Surely it can be obtained, but it is not easy to get it.

I did not understand your comments on population scaling. If one scales S,I,E,R by p, keeping \beta,\sigma,\gamma fixed, p-dependence drops out of the equations(as it should). So, solving for population scaled equations is mathematically equivalent to solving keeping p as a parameter. thanks, hari dass

Look at the equations AND the initial conditions (for S). What you see (relative to the y axis) is what you get ... (I should include a plot with S) ... hope this helps; if not, we can come back to it at some point.

Posted 3 months ago

Enrique, can you confirm the following:

1) You based your model on Mainland China Infections numbers from Johns Hopkins CSSE (JHU)

2) Your graph is then a model run on the Helsinki population with 90 000 susceptible being the starting number for the population

3) Blue is the number of confirmed infections (?) still ongoing (?) (i.e. not total, not cumulative all, not daily)?

4) Magenta is the (percentage?) of recovered (axis scale missing?)

Did I get this right?

Thanks for the model!



No, there is a bit of misuderstanding I see. I answer your questions the best I can one by one:

1) That is correct ... here is the webpage with their data and graphics:\bda7594740fd40299423467b48e9ecf6

2) No, my model is a model for a susceptible population of 90000 (it can be from anywhere). It fits the JHU data. If you read the rest of my post, you will see that I explain the main effect of the containment measures is to reduce the effective susceptible population. So although a city population might be 11M, with the containment measures, that number dropped to an effective susceptible population of about 90000. It is a model of the JHU data. There is no data for Helsinki yet, as we don't have any spread here at the moment. I hope this is clear enough for now.

3) Blue is the number of infections at a moment of time. It is time dependent. It is what is called the I curve. It is NOT the total number of confirmed infections; you need to think of it (I won't go into the math details) as that number minus the recoveries and fatalities in your data, to get something similar to the red dots.

4) Magenta is the NUMBER (NOT percentage) of cases that are resistant, essentially those that are not infected any more plus the fatalities, among the 90000 susceptible ones. Horizontal axis scale is days, that is standard.

You have to keep in mind that this model works provided that things stay in check and there are no new paths of infection ... otherwise, you still have a huge uninfected population out there which could become susceptible. The difficulty now is to keep this under control.

So no, I apologize but you did not get it right, but hopefully this helps. I hope this is useful. Especially because it has been hard to get a good fit for the data. Thanks for your questions.

Posted 3 months ago

Thank you, it's much more clear now.

As a teacher of graphing and visualization, might I make a proposal that you properly label chart axes with units and also label charted values (which colour is which variable).

This makes it much easier to instantly understand the graph.

Thanks for your efforts and sharing them.


Thank you, your points are well taken and I will do so in the next round ... there are already some updates and some changes and some things that I still need to clarify (or else I am creating confusion) ... but I think I will wait a few days and see how close future data is to the model without having modified it further ... I am hoping in the end to have an accurate description of what is observed by the data and I hope to publish the equations that give the model. Thank you once again.

Posted 3 months ago

If I am understanding your graph correctly it looks as if it is predicting the end of the epidemic at about the same time as a simpler logistic model based only on cumulative cases. In the graphs below day number 1 is the first data day on the JHU database, 1/22/2020. Also the curve looks like it is becoming more symmetric like the logistic growth model.

China Case Data

Excellent to get this kind of confirmation ... later tonight, EET, I will post an update with the equations and parameters of the model, which changed a bit last night, for the last time I hope. I can now really calculate R0 classically, and it is different, unfortunately higher ... Thanks for sharing this.

Well, it seems our forecasts were quite on the dot ... there were no new infections yesterday, March 19, aside from imported cases. I am now trying to extend the models to other countries and get a feeling for how long this will last. It seems to me that the dynamics of this disease are pretty much the same everywhere, except for the volume (effective susceptibility). It seems that, from the moment that the number of cases starts climbing to there being almost no cases is at most about four months. Let's hope this is the case. The important thing is to then guard against other waves. I hope the whole world keeps its guard up.

Dear Robert,

Would you happen to have a brief notebook which implements your logistic model based on the number of cumulative cases? I would be very grateful to you ... I need this because I have a case where I only have the cumulative cases data. I thank you in advance for your help.

Best regards, Enrique

Posted 1 month ago


Attached are two notebooks which I am currently using. The fitting methods are embedded in the report procedures. When fitting to cumulative cases, the large numbers in the latest data dominate the fit, but they are probably also more accurate. At the end of the CDCData.nb there is a method for fitting the first differences of the cumulative data to a first difference formula derived from the logistic model, if you also want to see more directly the influence of the early data. We are about to enter the tail phase of the epidemic in the U.S. The model requires an exponential decay in the tail. That didn't happen with South Korea, but the China data did fit. If the U.S. data doesn't follow the tail behavior, that would indicate the model isn't valid.




Thank you very much!

Best, Enrique

I would like to add that in China the situation was strictly controlled until the end, so what had to stay constant (susceptibility) did ... I am afraid that elsewhere this will not happen. I already have a model which allows for the growth of susceptibility. It is beyond the scope to discuss it here, but I plan to make a notebook available in the aftermath, when we have the whole picture. So when restrictions are lifted, susceptibility, the parameter that controls the size of the epidemic, grows again, maybe ever so slowly, all depending ... all the measures put together work to determine that one quantity, which is the overall most important quantity.


Dear @Enrique Garcia Moreno E. thank you for sharing! Could you please share the notebook with Wolfram Language code? You can attach it or embedded into the post.

Right, I apologize I haven't done it, it has taken me a while to get this right. I will try to get it done by tomorrow, or else, I will be on travel and it would have to wait a week. The trouble is I am using approximate data or data straight out of websites, rather than the github data, which I don't find I can computer read. But an approximation to the nearest hundred works fine (as in the JSU CSSE website itself). I will explain in the notebook with more detail other aspects pertaining to the data, such as a correction for the counting methods change. I am now modeling the Italian outbreak but it is to early to settle on any parameters ... but I think, and I want to believe, that it will be controlled ... apologies again and thank you for your patience.

Attached, I tried embedding it and even though I am signed into the cloud, I can't do it.

Posted 2 months ago

I made a notebook deriving the logistic model from your SEIR equations with a simple assumption.

Dear @Enrique, is the notebook you attached to your head post the latest one? I see many images in your post that are not in that notebook. It would be great to see your most complete recent notebook for all things that you demonstrated. Thank you for sharing, this is very nice!


The published notebook is not my working notebook. I am on travel for a week. I hope to get to updating the published notebook then, apologies, and thank you.


I have updated my notebook (March 20) with the China model and data. I will add the other models when they stabilize.

I have updated the notebook again, there was a mistake in the notebook I posted before.

May 25: Updated. In this section soon there will be two new notebooks, pertaining to the automatic determination of parameters, and to making susceptibility grow.

May 20-21: The pdf with smooth daily tallies now includes tallies for Russia and Brazil, in addition to all the previous 20 countries. This file will be updated on Mondays only

May 12: The pdf with smooth daily tallies has moved here. Soon I will post a notebook with the program that does the model parametrization automatically given data to fit.

Dear Vitaliy,

I realize it is about two months now and I have not updated my notebook. As soon as I get the time (that is, I go on holiday, in about a couple of months), I will at least publish another notebook which has two things. A different kind of model that I have been discussing here (I call them "TRUE" models, and it is what epidemiologists usually use) which is very good for forecasting once there is enough data, and also, a set of programs to find optimal fits of the models to data automatically. This notebook will contain also the utility program from which you extract the forecasts, to get the pictures I have at the bottom of the post and which I added just a few days ago. I will not publish a notebook with fits to all data sets and gives you all the pictures ... but at least one, so that people can then do their own with the data they might be interested. I will also update the old notebook to contain both SEIR and SIR models of the kind I have been discussing since the beginning, with ONE data set. These, unfortunately, are harder to fit, and I don't have yet a fully, or almost fully, automated way for fitting the parameters; I have a set of "rules" which I apply to get the fits, and they work most of the time, but not always, and they are not fully automatic in the least bit. Apologies for the delays and the limitations. I am up to my ears with work. During lockdown, I have been busier than ever!

Regards, Enrique

The top model for Finland, the community where I live, is added in this response, is now a "TRUE" model of the outbreak (in a "TRUE" model, the detected cases are the removed ones, - R curve - not the infectious ones - the I curve - ... they are no longer in the infectious chain - see the main section of this post). The rest of the models here for Finland are SIR models for the data in the spirit of the original post and most of it. As a rule of thumb, the blue I curve peaks about 13 days after daily new cases peak. The end of outbreak point is from one to four months after the peak of the I curve, depending on the parameters, which depend on the containment measures and the assumption that they stay in place during the whole outbreak.

The data for the Finnish model comes from the Finnish Department of Health and Welfare (THL in Finnish). There is a delay in the release of the data of 1-2 days. however, the recovery data comes from Johns Hopkins University or from occasional Finnish health reports. According to the medical chief of staff of the infections diseases clinic at the Helsinki and Uusimaa hospital district, it was "important to define what people mean when they talk about recovery", and that "eventually it would be important to compile statistics to better understand the disease" and "was taking the numbers with a grain of salt" noting that "the criteria underlying the data are not always clear and they are not always the same in each country". He also said that "tracking recovered patients was not a top priority". (quotes source is Yle news, the state run news agency). We have serialized the occasional recovery data according to how cases might have arisen in time to obtain a model. We then compute an estimate recovery function.The recovery period appears to be 17 days or so. We will check with future disclosures the accuracy of this function.

The first model posted is a "TRUE" SIR model of the outbreak (see opening paragraph of the post for an explanation, along with further comments in the top section). The second model has been removed; it was an SIR model (see the SIR model section for the equations) with a much more optimistic forecast than the SEIR model. The now second model posted is an SIR model with a recovery scheduled taken from an estimate function based on a few recovery data points for Finland - we believe this is the best of the standard models we have been producing that we have, and it has not been adjusted for a while. The third model is a model for Norway using the same recovery schedule as in the third model for Finland. The fourth model is a "TRUE" model of the outbreak in Sweden. I have removed the model which explores the possibility that Finland has reached a plateau, a very negative outlook (in this model, the number of people that can be exposed each day remains high and constant from a certain time on). Since there are only guidelines in place and no policy changes are expected, this model should be very stable, and I believe it is possible to make long term forecasts from it ... we will use it as a benchmark.

The models for other European countries are found in another response under this one. In this section, where there is space, I include a pdf document with the daily tallies per million inhabitants (smoothened, 4 day moving average) for several countries in Europe, USA, and South Korea. According to this parameter, almost no country in Europe is ready to move forward, contrary to what many authorities are starting to say. Countries like Denmark, Norway, Austria, and the Czech Republic have daily tallies p.m. that are nowhere nearly as low as those of South Korea, the role model country for the exit strategy. This will be updated daily.

Also, in the section of alternative SIR models, which has a model for China and a model for Itally (not updated) there is now a SIR model for Finland based on the JHU data ... it will be updated regularly; as well as another SIR model using a different estimate for the recovery schedule.

May 25-26: Updating. Norway reached its technical end of outbreak (by one measure) on May 24.

May 23-24: A semiautomatic, almost optimal fit for the Finland model has been obtained. A better fit will be provided eventually (they take time to compute). We will fit a new "TRUE" model for Finland in a week as it seems necessary. On May 23 Norway published an estimate of recoveries and our estimated recovery function is within .5%. So this means that our model for Norway is very close to what is happening, as it shows in the picture. There are new recovery estimates for Finland which required an adjustment of the Finnish model. The model has been fitted now semiautomatically, not quite optimally though. With an increasing number of published recovery estimates our estimate function becomes more accurate. According to the Finnish health authority, a case is considered recovered if 21 days have gone by and the individual is not in the hospital or under treatment. The Swedish and Danish models are now updated (May 24).

May 19-22: updating. In the "TRUE" models for Finland and Denmark a deviation from the forecast trend can be observed. It would seem that in these two countries, the outbreak has been brought under control. However, in Sweden, where there is no active effort to slow the progress of the disease and there are no expected policy changes, the model is right on track with the data. Our model for Norway also indicates that the outbreak is under control and Norway is on target to reaching the end of outbreak threshold (about ten daily new cases on average).

May 18: there is no data for Finland May 17 yet, the time series is updated today though

May 12-17: Updating. I am adding a "TRUE" model for Denmark (semioptimized - the quality of the initial segment of the data is not good) and moving the daily tally pdf document to the section where the original notebook is posted.

May 10-12: Updating. Besides these optimally fitted "TRUE" models we have obtained from them a forecast of daily new cases. There is a section at the end of the post with those pictures and that information. Since these models seem to be very accurate, one can obtain these long term forecast, and fixing them at a date before restrictions are lifted, it allows one to see the daily new cases numbers deviate (or hopefully not) from the forecast trend. The daily cases pdf document now has 14 day moving averages, a much better rendition.

May 8-9: updating. The May 9 "TRUE" model for Finland is now optimally fitted almost fully automatically. It is a much better fit than previously and it pushes the forecast in the picture further out by a month (and that is assuming there the situation in the country stays more or less the same with regards to social distancing). The "TRUE" models will start to be produced in a fully automated way (I will later post a notebook with the code that does the optimization; it is written withing the simplicity of built in Mathematica functionality, which means it is somewhat slow and NMinimize needs help). The Swedish model however will be alternate with the automatically fitted one until 15 June since it is being used as a benchmark for this kind of model. Once in a while we will post the model generated on May 4 to compare with the automatically fitted one.

May 6-7: updating. The Swedish model should be very stable due to the conditions under which the outbreak is evolving in Sweden (guidelines instead of formal lockdown and no policy changes expected). We can attempt to make long term forecasts from this model. It shows, among other things, how prevalent the infection will be during this outbreak. We will use it as a benchmark. These "TRUE" models seem remarkably stable and reliable once there is enough data to fit.

May 5: We now have different models, including "TRUE" models of the outbreak for Finland and Sweden. See comments above, and comments in the opening paragraph of the post and in that secition.

April 30-May 4: updating

April 28-29: updating. We have switched the third model with our alternative recovery estimate model, which squares with the recovery estimates published by THL yesterday. This model has been right all along, so we plant to keep it now. The picture will be updated at around 2 PM EEST.

April 27: in today's update, the pdf is now the plain daily tally, and the moving average is now over 7 days ... what you get out of it is more illustrative of what is going on.

April 22-26: of all the models posted, it is starting to feel like the third one is the more accurate one, it hasn't been adjusted now in several days. I will have to wait until I have new recovery data to decide if it really is. I have removed the steady state illustration and included a model for Norway using the same recovery schedule as Finland.

April 19-21: updated. A new SIR model based on a different recovery schedule taylored for Finland is included in the SIR models section.

April 18: updated. A new SIR model based on the JHU data is included in the section of SIR models

April 17: updated. A new estimate of recoveries was published today and we have adjusted our models to reflect this estimate, in particular, the first two.

April 15-16: updated. A fourth model that explores reaching and staying in a plateau is shown. Today a recovery figure was published that certifies that our recovery estimates are quite accurate in all the models

April 14: Updates are up. We have replaced the previous second model, which was very much like the previous first model, by a SIR model, with a much more optimistic forecast, both in terms of the total susceptibility as well as in the duration of the outbreak. The models are now listed (in some order, not necessarily the following one, identify the model by the picture title): 1) a SEIR model with recovery schedule as explained in text. 2) a SIR model, 3) a SIR model with average recovery schedule as explained above . A fourth model will be provided soon. We have removed the daily number of cases cumulative mean to highlight just the models.

April 13: Finland seems to have peaked. I will only update the main model for a few days until I have enough data to determine how fast and steep will be the recovery side of the curve.

April 11-12: Updated.

April 10: The model for the Finnish data is still a moving target ... it should have stabilized by now. Perhaps the restrictions in place are not tight enough. I have replaced the model with a German recovery Schedule for something not so steep (theoretical model 2). I have updated the daily tallies and changed them to daily tallies per million inhabitants. Notice that those countries trying to pursue an exit strategy like South Korea have numbers tallies that are significantly higher at the moment, or not even in a downward trend

April 9: updated. A pdf document in this section now contains the smooth daily tallies for many countries in Europe, USA, and South Korea.

April 8: as the data for days past arrives and the time series becomes adjusted, the curves of the models are becoming a lot flatter and spread out in time. For the most part, our forecast for peaks are now in line with THL's forecasts. The models have changed quite a bit.

April 5-7: Updated

April 4: PM The adjusted daily tallies have been published now and the pictures updated. AM Updating. The daily tallies for Finland have not been yet released in the morning, only a lump sum indicating the total increase in the last two days (part of the increase is retroactive to past days). This model will be updated in the evening. It has required adjustment given the large number of new cases reported.

April 3: Updated. The recovery datum for the day in the actual model is extrapolated as there is no recovery data available for Finland. The alternative model uses a theoretical recovery based on a best fit of to prior serialized recovery data (plus fatalities) posted by JHU.

April 2: This section has changed again today, April 1. There are now two models. First, the model that is posted at the top. Yesterday we reported that there was no recovery data, but it was posted today as a lump sum at the JHU site. Please read the main text of this response for more information. Second, a model that relies only partially on data. However, we feel this is more representative of the real situation at the moment. Please read the main text in this response.

April 1: As of today, the top model will be based on the official data of the Finnish Department of Health and Welfare (THL).

March 30: top model removed, new top model is previous second model, updated. There is a new model with a forecast for April 9 assuming that the daily number of new cases is about the same for the next ten days. The third and fourth models we had before have been removed.

March 29: All four models updated. March 25-27 PM EET: Finland models






Here are the models for Spain, Germany (SEIR or SIR), France, and now Austria. The recovery data from the UK does not seem to be forthcoming. This makes it difficult to produce an SEIR model. Thus, I am replacing the UK model with a model in which there is an estimated resistant curve based on available data from European countries of comparable size and CFR and a time delay. I will explain this if there is interest. The UK model is now regularly updated according to this recovery schedule. Similarly, there is, in the JHU data, a huge leap in the French number of cases on 4.4, and although I was gather other figures for 4.5, I was unable to do so for 4.6. Official data can be found in the French government's site. However I have reverted to using the JHU data as of April 14. As of April 15 I am alternating the German SEIR model with a SIR model.

The Finland section now has a document with the smooth (3 day moving average) of the daily tallies of these countries as well as other Scandinavian countries, the Czech Republic, Poland, USA, and South Korea.

The model for Austria now uses further modification of the SIR model equations. It allows to model growth in susceptibility at any moment of time after the initial conditions. Here, we apply the growth term (a term in the equation for s') at approximately the time when some restrictions were lifted on April 13. You can see the s curve grow from there on a bit, making the downward slope of i increasingly gentler ... the initial susceptibility is roughly the number of cases we have now.

May 20-26: Lately there are delays in the reporting of data for this section (as well as inconsistencies, such as fewer cumulative fatalities from one day to the next). This section will be updated with a delay of one day, except for Austria. The UK JHU time series is now inconsistent. It appears that Spain has stopped reporting recoveries. I will leave this last picture unmodified until I have time to compute an estimate based on the historical rate or replace it with a "TRUE" model.

May 8-19: Updating. Probably soon I will stop updating this section on a daily basis, to update perhaps on a biweekly basis. Another section with "TRUE" models for the countries in this section will be started, hopefully soon. Not only are the "TRUE" models standard practice in epidemiology, I can also parametrize them optimally almost automatically. They give very good forecasts once there is enough data as long as there are no changing circumstances (policy changes). The UK model has been replaced by a SIR model, and the Spain model has been adjusted. A notebook with code to produce the "TRUE" models almost automatically will be provided.

May 6-7: updating.

April 29-May 5: updating.

April 28: updating. I found a source with daily increments for the Spanish data. Let's hope they are consistent. The French data appears still inconsistent. I will try to recreate a consistent series from the French government website when time permits.

April 27: updating. I will not update Spain after today until I get hold of the data from local authorities if I can. The JHU data is inconsistent both in number of cases and in recoveries. It seems the historical series is being updated retrospectively, but according to the Spanish authorities, it is not yet ready. The temporary lump sums provided temporarily make for very poor data. When it becomes ready, I will continue to update this model. If I can obtain reliable information from press reports, I will update my data thus by hand.

April 26: updating. Today, April 26, the recovery data from Spain is highly anomalous, for the second time (in the past, counting method changed). Unless this datum is corrected, from now on I will use an estimate based on a recovery rate function that can be computed from the data up to yesterday. of adjust subsequent data with a constant after today's estimate. Using this function, we obtain today's picture. The French JHU data is still anomalous, so I will continue to use my hand curated series.

April 25: updating. I have corrected the end of outbreak for the Austria model to late summer at the earliest, if nothing changes too much ... as further restrictions are lifted, this is likely to change again. The JHU data series for France has apparently been corrected. I will try making a model with that data tomorrow. In the meantime, what I have been using is working fine.

April 24: updating. IT seems the model for Spain might need a new steeper rise ... The JHU data for France is inconsistent, and the government data is partial or I cannot curate it all ... I am curating data by hand now based on various reports, but I might have to give up on this model unless there is a consistent data source which I can easily get a hold of.

April 23: The date is now correct in the Spain model. It appears, unfortunately, that my forecast regarding the effect of going back to work to soon was correct weeks ago.

April 22: The numbers in Spain seem to be on the rise, some ten days after part of the country went back to work. In a little while we will update our forecast for Spain in the main section, which allows for growth in the number of susceptible individuals. We include the same model here ... it is based on a modified set of equations which I might post later. We hope that this will not bear out, but for now, it seems as if it might. The model is a modified SEIR model which I will post in the aftermath of all this. I also replaced my German data with the JHU curated series, resulting in a new model

April 21: Updating. The model for Austria now uses a more elaborate model based on the SIR model which allows s(t) to grow. I might post a notebook in the aftermath, without a discussion. Notice that susceptibility grows after April 13, when restrictions were ifted.

April 19: I have put the German SEIR model again.

April 18: updating, the Spanish recovery data not forthcoming ... as I said yesterday, this model will need to be revised as the data is made available. For the moment, I am using extrapolated recovery data.

April 17: updating. Spain changed its counting methods today. As was in the case of China, I will temporarily correct the data, until there is enough data to see what the trend is and revert back to the data that will be provided from now on. Surely, it will necessitate a revision of the model. Apologies for this. Also, the entire JHU data series (number of cases) has been altered. The end result of this is that I do not have good data to provide a model. I have posted what I was able to come up with based on previous estimates. It will be necessary to have at least two more weeks of data before anything reasonable can be done about this, OR, the data series needs to be corrected as it probably still needs to be.

April 15: updating I am replacing the German model with an SIR model for which I get a much better fit (it is easier to fit these models). The model gives an estimate of R0. For the equations of the SIR model used see the response towards the bottom of the post.

April 11-14: updating. (French data ok after checking, yesterday I used the government website, I use their number of cases reported there, and their TOTAL number of deceased: Today, April 14, I have reverted to use the JHU data.

April 10: I updated the model for Germany with today's data ... it needs further work, it is quite challenging to get a good fit even with my tools ... updating throughout the day.

April 9: updating. There is now, in the Finland section, a pdf document with the smoothened daily tallies for these and other countries. According to this parameter, hardly any European country is ready to move forward and lift restrictions, contrary to what some authorities are suggesting. The German model will be updated tomorrow, when all the cases have been reported for today (that usually happens late). The data I have right now is proving to be challenging to model, and it might be that we need for further data to obtain a good model ... but more on this tomorrow.

April 8: updating.

April 7. Yesterday''s Germany model is now up. Spain's model for the day is up. I am now using the French government's published data for the French model. I replaced the UK model with a model which has an estimated resistant data, please read above. The Austrian model updated.

April 6. Something has changed in the reported figures for France. My model is based on the reported daily increment of cases, not the reported total number of cases from now on, if this datum is available. The models for France and Germany will be updated tomorrow, data forthcoming. I have removed the UK model for lack of recovery data, see explanation above.

April 5. Updating. I do not have data for France yet, maybe they won't be updated until tomorrow. There is a new model for Austria, which has an I curve that has peaked. The outbreak there will probably end at the end of the month.

April 4. All models updated

April 1-3: Updated

March 31. It is possible that the I curve is starting to peak in Germany. If the trend continues, we will revise the model accordingly. On the other hand, there was considerable growth in France.






Enrique, I am writing from Italy. I am a physicis. Today I was trying to see how to adapt SEIR to the OBSERVED value and I found your work ! Impressive. Congrats. You know how bat the situation is in Italy. Your work would be extremely useful. I would like to suggest to additionally test your work on a couple or regions in Italy: Lombardia, Emilia Romagna, Veneto and Piemonte. In this way you could spot if there are or not differences in the fitting parameters. Would it be possible at all ? Please reply me asap: time is of essence. With my best regards Roberto Battiston

Dear Roberto,

I would be glad to try to come up with a model for the Northern regions of Italy, if I can find the data. Alternatively you can tell me where to find it, that is probably a better solution, or you can send them to me (I can tell you how, or you can post them as Mathematica lists of numbers in your reply). I can curate it by hand if necessary. Please bear in mind that it is difficult to really know how the parameters are going to end up in the end until the the I curve has peaked. Although I have learned to make educated guesses for this disease now. So I think I might be able to get something that is close to the reality. Glad to help if I can. Personally, I don't think the outbreak in Italy is going to get much worse than is implied already in the posted model, in terms of total number of cases (susceptibility). Best, Enrique

Dear Enrique,

the data are here, for the regions update every day at 18:00

The regions are all there : the most hit, as you know are Lombardia, Emilia, Veneto, Piemonte, but the disease is spreading allover the country.

I would have many questions to ask you.

1: what the parameters (beta, sigma, gamma, p, Es, Is) ? 2: apparently is no lockdown effect/date in your model ? 3: I am puzzled by the Wuhan case: how did you optimized the parameters ? the early data are clearly filled with artefacts. Did you fit/optimized using the later date ? The model is robust enough to find its way through the overall behaviour, unsensitive to "strange" data ? 4: would it be possible to split fatal from recovered ? 5: could we have some discussion through private email or phone ? if so my email is please reply there.

thanks again for your very interesting work , it is exactly what we need.


Hello again,

Thanks, give me a day to work on this … I am about ready to quit for the day, except for minor things.

To answer your questions:

1) If you go to the top of the post, I posted the equations and explain the parameters. Beta is the rate of infections, as it is known. Sigma is the rate at which the exposed become infective. Gamma is the removal rate. p is the total number of susceptible individuals (this is equal to the total number of cases there will be at the end). The shifts are used to align the curves with precision with the data and to account for the fact that in different countries, the "release policy" of sick patients varies … I won't go into details, maybe some other time; mathematically, it works.

2) The main effect of the lockdown is reflected in the size of p, as I explained in the discussion with Robert Nachbar above. If you read that whole discussion, at some point I explain why this is the main effect of the lockdown; it is the secret to making this work, and it also makes sense from the epidemiological point of view, not just mathematically (since you are a physicist you might appreciate the thought experiment discussed above)

3) Initially I had a much more complicated model which I used to estimate the parameters for the basic SEIR model more or less automatically, along with the assumption that p had to be low with a lockdown. Then I discarded that and learned to adjust the parameters as needed for the initial data set (China) and then for other data sets. As I said, I made a thought experiment that pointed the way and taught me that the key parameter to make this work is p. If you read the discussion with Robert Nachbar I explain this there. The main effect of the lockdown is to lower p.

I usually fit first when there is already enough data, and modify for a while until the model is stable; then it just works well from a time on and I don't have to adjust the parameters so much, or even at all. I have worked really hard to learn to fit the parameters. If I had time, I would use machine learning to do it, but right now, I have had to learn to do it myself. You could also do it computationally, but the results are not always the best, and that also takes time and work to get it right. I have learned to modify the model for each data set fairly effectively, each data set has its own strangeness, but the data sets are not so different that some basic values for the parameters could not account for it for the most part if you know how to adjust them properly … that is where ML would be useful, and what I have learned to do without ML.

4) In this model you cannot split recovered from fatal.

5) Yes, I will be glad to … give me a day and I will write to you. I am working 24/7 on this and many other related things. So I will get in touch with you tomorrow.

Best, Enrique

Posted 2 months ago

Hi, can you post the code you use about how the fitting is done to obtain beta,sigma,gamma,Ishift,Eshift, for example for some random data of {day , infected, resistant} ?



This is not possible at the moment, it is incomplete work. At the end of the day, I do quite a bit of fine tuning based on my by now fairly good understanding of the behavior of the parameters. You also need to make judgement calls regarding the quality of the data, and on what part of the data stream you want a better fit. For example, in the China model I fit to the tail first (once it was there), and then adjust to fit the front part of the data stream. This is, at the moment, an interactive process. I hope to have useful tools to do this later on more or less automatically. But that is quite a bit of development work.


I will post this soon for the SIR models.

Posted 1 month ago

Doctor Moreno, I've been following this discussion, which I found through Robert Nassar's, for nearly a month and I've been consistently impressed by not only your research and models (which are the best fits to the data that I've seen ANYWHERE so far) but also by the explanation and discussion of the work. I actually made a Wolfram account primarily so that I could post this message to let you know how much this kind of work (and education about the underpinnings of the work) is appreciated. I look forward to further exploration of Wolfram's community in hopes of finding similar quality discussions of subjects more closely related to my field, as well as to seeing the full body of this work in publication. Many thanks!
As I understand it, you're plotting a standard SEIR curve and doing a best fit to the reported data (as reported by various health organizations)- therefore looking ahead and using this model predictively the future curve is an indication not necessarily of how many are actually infected but of how many will be reported as such on each coming day. To me this is very interesting because it sidesteps many of the underlying issues of tainted data due to under testing, selective testing, flaws in tests, etc. and just rolls them up into the variables of the model. Picking those apart after the fact is very interesting, but in the moment, given the data set(s) available, this seems like perhaps the most useful utilitarian presentation of a model.
It would be very interesting to me to see as a 3D plot, or even as an animation (use time to show another variable?) how your model has been adjusted as time and the epidemic has progressed. I've done that crudely for myself just by flipping through the images I've saved for reference of the plots you've presented and updated daily watching the curves change as new data points appeared and I think a representation of that with the variable values adjusting as the model adjusts would be revealing.
I think of most things as plots of curves or surfaces first, equations driving their shapes second so I tend to be biased toward a graphical representation. Therefore, I'm highly interested in a plot of the underlying variables used for each best fit SEIR curve day by day as the model was adjusted to fit each new data point. Having a near-constant value the whole way through would be the classic case, where the model is an excellent fit to the phenomenon all the way through. In this case though, I believe that one of the big differences is in your use of a much smaller S initially than the total population of each region reporting data (as one example variable). The explanation that the entire population isn't actually susceptible early on (for instance, due to social distancing/quarantining attempts) rings true to me (I think back to chemical rate or mixing equations from Calc II and remember that those always assumed "well mixed" tanks of solutions, which is NOT always the case in real life!). Therefore, a plot of how the S needed to change with time within each country to keep a best-fit would likely be very revealing of how the epidemic surged (or didn't) through the population and changes in it's slope might correlate with changes in testing, reporting of data, treatment methods, institution of distancing measures, etc.
A deep dive into the other variables' change over time likely would be equally revealing. I'm too new to Wolfram software to know if this "historical" data for earlier fits is still included in this notebook, but it would of interest!

Dear Zach,

Thank you very much ... indeed, my intent was to be "utilitarian" in the sense you point out, and I think I have more or less succeeded ...

But let me tell you, you have outdone me now, and I would almost like to ask you for help. If you have kept the daily slides, then please keep saving them ... and later, perhaps you might make them available to me as a time sequence ... I ask because I have not kept track of the parameters as time goes by, (lack of foresight on my part) and indeed, that would be very useful information for the sequel of this work which has to do with the methodology for fitting the parameters. As you mention it, having the history of those fits would go a long way towards completing that work; but I have not been keeping track of that history (because basically, my methods work well for the time being, and lack of foresight combined with lack of time to make a tool or keep track of all of them on a daily basis). With all that said, I hope I we can stay in touch and correspond about this matter later on. Once again, thank you for your good words, and I will try to keep making this as useful as I can to the community at large. If I only had the resources, I would do this for all countries. But for now, this will have to do.

Best regards, Enrique Garcia Moreno E.

Posted 1 month ago

I'm afraid I also lacked that foresight and only have a little saved (and most of that's just for the USA) as it was a while before I realized the significance that the changing variables had. I've begun a more thorough archive NOW though and I expect to keep it up for the future! I'll be happy to hand that archive to you whenever you like.

It strikes me that the hand-adjustment of the variables that you've been doing is analogous to the best-fit line trick of laying a ruler through a scatterplot of data with a linear trend and eyeballing and equalizing dots above/below the line. It is quite effective for datasets with a clear linear trend and likewise, your methods are self-evidently quite effective for these SEIR-like trend datasets.

A much more computationally intensive method would be to apply the appropriate Euler's method (or Runge-Kutta, I think 4th order? method) to the SEIR equations for the dataset at some early time per each group reporting data (world/country/province) with enough volume and shape to be fit to, and re-do that best fit for each day data is reported. The changes required of the variables to maintain that best-fit would then be tied to a more mathematically rigorous definition of "best" fit as well. Doing this and comparing the changes of variables over time per group would also seem likely to help determine correlations between generalizations of how the epidemic progresses as well as comparisons of each group's susceptibilities and methods of combating the virus.

You are right of course in what you say, but there is a caveat ... in using computational methods, I have had to "intervene" and fit to just part of the data, rather than the whole set, when the data was "messy" and behaved unpredictably ... I have since developed a more or less systematic way of doing these interventions with the aid of the additional parameter in the model (note it is not exactly an SEIR model). My line of thinking is that computational tools aided with further AI computational tools that emulate the actions of the interventions would be the ideal way to go about this. If I ever get time to do it, I will try to build such a tool. But for now, my approximate methods, the use of a more complicated model to get first approximations, etc. will have to do. Perhaps later we can discuss how one might do the fits with established methods in Mathematica ... if you are up to doing something like that. There are a couple of us now working together on this ...

In this section I attach the notebook. I will update it soon and add new features to it, such as the effect of lifting restrictions early.


This post has been listed in the main resource-hub COVID-19 thread: in the section Computational Publications. Please feel free to add your own comment on that discussion pointing to this post ( ) so many more interested readers will become aware of your excellent work. Thank you for your effort!


Is it possible to post an animation, and how?


In this section I discuss an SIR-like model which works almost as well as the SEIR model, at least with the Chinese data. I will have to investigate whether I can make it work as well with other data sets.

Our SIR like model is described as follows

s'(t) = - Beta * s(t) * i(t) / p,

i'(t) = Beta * s(t) * i(t) / p - Gamma * i(t - n),

r'(t) = Gamma * i(t - n)

The function s(t) is the number of susceptible people (the people that can get exposed to the pathogen) at time t; i(t) is the number of people who are infected and infective; r(t) is the number of people who have become resistant to the pathogen: they have recovered and developed immunity or died. Now the parameters.

beta is the rate of infection or "force of infection", gamma is the removal rate, and n is a shift parameter used in part to line up the curves (see a bit of an explanation of this in the SEIR section). The parameter values are in the titles of the pictures for each country. In general we assume i(0)=1 unless stated otherwise in the model label. Also, s(0)=p, and r(0)=0, unless otherwise stated.

We present in the picture, an SIR model and its parameters that fits the Chinese infection data. With time, I will try to fit the SIR-like model to other data sets. I also plan to investigate the known analytic solutions (although I need to understand how the shift parameter changes the classical solutions) in order to attempt automatic fitting via computational optimization in Mathematica. This is work in progress.

May 25: Positivity rates updated

May 21: The positivity rates file will be updated on Mondays from now on. It is updated today.

April 21-May 12: The SIR models will now only be updated occasionally. The positivity rates picture will be updated daily.

April 19-20: Finland updated. A new SIR model for Finland with a different recovery schedule is included. We include now the current positivity rates for various countries (number of cases/number of tests)

April 18. There is now a SIR model for Finland which uses JHU data

April 14: I have added the SIR model for Italy. I will not update this daily. Later, I will make a notebook available for this model. If I am able to complete it, I will also have in the notebook a program to find the parameters to fit the model to the data - but I cannot promise I will have that, at least not soon. Note the forecast is somewhat more optimistic, both in regard to total susceptibility and duration of outbreak, than the SEIR model so far. It is hard to know how the outbreak will evolve in the absence of further data.






Posted 1 month ago


Attached are two notebooks which I am currently using. The fitting methods are embedded in the report procedures. When fitting to cumulative cases, the large numbers in the latest data dominate the fit, but they are probably also more accurate. At the end of the CDCData.nb there is a method for fitting the first differences of the cumulative data to a first difference formula derived from the logistic model, if you also want to see more directly the influence of the early data. We are about to enter the tail phase of the epidemic in the U.S. The model requires an exponential decay in the tail. That didn't happen with South Korea, but the China data did fit. If the U.S. data doesn't follow the tail behavior, that would indicate the model isn't valid.


Posted 1 month ago

Hi Enrique,

Could you please post the notebook for your Spain model? I am curious as to how susceptibility changes as regards lifting restrictions.

Many thanks, Martin.

Dear Martin,

Please give me a couple of (or a few - I am really busy at the moment, fortunately) days to produce a clean version of it. If you have done a little mathematical and physics modeling you will probably understand why I chose to do it the way I do ... I have another version of it that produces a steady state ... etc. I will try to pack it all there, depending on how much time I have. Once you understand how to operate on the basic equations, you can get to model pretty much any effect which depends on susceptibility (lockdowns, lifting lockdowns, more or less strongly, etc.). I will attach the notebook in a reply to your reply directly (there is no space up in the main sections). Alternatively, I might start a new section for this purpose ... when ready, I will let you know.

Best, Enrique Enrique

Posted 1 month ago

Thank you!

Dear Martin,

Attached to this reply is a quick and dirty notebook which shows how to make S grow ... the change is in the equation for S'. If you uncomment the commented factor, you can also bring it back down to zero quickly ... sorry I don't have time to make this better looking. Hope this is helpful.

Regards, Enrique


In this section we use the optimally fitted "TRUE" models to forecast daily new cases trends. We have models for USA, Italy, Finland and Sweden, and later, Denmark, (not necessarily in this order) to show how lifting restrictions causes a deviation from the forecast. Sweden is an interesting case, as there are no policy changes foreseen in the future. The picture for Italy will be provided when it is ready. The US and Swedish models will not be updated. The Finnish model will be updated last on the 13th of May, when restrictions are lifted. The model for Denmark will be fitted to the 13th of April, when restrictions were lifted.

The yellow curve is the actual number of daily cases, the blue curve is the centered 14 day moving average of those numbers, and the green curve is the trend forecast by the "TRUE" models.

Recall that in a "TRUE" model, the number of cumulative cases is matched to the R compartment of the model. The logic is that every individual who is found to be infected is effectively isolated and removed from the infection chain, thus becoming part, in reality, of the R compartment. This is different than the models we discussed originally which were meant to model the data in another way using the SEIR/SIR formalism (the compartments are defined differently, and we have argued somewhat vaguely why we think this works ... ). The SIR model considered in these "TRUE" models, are pure SIR models, that is, there is no delay term in the equations. I will come back and write the exact equations here.

May 25: updated

May 18: updated, the green curve is extended to 4 July.

May 12: The forecasts of some countries are now extended by a couple of months from May 4. There is now a forecast for Denmark. On the weekend all forecasts will be extended by two months. The Swedish model has, in red, the smooth number of daily fatalities.

May 10: earlier today I had posted the wrong file for the US .. it is now the correct one ... it follows the trend very tightly. These graphs will probably be updated on weekends only, although I might post a modified version with a longer green trend curve to show what will follow two weeks in advance. I hope there are now no mistakes left in my plots.






Posted 15 days ago

Hi Enrique,

Could you possibly let me know how to apply and lift restrictions? Many thanks, Martin

Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract