Nice post! I'm glad you had better success fitting the data than I did, and the comparison of several countries with quite varied mitigation strategies is interesting.
I'm still concerned that the total population has to be made so small. We should be able to devise a model that effectively reduces the number of susceptible individuals in a mechanistically plausible manner.
The time delays, in principle, replace the duration of infection and the time to death, so the rate constants γ and μ shouldn't be needed. Have you tried fitting the data without them?
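To make that point concrete, here is a minimal Python sketch (not the model from the post) in which removal from I is governed only by fixed delays, with no gamma or mu. All names and values (`tau_r`, `tau_d`, the death fraction `f`, `beta`, the step size) are hypothetical, and giving every infected person one fixed delay is a simplifying assumption; a distributed delay would be more realistic.

```python
# Hypothetical delay-only SIRD sketch: no gamma or mu rate constants.
# Each cohort leaves I exactly tau_r days after infection (recovery, fraction 1 - f)
# or tau_d days after infection (death, fraction f).

def simulate_delay_sird(beta, tau_r, tau_d, f, n=1.0, i0=1e-4, days=400, dt=0.1):
    """Fixed-step simulation; returns final (S, I, R, D) and the peak of I."""
    lag_r = round(tau_r / dt)  # recovery delay in steps (must be >= 1)
    lag_d = round(tau_d / dt)  # death delay in steps (must be >= 1)
    s, i, r, d = n - i0, i0, 0.0, 0.0
    new = [i0]  # people newly infected in each step; the seed is cohort 0
    peak_i = i
    for k in range(1, round(days / dt) + 1):
        infections = dt * beta * s * i / n
        # The cohorts infected lag_r and lag_d steps ago leave I now.
        recoveries = (1 - f) * new[k - lag_r] if k >= lag_r else 0.0
        deaths = f * new[k - lag_d] if k >= lag_d else 0.0
        s -= infections
        i += infections - recoveries - deaths
        r += recoveries
        d += deaths
        new.append(infections)
        peak_i = max(peak_i, i)
    return s, i, r, d, peak_i


if __name__ == "__main__":
    print(simulate_delay_sird(beta=0.3, tau_r=10, tau_d=18, f=0.01))
```

Because every cohort leaves I exactly once, S + I + R + D is conserved and D/(R + D) approaches `f` as the epidemic ends; the fitted parameters would then be `tau_r`, `tau_d` and `f` rather than the two rate constants.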
Looking at the summary table, γ is only 15 to 30 times greater than μ across the different fits, so the equation for I should still include the death term (-μI). Did you try modeling without making that assumption?
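A quick way to see the size of the effect: dropping the death term from dI/dt underestimates the per-capita outflow from I by mu/(gamma + mu), which is 1/16 ≈ 6% for a ratio of 15 and 1/31 ≈ 3% for a ratio of 30. The Python sketch below compares the two variants of a simple SIRD model; `beta`, `gamma` and the initial conditions are hypothetical values, not the fitted ones from the table.

```python
# Hypothetical comparison: SIRD with and without -mu*I in the equation for I.

def sird_peak(beta, gamma, mu, death_in_i=True, n=1.0, i0=1e-4, days=300, dt=0.01):
    """Forward-Euler SIRD; returns (peak of I, S + I + R + D at the end)."""
    s, i, r, d = n - i0, i0, 0.0, 0.0
    peak = i
    for _ in range(round(days / dt)):
        infections = beta * s * i / n
        recoveries = gamma * i
        deaths = mu * i
        s -= dt * infections
        # The simplified model omits -mu*I here but still adds mu*I to D.
        i += dt * (infections - recoveries - (deaths if death_in_i else 0.0))
        r += dt * recoveries
        d += dt * deaths
        peak = max(peak, i)
    return peak, s + i + r + d


if __name__ == "__main__":
    for ratio in (15, 30):
        mu = 0.1 / ratio
        print(ratio, sird_peak(0.3, 0.1, mu), sird_peak(0.3, 0.1, mu, death_in_i=False))
```

With the death term omitted, the peak of I comes out higher and S + I + R + D drifts above N, because deaths are added to D without ever being removed from I.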
It seems that mathematicians like to make these kinds of simplifications so that the system can be linearized and statements about steady states and stability can be made more easily. With the capabilities of WL, I don't think that's necessary; we can get sufficiently accurate answers to those questions by simulation. Sometimes these simplifications are made to reduce the number of parameters, but again, it's almost trivial to compute sensitivities, and one can still maintain sufficient rigor in parameter fitting by making use of a proper identifiability analysis.
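As a rough illustration of how cheap sensitivities are once you can simulate, here is a Python sketch (WL would do the same thing with its own solvers) that computes normalized sensitivities d ln D(t)/d ln p of cumulative deaths by central finite differences, then correlates the sensitivity curves of each parameter pair. Correlations close to ±1 suggest the data cannot separate those two parameters; this is a simple local check swapped in for illustration, not a full identifiability analysis such as profile likelihood. The model, parameter values and function names are all hypothetical.

```python
import math


def cumulative_deaths(beta, gamma, mu, n=1.0, i0=1e-4, days=200, dt=0.05):
    """Forward-Euler SIRD; returns cumulative deaths D at the end of each day."""
    s, i, d = n - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    curve = []
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * i / n
            # The tuple update evaluates every right-hand side with the old s and i.
            s, i, d = (
                s - dt * infections,
                i + dt * (infections - (gamma + mu) * i),
                d + dt * mu * i,
            )
        curve.append(d)
    return curve


def log_sensitivities(params, rel_step=1e-4):
    """Normalized sensitivities d ln D(t) / d ln p by central finite differences."""
    base = cumulative_deaths(**params)
    sens = {}
    for name, value in params.items():
        h = value * rel_step
        up = cumulative_deaths(**{**params, name: value + h})
        down = cumulative_deaths(**{**params, name: value - h})
        sens[name] = [(u - v) / (2 * h) * value / b for u, v, b in zip(up, down, base)]
    return sens


def correlation(x, y):
    """Pearson correlation between two sensitivity time courses."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


if __name__ == "__main__":
    sens = log_sensitivities({"beta": 0.3, "gamma": 0.1, "mu": 0.005})
    for a, b in [("beta", "gamma"), ("beta", "mu"), ("gamma", "mu")]:
        print(a, b, round(correlation(sens[a], sens[b]), 3))
```

The same loop works for any parameter set, so dropping a parameter can be justified by the data rather than by the wish to linearize.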