
COVID-19 Monitoring Infection Spread in India – Model versus Actual Public Data

I intend to update the above graph for India daily from 22 Mar 2020 to 22 Apr 2020, replacing it each day. In the graph, t=1 is Day-1, which is set at 22 Mar 2020. Likewise, t=2 and so on count up from that set-point date. Those who don’t like math can simply look at the graph and follow the trends.

The data for India from 22 Mar 2020 to 05 Apr 2020 is spurious and far removed from the truth, because of insufficient testing in the country. This gives an artificially low number of infections. The graph with MF=0.04 is the Italian trend for infections. The India data is 56% lower than Italy’s for the same period of exponential growth, even though the Indian population is 22.56 times the Italian population.

The model comes closest to the reported data at MF = 0.02, which means the data is following a curve with values lower than those for Italy. Yet we should expect an MF of about 1, since India should scale to Italy by the Population Ratio (PR). So an MF of 0.02 is unreliable and should not be used as a possible model.

The real time data for the exponential growth phase for infections had started on 22 Mar 2020. However, as seen from the graph, the real data does not fit the model, for any value of practical MF. As such, it is very likely that the Real Data published for India by JHU is not of practical use and also does not represent a good sample of the population.

I’m pulling the real data for India from the data set publicly available from Johns Hopkins University & Medicine (JHU), Center for Systems Science and Engineering (CSSE). It’s not clear where they get the day-to-day data for India, but it presumably comes from the data sources listed on their data archival portal. The Indian Government has a web page for COVID-19 (https://www.mygov.in/covid-19), but it has no “day-to-day” data history.

A key conclusion that can be drawn as on 05 Apr 2020 is that the reported data for infections in India is wrong in its entirety.

In my previous article, I had established the equation for the exponential growth model for Italy. The population ratio (PR) is 22.56673 for India. Multiplying the exponential growth model by PR should account for the population difference. I’ve further multiplied it by a Model Factor (MF), which is a number greater than 0, but more likely about 1 for India. This factor fine-tunes the scaling, and at some value it should give a good match to the real-world infection data. When MF is 1/PR, the curve is identical to the exponential best fit for Italy.

As such, for India, the modified exponential equation is:

y(t) = MF * PR * 631.06 * exp (0.1852*t)


y(t) = MF * 22.56673 * 631.06 * exp (0.1852*t)

To predict the Infection Cases in India, select an MF between 0 and 1, plug in a value of “t”, where “t” is the day count starting from 22 Mar 2020 (t=1) for India. The computed value of y(t) is the predicted Infection Cases on day “t”.
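The calculation described above can be sketched in a few lines of Python. This is only an illustration of the equation as written: the function names `predicted_infections` and `doubling_time_days` are mine, and the constants are the ones quoted in the text.

```python
import math

# Constants quoted in the text.
PR_INDIA = 22.56673   # population ratio, India / Italy
A_ITALY = 631.06      # prefactor of the Italian exponential best fit
B_ITALY = 0.1852      # growth rate per day of the Italian best fit


def predicted_infections(t, mf, pr=PR_INDIA):
    """y(t) = MF * PR * 631.06 * exp(0.1852 * t), with t=1 on 22 Mar 2020."""
    return mf * pr * A_ITALY * math.exp(B_ITALY * t)


def doubling_time_days(growth_rate=B_ITALY):
    """Days for the modelled infections to double: ln(2) / growth rate."""
    return math.log(2) / growth_rate


if __name__ == "__main__":
    # Compare a few model factors on Day-15 (05 Apr 2020).
    for mf in (0.02, 1 / PR_INDIA, 1.0):
        print(f"MF={mf:.4f}  y(15)={predicted_infections(15, mf):,.0f}")
    print(f"Doubling time: {doubling_time_days():.2f} days")
```

Setting MF to 1/PR cancels the population scaling, so the function returns the plain Italian curve, which is a quick sanity check on any implementation.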

COVID-19 – Spread Model for India based on Italian Trend

[Graph: x-axis in days, scaled to match the exponential infection growth trend in Italy]

Let me summarize first and then I’ll explain the model next.

By the third week of April (most likely around 19 April 2020) the model predicts COVID-19 (coronavirus disease 2019) infections in India at 1.44 million and deaths due to COVID-19 at 137,000.

The deaths in India should plateau at the earliest by 21 May 2020 or at the latest by 20 July 2020, provided we follow the lockdown and medical protocols used in Italy, France and the US, and possibly China.

I hope I’m terribly wrong and that the actual data will be much lower. But my analysis comes out of a simple mathematical model, built by scaling and population normalization on the data set publicly available from Johns Hopkins University & Medicine, Center for Systems Science and Engineering (CSSE). The data set is available here. The time series in the set records the cumulative number of infections and deaths.

The Johns Hopkins raw data is available for almost all the countries where COVID-19 has spread. I’ve analyzed the data for a small subset of countries – China, France, US, Italy and India – since my primary goal was to arrive at some predictions for India.

COVID-19 Infection Analysis

Below is the graph for the confirmed infections in these countries.

[Graph: confirmed infections by country; x-axis in actual dates]

This graph shows that the data for China pretty much starts from the exponential growth phase for the infections. We just don’t have any reliable data prior to 22 Jan 2020.

The first documented infections reported in the data set are 2 for France on 24 Jan 2020, 1 for US on 22 Jan 2020, 2 for Italy on 31 Jan 2020 and 1 for India on 31 Jan 2020.

The exponential growth starts at about 05 Mar 2020 for France (380 cases), 07 Mar 2020 for US (402 cases), 26 Feb 2020 for Italy (453 cases) and 22 Mar 2020 for India (396 cases).

To visualize the country trends during the exponential growth rate regime, I have rescaled the data, by simply shifting the origin. I’ve taken Day-1 as 22 Jan 2020 for China, 05 Mar 2020 for France, 07 Mar 2020 for US, 26 Feb 2020 for Italy and 22 Mar 2020 for India. The results are quite interesting.
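The origin shift described above amounts to counting days from each country’s Day-1. Here is a minimal sketch using the Day-1 dates listed in the paragraph; the names `DAY_ONE` and `day_index` are mine, for illustration only.

```python
from datetime import date

# Day-1 of the exponential growth phase for each country, as listed in the text.
DAY_ONE = {
    "China": date(2020, 1, 22),
    "France": date(2020, 3, 5),
    "US": date(2020, 3, 7),
    "Italy": date(2020, 2, 26),
    "India": date(2020, 3, 22),
}


def day_index(country, calendar_date):
    """Return the scaled day number, where the country's Day-1 date maps to 1."""
    return (calendar_date - DAY_ONE[country]).days + 1


if __name__ == "__main__":
    print("Italy, 23 Mar 2020 -> Day", day_index("Italy", date(2020, 3, 23)))
    print("India, 22 Mar 2020 -> Day", day_index("India", date(2020, 3, 22)))
```

Once every country’s series is indexed this way, the curves can be drawn on a common “days since Day-1” axis.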

[Graph: confirmed infections by country; x-axis in days, scaled to match the exponential infection growth trend in Italy]

In the scaled data the trends for these five countries are very similar in the first 6 days. China data deviates from day 6 and the US data deviates from day 12. The data for Italy and France are almost identical. India is just starting on the curve at Day-1 and Day-2.

The trends can be explained on the ability of the countries to carry out testing for COVID-19 and also the population density. China having a population of 1434 million explodes off after Day-6 and the US with a population of 329 million branches off dramatically at Day-12. Notice how close the trend is for France and Italy with populations of 65 million and 60 million respectively. The COVID-19 growth patterns in these two countries are so identical. Considering that these two countries have a similar and robust health care system and testing system, we should be able to trust the data and categorize it as reliable.

It’s challenging to say what is happening in India. The first reported case was on 31 Jan 2020. It took until 22 Mar 2020 to reach the start of the exponential growth stage, at an infection level of 396. Such a long gestation period is implausible. Either India has been very lucky up to now or the data is completely wrong. It’s likely that the raw data is wrong, since the country may have missed out on enhanced testing during this period. My personal experience has been that private health care in India is on par with the rest of the world, and in many cases better than in the US and Europe. However, the concern is the quality of health care that can be given to the poor and underprivileged. It’s tough for this category of the population. The care needed for a COVID-19 infection is almost impossible to administer for this group.

If we were to model the trend based on one reference country, which one would it be? I would go with the Italian data set for several reasons. First, it matches the data from France well, both countries having similar populations. Second, the Day-1 start of 26 Feb 2020 for Italy is much earlier than the Day-1 start of 05 Mar 2020 for France, so we have more predictive data to work with. Third, the US was not testing enthusiastically in the early days and is only just catching up. Italy had a better record of testing than the US, and its Day-1 data is a better starting point for the modeling. Fourth, I think it is reasonable to state that the Italian data is more honest than the China data. For all these reasons, I’ve chosen the Italian data set as the reference benchmark.

Taking the Italian data set, I’ve extrapolated the data from this to other countries by normalizing the Italian data with a factor of the population ratio. I’m defining the population ratio (PR) as the “Population of a Country” divided by the “Population of Italy”. For China, France, US and India this ratio works out to 23.67, 1.07, 5.43 and 22.56 respectively. This is the resultant graph from this simple model.

[Graph: population-normalized infections by country; x-axis in days, scaled to match the exponential infection growth trend in Italy]

Day-27 in this estimation falls on different calendar dates for each of the countries, due to the normalizing mentioned above. For simplicity, the dates for Day-27 and the number of infections are reported here, which is simply an extract from the above graph.

Country   Date          # Infections
China     19 Feb 2020   1,513,747
France    02 Apr 2020   68,762
US        04 Apr 2020   347,417
Italy     23 Mar 2020   63,927
India     19 Apr 2020   1,442,624
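The Day-27 figures can be reproduced approximately by multiplying Italy’s Day-27 value by each population ratio. The sketch below uses the ratios as quoted in the text (rounded to two decimals for China, France and the US, and the more precise 22.56673 for India), so its output differs slightly from the table, which was presumably computed with unrounded ratios. The name `scale_from_italy` is mine.

```python
# Italy's actual cumulative infections on Day-27 (23 Mar 2020), from the table.
ITALY_DAY27_INFECTIONS = 63_927

# Population ratios (country population / Italy population) as quoted in the text.
POPULATION_RATIO = {
    "China": 23.67,
    "France": 1.07,
    "US": 5.43,
    "Italy": 1.0,
    "India": 22.56673,
}


def scale_from_italy(italy_value, population_ratio):
    """Scale an Italian figure to another country by its population ratio."""
    return italy_value * population_ratio


if __name__ == "__main__":
    for country, pr in POPULATION_RATIO.items():
        estimate = scale_from_italy(ITALY_DAY27_INFECTIONS, pr)
        print(f"{country:7s} {estimate:>12,.0f}")
```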

The prediction for China is 1,513,747 infections on 19 Feb 2020 – is this correct!? The actual Johns Hopkins figure for China on 19 Feb 2020 is 74,619 infections. The actual figure is 95% below the prediction! Why? I have based the prediction on the Italian data set. For the four reasons mentioned above, it is a good benchmark to start with. This Italian data set predicts the trend for France very well. Based on the actual confirmed cases data, on Day-17 France has an infection level of 14,463 and Italy has an infection level of 17,660. If the Italian and French data match as per the model, which is based on population ratio and simple scaling, then it is probably a good indicator for other countries. Based on the Italian model, the actual China infection level should have been about 1.51 million! We can look at the data from France and the US to check the efficacy of my model – we just have to wait until 02 Apr 2020 and 04 Apr 2020 for the actual data.

I estimate that the COVID-19 infection in India will reach about 1.44 million by 19 Apr 2020. Whether the actual reported data, from actual testing, matches this or not, is left to the imagination. Math does have the predictive power. Comparison with a model is only as good as the experimental data collected. If the two don’t match, either the model is wrong or the data is wrong!

COVID-19 Infections – Exponential Equations

There is one more mathematical insight that I want to share for predicting the infections. Taking the actual Confirmed Cases data for infections in the exponential growth range for the countries (as above), the exponential best fit can be computed, along with the associated equations.

The exponential equations fit the actual data remarkably well for France, US and Italy. The R² value is in the 98% range, indicating that the fit is good and reliable. I did not bother to plot the real data for China, since the exponential fit is highly tainted, in the sense that the real data tends to follow the Italy-France trend, which is absurd for the population size of China. There are two takeaways from this chart. First, the curves follow a nice exponential fit. Second, the exponential equations are different for each of these countries. The real data from India, if it has sampling integrity, will also have its own unique curve. Sampling integrity means that sufficient tests are done to capture the ground reality of the infections accurately. The first 2 data points for India, in red, are plotted on the graph. We have to get more data points in the next 10 to 15 days (02 Apr 2020 to 07 Apr 2020) to figure out the equation for India.
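For readers who want to try an exponential best fit themselves, here is one common way to do it: take the logarithm of the case counts and fit a straight line by least squares. This is a sketch of that technique, not necessarily the procedure used to produce the chart. The function name `fit_exponential` is mine, R² is computed on the log scale, and the demo data is synthetic, generated from the Italian equation quoted elsewhere in this post rather than taken from the JHU data set.

```python
import math


def fit_exponential(days, cases):
    """Fit cases = a * exp(b * day) by linear least squares on ln(cases).

    Returns (a, b, r_squared), with r_squared computed on the log scale.
    """
    xs = [float(d) for d in days]
    ys = [math.log(c) for c in cases]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope of the log-linear fit
    ln_a = y_mean - b * x_mean       # intercept of the log-linear fit
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(ln_a), b, r_squared


if __name__ == "__main__":
    # Synthetic demo data only: points generated from 631.06 * exp(0.1852 * t).
    days = list(range(1, 28))
    cases = [631.06 * math.exp(0.1852 * t) for t in days]
    a, b, r2 = fit_exponential(days, cases)
    print(f"a={a:.2f}  b={b:.4f}  R^2={r2:.4f}")
```

A fit done on the log scale weights the early, small counts more heavily than a direct nonlinear fit would, so the two approaches can give somewhat different coefficients on real data.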

I’ll be using a modified Italian exponential equation in a separate article and hope to post the “day to day” correlation between the actual published data and the equation.

COVID-19 Death Analysis

No one likes to think about or even analyze the death statistics. But, in these extraordinary times, we have to undertake this exercise. If we know what a model predicts for the future, which is based on mathematics, we could as intelligent humans take the necessary policy measures to change the trends. We could find medical remedies (more beds and ventilators, medicines, availability of doctors and nurses, transport to medical facilities), we could aggressively work on vaccinations, we could incorporate anti-growth measures such as social distancing and we could incorporate financial remedies to ease the burden of families.

[Graph: cumulative deaths by country; x-axis in actual dates]

The above graph is from the actual Johns Hopkins COVID-19 data set. A key trend is the asymptote in the China data over the past few days (20 Mar 2020 to 23 Mar 2020). Italy, France, US and India show an increasing death toll, and we are far from the plateau phase.

[Graph: cumulative deaths by country; x-axis in days, scaled to match the exponential infection growth trend in Italy]

The above graph is a one-to-one mapping of the cumulative deaths in the scaled exponential infection growth region. It simply means that we are looking at cumulative deaths from the point when the exponential infections started in each of the countries. France and the US have a very close trend. Maybe in these countries the quality of health care and/or the ability of the people to fight COVID-19 due to inherent resistance is similar. The US and French data also match up with China on Day-19. But given the size of China’s population, this seems odd, since the US and France each have only a fraction of the Chinese population. The China data is most likely wrong or under-reported. The Italian death data breaks away dramatically from the US and France data by about Day-12. Either the health care in Italy is not on par with that of the US and France, or the aged population in Italy has succumbed dramatically to COVID-19. The latter is more likely, given numerous reports in the media that age has been a factor in deaths in Italy.

India has just 2 data points in this graph. It is to be seen whether it will follow the Italian curve or the US/French curve.

For the same reasons as that of infections, discussed above, the population normalized data for deaths, taking Italy as the reference data set, yields the following graph.

[Graph: population-normalized deaths by country; x-axis in days, scaled to match the exponential infection growth trend in Italy]

As before, Day-27 in this estimation falls on different calendar dates for each of the countries, due to the scaling mentioned above. For simplicity, the dates for Day-27 and the number of deaths are reported here, which is simply an extract from the above graph.

Country   By Date       # Deaths
China     19 Feb 2020   143,899
France    02 Apr 2020   6,653
US        04 Apr 2020   33,026
Italy     23 Mar 2020   6,077
India     19 Apr 2020   137,138

Again, as on 19 Feb 2020 the deaths in China are estimated to be 143,899, but the actual figure in the Johns Hopkins data set is 2,116. This is a huge discrepancy. However, as explained in the infection section above, we have to take the actual data with a pinch of salt. The 6,077 deaths in Italy as on 23 Mar 2020 is the actual figure in the data set. For France and US, we can compare the actual data with the prediction in the first week of April and confirm the efficacy of the model presented here. The actual data for India from the data set as on 23 Mar 2020, with only 10 deaths, is difficult to believe. It does not correlate with the data from the other countries mentioned here. If the Italian model is considered a good benchmark, then we should see about 137,000 deaths in India by 19 April 2020, or by the third week of April 2020.

When will the Deaths stop in India due to COVID-19?

The death data for France, Italy and the US show no asymptote as of 23 Mar 2020. The China data, however, is showing an asymptote as of 23 Mar 2020. So, if we were to go by the China data, where the exponential growth in infections started around 22 Jan 2020, it has taken about 60 days for deaths to plateau. However, the China data is questionable, and the onset of infection might have been in Nov 2019 or Dec 2019. So the asymptote in deaths could come 60, 90 or 120 days after the onset of actual exponential growth. The big question is whether the exponential growth of infection in China started around 22 Jan 2020, 22 Dec 2019 or 22 Nov 2019. Based on these timelines, we can estimate the plateau in deaths in these countries, counting from the onset point of exponential growth.

Country   Day-1         60 Days       90 Days       120 Days
France    5 Mar 2020    4 May 2020    3 Jun 2020    3 Jul 2020
US        7 Mar 2020    6 May 2020    5 Jun 2020    5 Jul 2020
Italy     26 Feb 2020   26 Apr 2020   26 May 2020   25 Jun 2020
India     22 Mar 2020   21 May 2020   20 Jun 2020   20 Jul 2020

Estimated Dates for Death Asymptotes

For India, the deaths due to COVID-19 should plateau by 21 May 2020, 20 Jun 2020 or 20 Jul 2020. This is based on the assumption that the physical lockdown measures are similar to what was done in China, and probably in Italy, France and the US. If the quality of the lockdown is not on par with these countries, the plateau might come much later.
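The plateau dates are plain date arithmetic: add 60, 90 or 120 days to each country’s Day-1. Here is a small sketch, with Day-1 dates taken from the infection analysis earlier in this post (05 Mar 2020 for France, 07 Mar 2020 for US, 26 Feb 2020 for Italy and 22 Mar 2020 for India). The names `DAY_ONE` and `plateau_dates` are mine.

```python
from datetime import date, timedelta

# Day-1 of exponential infection growth, from the infection analysis.
DAY_ONE = {
    "France": date(2020, 3, 5),
    "US": date(2020, 3, 7),
    "Italy": date(2020, 2, 26),
    "India": date(2020, 3, 22),
}


def plateau_dates(day_one, offsets=(60, 90, 120)):
    """Return the calendar dates that fall `offsets` days after Day-1."""
    return [day_one + timedelta(days=n) for n in offsets]


if __name__ == "__main__":
    for country, start in DAY_ONE.items():
        dates = ", ".join(d.strftime("%d %b %Y") for d in plateau_dates(start))
        print(f"{country:7s} {dates}")
```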

Factors for lower incidence of Infections and Deaths in India?

As mentioned before, I hope I’m really wrong with the mathematical model and analysis. There are many factors that can increase or decrease the incidence of infections and deaths. It is wishful thinking that the data for India would be magically lower than the raw data trend or the estimations here.

For one, it can be much lower than my estimations, if my simple mathematical model is outright wrong.

If social distancing and medical care in India are maintained on par with the norms in France, Italy and the US, we can at least expect that the deaths would be on par with these countries, but of course scaled for the Indian population. In this scenario, the deaths should parallel the model presented here.

Another possibility is that a fantastic cocktail of medicines is discovered, possibly from the existing repertoire of medicines, so that the death rate can be dramatically lowered.

A lowering is also possible if a vaccine is discovered for COVID-19 and the entire Indian population is vaccinated in the next 30 days. But this is a remote possibility.

If the Indian people have a magical immunity to COVID-19, then too we could see a lowering in the death rates. But, as of now, there is no evidence to support this theory.

Closing Thoughts …

COVID-19 is a surprise to our world. We have neither a proven medical treatment nor a vaccine. The only thing we know for sure is that it can spread exponentially and that it causes massive deaths.

There will be nationwide lockdowns, financial strains and stresses on the basic necessities of life – food, clothing and shelter. This virus needs both the Government and the People to cooperate. We will see either the best in humanity or the worst. It will be a reasonable test for the survivability of the human species. But the species will definitely make it through, though the losses will be heavy.

It is what it is. The COVID-19 virus is not a living thing. It is just a bunch of molecules, with a protective protein shell and RNA (ribonucleic acid) inside the core (of course there are receptors and other goodies). Without a human cell, it cannot do anything – it cannot duplicate. Once inside the body, it enters a cell and hijacks the cell machinery, making the cell produce copies of the virus rather than do its own job. If humans can make semiconductor chips and spacecraft to fly out to different worlds, they will find a way to stop COVID-19. I’m going to bank on the dedicated scientists to get it done, and quickly.