It was reasoned earlier that the reported COVID-19 infection data for India is too low, because the number of tests actually carried out is very low. The infection counts in the MoHFW data, and in the above map, are therefore misleading. The real figures will be a large multiple of the reported ones. We simply do not know the real extent; it can only be estimated through mathematical models.
However, the map gives a realistic idea of the geographical spread of COVID-19 in India. It should caution everyone that the virus has come close to home.
The lockdown and safety protocols stipulated by the Government of India are very important. They have to be followed and respected. They are our best defense for survival. I support them.
The RED line for India shows that deaths in India due to COVID-19 are on par with the trends in other countries. Taking the day-to-day data for cumulative Deaths per 1000 Infections (measured real data), India shows a trend that is higher than that for Germany, South Korea, Australia, Canada, China and the US. It is lower than that for France, UK, Spain and Italy. So, it would be wrong to conclude that India has fewer deaths than other countries due to inbuilt immunity, the BCG vaccine or geographic temperature.
Background
I have brilliant friends around the world who have been thinking about the impact of COVID-19 on India. Initially, we debated the low number of infections in India. Eventually we all agreed that the published data on infections in India is incorrect, since the number of tests/million is very low. No dispute here.
However, questions started pouring in: if few infections are being detected in India, the true extent should at least show up in the death statistics. The logic was that there are bound to be a lot more undetected infections in the country, and some of these people would get really sick and eventually have to reach a hospital for care. Sadly, some of them would succumb, and this should show up in the mortality data.
As of 11 Apr 2020, the published data from JHUM (Johns Hopkins University & Medicine) for India indicated 288 deaths. This is a rather low number for the second most populous country in the world.
We examined many theories floating around. It was speculated that the Indian people had fantastic immunity. Some speculated that Indians had resistance because of the BCG vaccine, given for tuberculosis (TB), since it is part of the childhood immunization program. Others felt that it was because of the higher temperature in India, especially with the onset of summer.
Realization
One needs to examine data carefully before drawing conclusions.
As of 11 Apr 2020, it is tempting to compare the 288 deaths in India (1366 million people) with the deaths in other countries. For example, Italy (61 million people) has 19,648 deaths. But keep in mind that the exponential growth of infections in India started on 22 Mar 2020 (Day-1 for India), while in Italy it started on 26 Feb 2020 (Day-1 for Italy). We are off by about a month if we just look at the data by calendar date. I have found that when infections reach about 400 in any country, the exponential growth phase begins. So, taking that point as Day-1, we can shift all the country data to a common starting origin. We can then compare countries over an equal time lapse.
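The Day-1 alignment described above can be sketched as follows. This is only an illustration: the function name `align_to_day_one` and the sample counts are made up, and the threshold of 400 is the approximate value quoted in the text.

```python
from typing import List


def align_to_day_one(cumulative: List[int], threshold: int = 400) -> List[int]:
    """Return the series starting at the first day the count reaches `threshold`.

    Index 0 of the result corresponds to Day-1 of the exponential phase.
    An empty list is returned if the threshold is never reached.
    """
    for i, count in enumerate(cumulative):
        if count >= threshold:
            return cumulative[i:]
    return []


# Made-up cumulative infection counts, NOT real data.
country_a = [50, 120, 390, 410, 600, 900]
country_b = [5, 10, 20, 80, 250, 405, 700]

aligned_a = align_to_day_one(country_a)  # Day-1 is the day with 410 cases
aligned_b = align_to_day_one(country_b)  # Day-1 is the day with 405 cases
```

Once every country is trimmed this way, position `k` in each list refers to Day-(k+1), so the curves can be compared day for day regardless of calendar date.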
In addition, we have to reduce the errors in sampling. An effective way to examine the data is to calculate the day-to-day metric of “measured deaths per 1000 infections”. For example, if we claim that the number of people tested is not sufficient, then there is a sampling error in the infection data. But if we take the ratio of measured deaths to such infection data, the errors would mathematically cancel out (at least to a large extent). Implicit in this argument is the assumption that measured deaths arise from measured infections.
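A minimal sketch of the ratio metric is given below. The function name `deaths_per_1000` and all the numbers are illustrative assumptions, not real data. The second call shows the cancellation argument under its stated assumption: if deaths and infections are both under-counted by the same factor, the ratio is unchanged.

```python
import math
from typing import List


def deaths_per_1000(deaths: List[int], infections: List[int]) -> List[float]:
    """Day-by-day cumulative deaths per 1000 cumulative infections."""
    if len(deaths) != len(infections):
        raise ValueError("series must have the same length")
    return [
        1000.0 * d / i if i > 0 else math.nan
        for d, i in zip(deaths, infections)
    ]


# Made-up cumulative series, NOT real data.
infections = [400, 800, 1600]
deaths = [8, 20, 48]
ratios = deaths_per_1000(deaths, infections)

# Same hypothetical outbreak, with both series under-counted by half.
ratios_undercounted = deaths_per_1000([4, 10, 24], [200, 400, 800])
```

If the under-counting differs between deaths and infections, the cancellation is only partial, which is why the text says "at least to a large extent".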
Therefore, the graph at the very top is quite stunning in its result. It visually shows that India is following the death trend of other countries. It is somewhere in the median position. Its trend is actually higher than that for Germany, South Korea, Australia, Canada, China and the US. It is lower than that for France, UK, Spain and Italy. Variations in the relative positions in the graph can be attributed to the accessibility and quality of health care in a given country.
The graph also reveals another startling fact. It tells us that the death/infection ratio can skyrocket all of a sudden. For example, France and India have similar values until Day-15. Then France suddenly shoots up. Why is this? I attribute it to the inability of the medical care system to cope with the critical cases.
India has data for 21 days during its exponential growth (22 Mar 2020 to 11 Apr 2020). I have computed the average for the first 21 days for all the countries that I have analyzed. India has an average ratio of 26.37 compared to 22.87, which is the average for all the countries. The graph below shows this visually.
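The averaging step can be sketched as below. The helper `window_average` is a hypothetical name; the two averages (26.37 and 22.87) are the figures quoted above, and the percentage difference is simply derived from them.

```python
from statistics import mean
from typing import List


def window_average(ratios: List[float], days: int = 21) -> float:
    """Average of the first `days` values of a Day-1 aligned ratio series."""
    if len(ratios) < days:
        raise ValueError("not enough days of data")
    return mean(ratios[:days])


# Averages quoted in the text (deaths per 1000 infections, first 21 days).
india_avg = 26.37
all_countries_avg = 22.87

# How far India sits above the all-country average, in percent.
excess_pct = 100.0 * (india_avg - all_countries_avg) / all_countries_avg
```

On these figures India's 21-day average is roughly 15% above the average for all the countries analyzed.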
22 Mar 2020 = Day-1: Start of rapid infection growth (exponential curve), based on the Italian growth model for infections.
20 Apr 2020 = Day-30: Inflection point. The slope of the growth curve begins to decrease and infection growth slows down.
05 May 2020 = Day-45: Start of asymptote (based on trends). New infection cases begin to decline rapidly.
20 May 2020 = Day-60: Expect a true asymptote. Rapid decline in new infections, with the rate of new infections tending to zero and only a very small number of new cases each day.
| Day | Date | Infections (Predicted) |
|---|---|---|
| Day-1 | 22 Mar 2020 | |
| Day-10 | 31 Mar 2020 | 100,196 |
| Day-15 | 05 Apr 2020 | 272,945 |
| Day-20 | 10 Apr 2020 | 635,209 |
| Day-25 | 15 Apr 2020 | 1,180,269 |
| Day-30 | 20 Apr 2020 | 1,824,431 |
| Day-35 | 25 Apr 2020 | 2,407,025 |
The above predictions are based on the Italian infection raw data model. For them to be valid, the standards of lockdown and social distancing in India should be on par with those in Italy. The predicted numbers would show up in the testing data for India, provided India carries out at least 1,500 Tests/Million of population. There are also infected people who never get tested, so the true number of infections could be higher than the predicted values by a factor of 5 to 10. For example, in Italy the true infections are projected to be 10 times higher than the published tested data.
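The two pieces of arithmetic in this paragraph can be worked through as follows. The function names are hypothetical; the inputs are taken from the text (a population of 1366 million, a minimum of 1,500 Tests/Million, the Day-30 prediction of 1,824,431, and the factor of 5 to 10 for untested infections).

```python
from typing import Tuple


def minimum_tests(population_millions: int, tests_per_million: int = 1500) -> int:
    """Total number of tests implied by a tests-per-million target."""
    return population_millions * tests_per_million


def true_infection_range(
    predicted: int, low_factor: int = 5, high_factor: int = 10
) -> Tuple[int, int]:
    """Range of true infections if untested cases multiply the prediction."""
    return predicted * low_factor, predicted * high_factor


tests_needed = minimum_tests(1366)            # about 2.05 million tests
low, high = true_infection_range(1_824_431)   # Day-30 prediction for India
```

On these assumptions, the Day-30 prediction would correspond to somewhere between about 9 million and 18 million true infections.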
Published India Infection Data – Paints a Faulty Picture
The published India infection data in the Johns Hopkins University (JHU) data set is faulty. JHU sources its data from Government of India portals. The problem with this data is that it is based on only 32 Tests/Million people in India. Compared with other countries, the number of tests is so low that it does not constitute a representative sample of the population. So, the existing data for India, at least from 22 Mar 2020 to 31 Mar 2020, is useless as a basis for building a prediction model.
| Country | Tests/Million |
|---|---|
| India | 32 |
| Italy | 8,405 |
| France | 1,508 |
| Spain | 7,596 |
| Germany | 5,812 |
| China | 2,820 |
| South Korea | 7,940 |
| Australia | 9,670 |
| UK | 2,120 |
| USA | 3,377 |
| Canada | 6,450 |
Italian Infection Model is a Good Benchmark
The infection raw data for Italy, based on my analysis in previous articles, is a good benchmark for predicting the infection trends in other countries. I have found that once the infections reach a value of 400, the trajectory follows an exponential curve. The corresponding date is taken as Day-1, so that all countries start at the same point on a common graph. The analysis has also shown that the exponential curve for Italy can be scaled with PR and MF (the Population Ratio and Model Factor defined in the Methodology section) to model the infection growth curves for other countries.
This graph contains the real day-to-day data of COVID-19 infections for several countries. The PR ratio is shown in the legend. The idea is that if PR is more than 1 the curve should fall above Italy (blue line), and if it is less than 1 it should fall below Italy. The US and Germany show this trend (PR > 1). Canada and South Korea are way below (PR < 1). England and France are close to Italy (PR close to 1). China is an anomaly, since it is hovering close to Italy, though its PR is 23.68! That’s because China’s data is tainted. India’s data is also tainted, since it falls way below the curve for Italy. In conclusion, the Italian curve is a good benchmark.
It must be noted that, to rely on this model, remedial measures such as lockdown and quarantine norms must be similar in all the countries. Otherwise there will be variations. For this reason, Spain is higher than it should be for its PR value. It is likely that Spain’s lockdown was not as effective as Italy’s.
Predicting the Infections based on Italian Model
Based on the above reasoning, using the Italian curve, other countries can be modeled by multiplying the Italian infection raw data with the PR.
First, a polynomial was established to fit the Italian infection raw data.
Here, Y(x) is the predicted number of infections on day ‘x’, where ‘x’ is the same as time ‘t’ in days, starting at x = 1 and incrementing by 1 for each day.
Multiplying this by PR the individual curves for other countries are obtained.
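The scaling step can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Italian series shown is made up, and the fitted polynomial itself is not reproduced here. The round population figures (1366 million for India, 61 million for Italy) are the ones quoted earlier in the article.

```python
from typing import List


def population_ratio(pop_country_millions: float, pop_italy_millions: float) -> float:
    """PR = population of a country divided by the population of Italy."""
    return pop_country_millions / pop_italy_millions


def scale_curve(italy_series: List[float], pr: float) -> List[int]:
    """Scale an Italian infection curve by PR to get another country's curve."""
    return [round(pr * y) for y in italy_series]


# PR for India from the round figures quoted in the text.
pr_india = population_ratio(1366, 61)

# Made-up Italian values, NOT real data, scaled by a hypothetical PR of 2.
scaled = scale_curve([400, 650, 888], 2.0)
```

With the round figures PR for India comes out near 22.4; the article's own value of 22.56 presumably reflects more precise population data.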
The graph is not detailed for most of the countries, since the scale is enlarged because of China and India. Removing these, we get a much clearer graph for the other countries.
Point of Inflection based on Real Data best fit ##: 09 Apr 2020 = Day-34
Day-15: 21 Mar 2020
Day-16: 22 Mar 2020
Day-23: 29 Mar 2020
Model Validity with Real Data, Day-1 to Day-15: VALID for MF = 0.30
Model Validity with Real Data, Day-16 to Day-23: VALID for MF = 0.6
Infections at Day-30 (05 Apr 2020), PREDICTED*: 532,511
Infections on 05 Apr 2020 (Real Measured Data)*: 337,072
Infections at Day-45 (20 Apr 2020), PREDICTED#: 8,566,413
Infections on 20 Apr 2020 (Real Measured Data)#: 784,326
* and #: The Real Measured Data for COVID-19 infections in the US is lower than the Predicted Data from the model for Day-30, by 37%. This is because the model deviates from real world data after 29 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=40, which falls on 15 Apr 2020. As of this day the rate of infection growth has definitely slowed down.
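The gap between predicted and measured values quoted in the footnote can be reproduced with a short calculation. The function names are hypothetical; the four numbers are the US figures given above.

```python
def shortfall_pct(predicted: float, measured: float) -> float:
    """Percentage by which the measured value falls below the prediction."""
    return 100.0 * (predicted - measured) / predicted


def overprediction_factor(predicted: float, measured: float) -> float:
    """How many times larger the prediction is than the measured value."""
    return predicted / measured


# US figures from the table above.
day30_shortfall = shortfall_pct(532_511, 337_072)          # Day-30
day45_factor = overprediction_factor(8_566_413, 784_326)   # Day-45
```

The Day-30 shortfall rounds to the 37% stated in the footnote. By Day-45 the prediction is roughly eleven times the measured count, which illustrates how quickly a pure exponential diverges once real growth slows.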
Italy
Day-1 of Exponential Growth Phase: 26 Feb 2020 = Day-1
Model Factor (MF): 1.00
Point of Inflection based on Real Data best fit ##: 27 Mar 2020 = Day-31
Model Validity with Real Data from Day-1 to: 18 Mar 2020 = Day-22
Infections at Day-30 (26 Mar 2020), PREDICTED*: 163,309
Infections on 26 Mar 2020 (Real Measured Data)*: 80,589
Infections at Day-45 (04 Apr 2020), PREDICTED#: 2,627,126
Infections on 04 Apr 2020 (Real Measured Data)#: 124,632
Asymptote based on Real Data, PREDICTED ###: 08 Apr 2020 = Day-43
* and #: The Real Measured Data for COVID-19 infections in Italy is lower than the Predicted Data from the model for Day-30 and Day-45. This is because the model deviates from real world data after 18 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=31, which falls on 27 Mar 2020. As of this day the rate of infection growth has definitely slowed down. Based on this, other countries may also show a slowdown in growth at about 30 days, provided the steps taken there are similar to the protocols followed in Italy.
###: Asymptote is the point where the rate of new infections drops dramatically, in mathematical terms to zero. In the case of real world infections, however, the number of new infections would be small compared to the previous days. This is the point at which the curve becomes flat.
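One way to locate these two points in a cumulative series is sketched below. It is an assumption-laden illustration, not the best-fit procedure used for the tables: the inflection is taken as the day on which daily new cases peak, and the start of the asymptote as the first later day on which new cases fall to a chosen fraction of that peak. The function names, the 20% fraction and the sample series are all hypothetical.

```python
from typing import List, Optional


def daily_new(cumulative: List[int]) -> List[int]:
    """Day-over-day increase; entry k is the rise from Day-(k+1) to Day-(k+2)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def inflection_day(cumulative: List[int]) -> int:
    """1-based day on which daily new cases peak (growth starts to slow)."""
    new = daily_new(cumulative)
    if not new:
        raise ValueError("need at least two days of data")
    peak = max(range(len(new)), key=new.__getitem__)
    return peak + 2


def asymptote_day(cumulative: List[int], fraction: float = 0.2) -> Optional[int]:
    """First day after the peak with new cases <= fraction * peak, else None."""
    new = daily_new(cumulative)
    if not new:
        return None
    peak = max(range(len(new)), key=new.__getitem__)
    for k in range(peak + 1, len(new)):
        if new[k] <= fraction * new[peak]:
            return k + 2
    return None


# Made-up cumulative counts starting at Day-1, NOT real data.
series = [400, 500, 650, 900, 1300, 1600, 1800, 1900, 1950]
```

For this sample series, daily new cases peak on Day-5 and drop below 20% of the peak on Day-9. Real data is noisy, so in practice the series would usually be smoothed before applying such a rule.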
UK
Day-1 of Exponential Growth Phase: 10 Mar 2020 = Day-1
Model Factor (MF): 0.70
Point of Inflection based on Real Data best fit ## (Day-37 = 15 Apr 2020, PREDICTED): 15 Apr 2020
Day-17: 26 Mar 2020
Model Validity with Real Data from Day-1 to Day-17: VALID
Infections at Day-30 (08 Apr 2020), PREDICTED*: 127,495
Infections on 08 Apr 2020 (Real Measured Data)*: 60,733
Infections at Day-45 (23 Apr 2020), PREDICTED#: 2,050,983
* and #: The Real Measured Data for COVID-19 infections in the UK is lower than the Predicted Data from the model for Day-30, by 52%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection for the UK is PREDICTED on 15 Apr 2020 (as per data available on 13 Apr 2020). That’s a total of 37 days from the start of exponential growth.
France
Day-1 of Exponential Growth Phase: 05 Mar 2020 = Day-1
Model Factor (MF): 0.80
Point of Inflection based on Real Data best fit ##: 07 Apr 2020 = Day-34
Day-21: 25 Mar 2020
Model Validity with Real Data from Day-1 to Day-21: VALID
Infections at Day-30 (03 Apr 2020), PREDICTED*: 140,529
Infections on 03 Apr 2020 (Real Measured Data)*: 64,338
Infections at Day-45 (18 Apr 2020), PREDICTED#: 2,260,661
* and #: The Real Measured Data for COVID-19 infections in France is lower than the Predicted Data from the model for Day-30, by 54%. This is because the model deviates from real world data after 25 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=34, which falls on 07 Apr 2020. As of this day the rate of infection growth has definitely slowed down.
Spain
Day-1 of Exponential Growth Phase: 06 Mar 2020 = Day-1
Model Factor (MF): 2.5
Point of Inflection based on Real Data best fit ##: 31 Mar 2020 = Day-26
Model Validity with Real Data from Day-1 to: 26 Mar 2020 = Day-21
Infections at Day-30 (04 Apr 2020), PREDICTED*: 315,134
Infections on 04 Apr 2020 (Real Measured Data)*: 126,168
Infections at Day-45 (19 Apr 2020), PREDICTED#: 5,069,498
* and #: The Real Measured Data for COVID-19 infections in Spain is lower than the Predicted Data from the model for Day-30, by 60%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=26, which falls on 31 Mar 2020. As of this day the rate of infection growth has definitely slowed down.
Germany
Day-1 of Exponential Growth Phase: 05 Mar 2020 = Day-1
Model Factor (MF): 0.9
Point of Inflection based on Real Data best fit ##: 01 Apr 2020 = Day-28
Day-22: 26 Mar 2020
Model Validity with Real Data from Day-1 to: 26 Mar 2020 = Day-22
Infections at Day-30 (03 Apr 2020), PREDICTED*: 202,728
Infections on 03 Apr 2020 (Real Measured Data)*: 91,159
Infections at Day-45 (18 Apr 2020), PREDICTED#: 3,261,248
* and #: The Real Measured Data for COVID-19 infections in Germany is lower than the Predicted Data from the model for Day-30, by 55%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=28, which falls on 01 Apr 2020. As of this day the rate of infection growth has definitely slowed down.
Canada
Day-1 of Exponential Growth Phase: 16 Mar 2020 = Day-1
Model Factor (MF): 1.25
Point of Inflection based on Real Data best fit ##: 08 Apr 2020 = Day-24
Day-11: 26 Mar 2020
Model Validity with Real Data from Day-1 to Day-11: VALID
Infections at Day-30 (14 Apr 2020), PREDICTED*: 126,126
Infections on 14 Apr 2020 (Real Measured Data)*: NOT YET THERE!
Infections at Day-45 (29 Apr 2020), PREDICTED#: 2,028,971
* and #: The Real Measured Data for COVID-19 infections in Canada will be lower than the Predicted Data from the model for Day-30. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=24, which falls on 08 Apr 2020. As of this day the rate of infection growth has definitely slowed down. Canada has carried out 8,732 tests/million.
Australia
Day-1 of Exponential Growth Phase: 16 Mar 2020 = Day-1
Model Factor (MF): 1.40
Point of Inflection based on Real Data best fit ##: 28 Mar 2020 = Day-13
Model Validity with Real Data from Day-1 to: 26 Mar 2020 = Day-11
Infections at Day-30 (14 Apr 2020), PREDICTED*: 95,166
Infections on 14 Apr 2020 (Real Measured Data)*: NOT YET THERE!
Infections at Day-45 (29 Apr 2020), PREDICTED#: 1,530,911
* and #: The Real Measured Data for COVID-19 infections in Australia will be lower than the Predicted Data from the model for Day-30. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
##: Inflection is at Day=13, which falls on 28 Mar 2020. As of this day the rate of infection growth has definitely slowed down. Australia reached its inflection point much sooner than Italy, Spain and Germany.
China
Day-1 of Exponential Growth Phase: 22 Jan 2020 = Day-1
Model Factor (MF): 0.1
Point of Inflection based on Real Data best fit ##: 12 Feb 2020 = Day-22
Model Validity with Real Data from Day-1 to: 10 Feb 2020 = Day-20
Infections at Day-30 (20 Feb 2020), PREDICTED*: 386,705
Infections on 20 Feb 2020 (Real Measured Data)*: 75,077
Infections at Day-45 (06 Mar 2020), PREDICTED#: 6,220,851
Infections on 06 Mar 2020 (Real Measured Data)#: 80,690
Start of Asymptote (Day-46), Real Data: 07 Mar 2020
Current Asymptote Value, Real Data as of 26 Mar 2020 (Day-65): 81,782
*: The Day-30 Predicted Data is higher than the Real Measured Data by a factor of 5.
#: The Day-45 Predicted Data is orders of magnitude higher than the Real Measured Data.
##: Inflection is at Day=22, which falls on 12 Feb 2020. As of this day the rate of infection growth has definitely slowed down. The inflection point for China is in the same ballpark as for Italy (Day-30), Spain (Day-27), and Germany (Day-28).
The global estimate of infections as of 26 Mar 2020 was 566,269.
China’s reported infection data for 26 Mar 2020 is 81,792. Of this, Hubei Province (of which Wuhan is the capital) has 67,801 reported infections, which is 83% of China’s number. Though infections had spread to 31 other Provinces, they account for only 17% of infections. China’s numbers begin to asymptote around 17 Feb 2020 (Day-27) at a reported 72,434 cumulative infections.
China must have been brilliant at curtailing the spread: the exponential growth phase began on 22 Jan 2020, the point of inflection came on 09 Feb 2020 (Day-19) and the asymptote started on 17 Feb 2020 (Day-27). Additionally, they supposedly curtailed the spread in other Provinces, yet left enough gaps in the control for COVID-19 to spread to 199 Countries and Territories. This is very difficult to accept!
Another way to look at it is that China was done with the worst in 27 days. In the country models above, none of the major countries is anywhere near the beginning of an asymptote as of 26 Mar 2020: USA (Day-20), Italy (Day-24), UK (Day-17), France (Day-22), Spain (Day-21), Germany (Day-22), Canada (Day-11), Australia (Day-11).
Finally, China has a population of 1400 million, while the above-mentioned countries have far smaller populations, mostly in the tens of millions (the US, at roughly 330 million, is the largest). So, we would expect China to have been much worse.
Therefore, I will leave the reader to conclude whether we should trust the Prediction or the Real Measured Data for China. My personal opinion is that China’s reported data is suspect.
South Korea
Day-1 of Exponential Growth Phase: 22 Feb 2020 = Day-1
Model Factor (MF): 1.0
Point of Inflection based on Real Data best fit ##: 15 Mar 2020 = Day-23
Model Validity with Real Data from Day-1 to: 05 Mar 2020 = Day-13
Infections at Day-30 (22 Mar 2020), PREDICTED*: 186,496
Infections on 22 Mar 2020 (Real Measured Data)*: 8,961
Infections at Day-45 (06 Apr 2020), PREDICTED#: 3,485,646
Infections on 06 Apr 2020 (Real Measured Data)#: NOT YET THERE
Current Asymptote Value, Real Data as of 26 Mar 2020 (Day-34): 9,137
*: The Day-30 Predicted Data and Real Measured Data differ by a huge factor. Since the model is in any case NOT valid beyond Day-13, the Predicted Data should be discarded.
#: The Predicted Data for Day-45 is a very big number compared to the Real Measured Data of 9,137 as of 26 Mar 2020. Again, since the model is in any case NOT valid beyond Day-13, the Predicted Data should be discarded.
##: Inflection is at Day=23, which falls on 15 Mar 2020. As of this day the rate of infection growth has definitely slowed down. The inflection point for South Korea is in the same ballpark as for Italy (Day-30), Spain (Day-27), Germany (Day-28) and China (Day-22).
However, in the case of South Korea, we should probably accept that the inflection at Day-23 (15 Mar 2020) is real, for the very reason that the country succeeded in stopping COVID-19 in its tracks. Rather than repeat what is already out there, please see the following article. Further, after Day-23 the curve has begun to asymptote, though it has not completely reached zero slope.
South Korea is the only country so far where the virus was curtailed by targeted lockdowns, through the use of technology and best practices via TRACING, TESTING and QUARANTINING. It is also the only country that achieved this without a total lockdown.
South Korea is a democratic country. It has a population of about 51 million. It has a testing ratio of 7,502 Tests/Million, which is higher than even that of Italy (6,533 Tests/Million). This figure is getting higher each day. Its data and publication methods have been transparent and on par with international norms for reporting.
It is quite evident that South Korea reached a true point of inflection (second derivative equal to zero) and thereby curtailed COVID-19.
India
Day-1 of Exponential Growth Phase: 22 Mar 2020 = Day-1
Model Factor (MF): TBD
Point of Inflection based on Real Data best fit: NOT YET
Day-X, End Date for Model Validity: TBD
Model Validity with Real Data from Day-1 to Day-X: NEED MORE DATA
Infections at Day-30 (20 Apr 2020), PREDICTED*: TBD
Infections on 20 Apr 2020 (Real Measured Data)*: NOT YET THERE!
Infections at Day-45 (05 May 2020), PREDICTED#: TBD
* and #: The data for India from 22 Mar 2020 to 05 Apr 2020 is spurious. It is far removed from the truth, because of insufficient testing in the country. This gives an artificially low number of infections. The graph, with MF=0.04, is the Italian trend for infections. The India data is 56% lower than Italy’s for the same period of exponential growth. But the Indian population is 22.56 times the Italian population.
In my previous article I had estimated that 1.44 million people will be infected with COVID-19 by 19 April 2020, but that was based on a simple scaling of the Italian raw data for the population difference between India and Italy.
The amount of actual testing in India is very low, and this may limit the quality of a mathematical model, since the factor MF will not be well known. India is currently doing only 66 Tests/Million people, as of 05 Apr 2020. That is totally inadequate for capturing the reality on the ground. In comparison, testing in other countries is far higher: 10,896 (Italy); 8,920 (South Korea); 5,421 (US) and 2,895 (UK). This is the reason why it was possible for me to establish a mathematical exponential predictive equation for these countries, but not for India.
As such, until the actual testing data improves in India, it may be difficult to predict the infections for the next 15 to 30 days. The only alternative is to use the number predicted in my previous article.
Methodology – Model Equation and Real World Infection Data
The model has been developed based on the Italian trend for infection spread of COVID-19 (Coronavirus 2019). In a previous article, I had established the equation for the exponential growth model for Italy. This equation was modified for other countries based on a scaling Model Factor (MF) and their population, via a Population Ratio (PR) defined as the “population of a country” divided by “population of Italy”. I’ve used the population data from https://worldpopulationreview.com. In a nutshell the equation takes the form:
Y(t) = MF * PR * 631.06 * exp (0.1852*t)
Y(t) is the “Cumulative Number of Infections” on day “t”. The model equation works in the exponential growth phase for the spread of the virus infection. The start of the exponential growth phase is taken when the cumulative number of infections has reached about 400, based on actual published test raw data, since it works well with the equation for many countries. When this happens, I’ve designated it as Day-1 of the exponential growth phase and t is set to 1 (t=1). For succeeding days, “t” is incremented by one.
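The model equation can be written as a small function. This is a minimal sketch: the function name `predicted_infections` is hypothetical, while the constants 631.06 and 0.1852 and the roles of MF and PR are taken directly from the equation above.

```python
import math


def predicted_infections(t: int, mf: float = 1.0, pr: float = 1.0) -> float:
    """Y(t) = MF * PR * 631.06 * exp(0.1852 * t), with t = 1 on Day-1."""
    if t < 1:
        raise ValueError("t starts at 1 on Day-1 of the exponential phase")
    return mf * pr * 631.06 * math.exp(0.1852 * t)


# Italy is the baseline, so MF = 1 and PR = 1.
italy_day30 = predicted_infections(30)
italy_day45 = predicted_infections(45)
```

With MF = 1 and PR = 1 the function reproduces the predicted values listed for Italy (163,309 at Day-30 and 2,627,126 at Day-45) to within rounding. For another country, the MF from its table and its PR would be passed in as `mf` and `pr`.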