As reasoned previously, the reported data for COVID-19 inFections in India is too low, because very little testing has actually been done. The inFection counts in the MoHFW data and in the above map are therefore misleading. The real figures will be a large multiple of them. We just don’t know the real extent; it can only be estimated through mathematical models.
However, the map gives a realistic idea of the geographical spread of COVID-19 in India. It should caution anyone that it has reached close to home.
The lockdown and safety protocols stipulated by the Government of India are very important. They have to be followed and respected. They are our best defense for survival. I support them.
22 Mar 2020 = Day-1
Start of rapid InFection growth (exponential curve), based on the Italian growth model for InFections.
20 Apr 2020 = Day-30
InfLection point. The slope of the growth curve begins to decrease; slowdown in InFection.
05 May 2020 = Day-45
Start of asymptote (based on trends). New InFection cases begin to decline rapidly.
20 May 2020 = Day-60
Expect a true asymptote. Rapid decline in new InFections. Rate of new InFections tending to zero. Very small number of new cases each day.
| Day | Date | Infections (Predicted) |
|---|---|---|
| Day-1 | 22 Mar 2020 | |
| Day-10 | 31 Mar 2020 | 100,196 |
| Day-15 | 05 Apr 2020 | 272,945 |
| Day-20 | 10 Apr 2020 | 635,209 |
| Day-25 | 15 Apr 2020 | 1,180,269 |
| Day-30 | 20 Apr 2020 | 1,824,431 |
| Day-35 | 25 Apr 2020 | 2,407,025 |
The above Predictions are based on the Italian infection raw data model. For them to be valid, the standards of Lockdown and Social Distancing in India should be on par with those in Italy. The above Predictions would show up in the testing data for India, provided India does a minimum of 1500+ Tests/Million of population. There are also people with infections who never get tested, and as such the true infections could be higher than the predicted values by a factor of 5 to 10. For example, in Italy the true infections are projected to be 10 times higher than the published tested data.
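The under-reporting adjustment described above is simple arithmetic. As a rough sketch (the function name `true_infection_range` is illustrative and not part of the original analysis), the predicted Day-30 figure from the table can be scaled by the assumed factor of 5 to 10:

```python
def true_infection_range(predicted, low_factor=5, high_factor=10):
    """Return (low, high) estimates of true infections.

    The 5x and 10x multipliers are the article's assumed under-reporting
    factors, not measured values.
    """
    return predicted * low_factor, predicted * high_factor


# Day-30 (20 Apr 2020) predicted value from the table above.
low, high = true_infection_range(1_824_431)
print(f"{low:,} to {high:,}")
```

The result is only as good as the assumed multipliers, so it is best read as a range rather than a forecast.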
Published India Infection Data – Paints a Faulty Picture
The published India infection data from the Johns Hopkins University (JHU) data set is faulty. JHU sources its data from the Government of India portals. The problem with this data is that it is based on only 32 Tests/Million people in India. Compared with other countries, the number of tests is so low that it does not reflect a representative sample of the population. So, the existing data for India, at least from 22 Mar 2020 to 31 Mar 2020, is useless as a starting point for building a predictive model.
| Country | Tests/Million |
|---|---|
| India | 32 |
| Italy | 8,405 |
| France | 1,508 |
| Spain | 7,596 |
| Germany | 5,812 |
| China | 2,820 |
| South Korea | 7,940 |
| Australia | 9,670 |
| UK | 2,120 |
| USA | 3,377 |
| Canada | 6,450 |
Italian Infection Model is a Good Benchmark
The infection raw data for Italy, based on my analysis in previous articles, is a good benchmark for predicting the infection trends in other countries. I have found that once the infections reach a value of 400, the trajectory follows an exponential curve. The corresponding date is taken as Day-1, so all countries can start at the same point on a common graph. The analysis has also shown that the exponential curve for Italy can be scaled with the Population Ratio (PR) and a Model Factor (MF) to model the infection growth curves for other countries.
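The Day-1 rule described above (the first date on which cumulative infections reach about 400) could be applied to a time series roughly as follows. This is only a sketch: the function name `find_day_one` and the sample counts are hypothetical, not real data.

```python
from datetime import date, timedelta


def find_day_one(start_date, cumulative, threshold=400):
    """Return the first date on which cumulative infections reach `threshold`.

    `cumulative[0]` is the count on `start_date`. Returns None if the
    threshold is never reached.
    """
    for offset, total in enumerate(cumulative):
        if total >= threshold:
            return start_date + timedelta(days=offset)
    return None


# Hypothetical cumulative counts, for illustration only (not real data).
series = [120, 190, 260, 340, 410, 520, 690]
day_one = find_day_one(date(2020, 3, 1), series)
print(day_one)
```

Aligning every country on its own Day-1 is what allows the curves to be compared on a common time axis.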
This graph contains the real day-to-day data of COVID-19 infections for several countries. The PR ratio is shown in the legend. The idea is that if PR is more than 1 the curve should lie above Italy (blue line), and if it is less than 1 it should fall below Italy. The US and Germany show this trend (PR > 1). Canada and South Korea are way below (PR < 1). England and France are close to Italy (PR close to 1). China is an anomaly, since it is hovering close to Italy though its PR is 23.68! That’s because China’s data is tainted. India’s data is also tainted, since it is falling way below the curve for Italy. In conclusion, the Italian curve is a good benchmark.
It must be noted that to bank on this model, the remedial measures such as lockdown and quarantine norms must be similar in all the countries. Otherwise there will be variations. For this reason, Spain is higher than it should be for its PR value. It is likely that Spain’s lockdown was not as effective as that of Italy.
Predicting the Infections based on Italian Model
Based on the above reasoning, using the Italian curve, other countries can be modeled by multiplying the Italian infection raw data with the PR.
First, a polynomial was established to fit the Italian infection raw data.
Here, Y(x) is the Predicted Infection on day ‘x’, where ‘x’ is the same as time ‘t’ in days, with x = 1 on Day-1 and incrementing by 1 for each day.
Multiplying this by PR the individual curves for other countries are obtained.
The graph is not detailed for most of the countries, since the scale is enlarged because of China and India. Removing these, we get a much clearer graph for the other countries.
USA

| Item | Value |
|---|---|
| Point of InfLection based on Real Data best fit ## | 09 Apr 2020 = Day-34 |
| Day-15 | 21 Mar 2020 |
| Day-16 | 22 Mar 2020 |
| Day-23 | 29 Mar 2020 |
| Model Validity with Real Data: Day-1 to Day-15 | VALID for MF=0.30 |
| Model Validity with Real Data: Day-16 to Day-23 | VALID for MF=0.6 |
| Infections at Day-30 (05 Apr 2020) – PREDICTED* | 532,511 |
| Infections on 05 Apr 2020 (Real Measured Data)* | 337,072 |
| Infections at Day-45 (20 Apr 2020) – PREDICTED# | 8,566,413 |
| Infections on 20 Apr 2020 (Real Measured Data)# | 784,326 |
* & # Real Measured Data for COVID-19 infections in the US are lower than Predicted Data as per the model for Day-30, by 37%. This is because the model deviates from real world data after 29 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=40, which falls on 15 Apr 2020. As of this day the rate of inFection growth has definitely slowed down.
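The percentage shortfalls quoted in the country footnotes can be reproduced from the Day-30 predicted and measured figures. The sketch below uses the values given in this article; the function name `shortfall_percent` is illustrative.

```python
def shortfall_percent(predicted, measured):
    """Percentage by which measured data falls below the prediction."""
    return 100.0 * (predicted - measured) / predicted


# Day-30 (predicted, real measured) values quoted in this article.
day30 = {
    "US": (532_511, 337_072),
    "UK": (127_495, 60_733),
    "France": (140_529, 64_338),
    "Spain": (315_134, 126_168),
    "Germany": (202_728, 91_159),
}
for country, (pred, real) in day30.items():
    print(f"{country:8s} {shortfall_percent(pred, real):.0f}% below prediction")
```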
Italy
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 26 Feb 2020 = Day-1 |
| Model Factor (MF) | 1.00 |
| Point of InfLection based on Real Data best fit ## | 27 Mar 2020 = Day-31 |
| Model Validity with Real Data from Day-1 to | 18 Mar 2020 = Day-22 |
| Infections at Day-30 (26 Mar 2020) – PREDICTED* | 163,309 |
| Infections on 26 Mar 2020 (Real Measured Data)* | 80,589 |
| Infections at Day-45 (4 April 2020) – PREDICTED# | 2,627,126 |
| Infections on 04 Apr 2020 (Real Measured Data)# | 124,632 |
| Asymptote based on Real Data – PREDICTED ### | 08 Apr 2020 = Day-43 |
* & # Real Measured Data for COVID-19 infections in Italy are lower than Predicted Data as per the model for Day-30 and Day-45. This is because the model deviates from real world data after 18 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=31, which falls on 27 Mar 2020. As of this day the rate of inFection growth has definitely slowed down. Based on this, other countries may also show a slowdown in growth at about 30 days, provided the steps taken there are similar to the protocols followed in Italy.
### Asymptote is the point where the rate of new InFections drops dramatically, in mathematical terms to zero. However, in the case of real world inFections, the numbers of new inFections would be small compared to the previous days. This is the point when the curve becomes flat.
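The inflection and asymptote definitions above can be made concrete on a cumulative series. The following is a minimal sketch, not the method used for the best fits in this article: it takes the inflection as the day on which daily new cases peak, and the asymptote as the first later day on which new cases drop below an assumed 5% of that peak. The threshold, the function names and the logistic test curve are all assumptions made for illustration.

```python
import math


def daily_new_cases(cumulative):
    """First difference: new cases per day from cumulative totals."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def inflection_day(cumulative):
    """Day number (1-based) on which daily new cases peak.

    The peak of the first difference is where the second difference
    changes sign, a discrete stand-in for the inflection point.
    """
    new = daily_new_cases(cumulative)
    peak = max(range(len(new)), key=lambda i: new[i])
    return peak + 2  # new[i] is the increase recorded on day i + 2


def asymptote_day(cumulative, fraction=0.05):
    """First day after the peak with new cases below `fraction` of the peak."""
    new = daily_new_cases(cumulative)
    peak = max(range(len(new)), key=lambda i: new[i])
    for i in range(peak, len(new)):
        if new[i] < fraction * new[peak]:
            return i + 2
    return None


# Hypothetical logistic curve (illustrative only, not real data):
# plateau of 100,000 cases with its midpoint between day 30 and day 31.
curve = [100_000 / (1 + math.exp(-0.2 * (t - 30.5))) for t in range(1, 61)]
print(inflection_day(curve), asymptote_day(curve))
```

On real data the daily counts are noisy, so some smoothing would normally be needed before locating the peak.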
UK
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 10 Mar 2020 = Day-1 |
| Model Factor (MF) | 0.70 |
| Point of InfLection based on Real Data best fit ## (Day-37 = 15 Apr 2020 – PREDICTED) | 15 Apr 2020 |
| Day-17 | 26 Mar 2020 |
| Model Validity with Real Data from Day-1 to Day-17 | VALID |
| Infections at Day-30 (08 Apr 2020) – PREDICTED* | 127,495 |
| Infections on 08 Apr 2020 (Real Measured Data)* | 60,733 |
| Infections at Day-45 (23 April 2020) – PREDICTED# | 2,050,983 |
* & # Real Measured Data for COVID-19 infections in the UK are lower than Predicted Data as per the model for Day-30, by 52%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection for UK is PREDICTED on 15 Apr 2020 (as per data available on 13 Apr 2020). That’s a total of 37 days from the start of exponential growth.
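Converting between day counts and calendar dates is easy to get wrong by one, since Day-1 is the start date itself. A short sketch using Python's standard `datetime` module (the helper names are illustrative) checks the UK figures above:

```python
from datetime import date, timedelta


def date_of_day(day_one, n):
    """Calendar date of Day-n, where Day-1 is `day_one`."""
    return day_one + timedelta(days=n - 1)


def day_number(day_one, when):
    """Day count of `when`, counting `day_one` as Day-1."""
    return (when - day_one).days + 1


uk_day_one = date(2020, 3, 10)
print(date_of_day(uk_day_one, 17))                # end of model validity
print(day_number(uk_day_one, date(2020, 4, 15)))  # predicted inflection
```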
France
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 05 Mar 2020 = Day-1 |
| Model Factor (MF) | 0.80 |
| Point of InfLection based on Real Data best fit ## | 07 Apr 2020 = Day-34 |
| Day-21 | 25 Mar 2020 |
| Model Validity with Real Data from Day-1 to Day-21 | VALID |
| Infections at Day-30 (03 Apr 2020) – PREDICTED* | 140,529 |
| Infections on 03 Apr 2020 (Real Measured Data)* | 64,338 |
| Infections at Day-45 (18 April 2020) – PREDICTED# | 2,260,661 |
* & # Real Measured Data for COVID-19 infections in France are lower than Predicted Data as per the model for Day-30, by 54%. This is because the model deviates from real world data after 25 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=34, which falls on 07 Apr 2020. As of this day the rate of inFection growth has definitely slowed down.
Spain
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 06 Mar 2020 = Day-1 |
| Model Factor (MF) | 2.5 |
| Point of InfLection based on Real Data best fit ## | 31 Mar 2020 = Day-26 |
| Model Validity with Real Data from Day-1 to | 26 Mar 2020 = Day-21 |
| Infections at Day-30 (04 Apr 2020) – PREDICTED* | 315,134 |
| Infections on 04 Apr 2020 (Real Measured Data)* | 126,168 |
| Infections at Day-45 (19 April 2020) – PREDICTED# | 5,069,498 |
* & # Real Measured Data for COVID-19 infections in Spain are lower than Predicted Data as per the model for Day-30, by 60%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=26, which falls on 31 Mar 2020. As of this day the rate of inFection growth has definitely slowed down.
Germany
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 05 Mar 2020 = Day-1 |
| Model Factor (MF) | 0.9 |
| Point of InfLection based on Real Data best fit ## | 01 Apr 2020 = Day-28 |
| Day-22 | 26 Mar 2020 |
| Model Validity with Real Data from Day-1 to | 26 Mar 2020 = Day-22 |
| Infections at Day-30 (03 Apr 2020) – PREDICTED* | 202,728 |
| Infections on 03 Apr 2020 (Real Measured Data)* | 91,159 |
| Infections at Day-45 (18 April 2020) – PREDICTED# | 3,261,248 |
* & # Real Measured Data for COVID-19 infections in Germany are lower than Predicted Data as per the model for Day-30, by 55%. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=28, which falls on 01 Apr 2020. As of this day the rate of inFection growth has definitely slowed down.
Canada
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 16 Mar 2020 = Day-1 |
| Model Factor (MF) | 1.25 |
| Point of InfLection based on Real Data best fit ## | 08 Apr 2020 = Day-24 |
| Day-11 | 26 Mar 2020 |
| Model Validity with Real Data from Day-1 to Day-11 | VALID |
| Infections at Day-30 (14 Apr 2020) – PREDICTED* | 126,126 |
| Infections on 14 Apr 2020 (Real Measured Data)* | NOT YET THERE! |
| Infections at Day-45 (29 April 2020) – PREDICTED# | 2,028,971 |
* & # Real Measured Data for COVID-19 infections in Canada will be lower than Predicted Data as per the model for Day-30. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=24, which falls on 08 Apr 2020. As of this day the rate of inFection growth has definitely slowed down. Canada has carried out 8,732 tests/million.
Australia
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 16 Mar 2020 = Day-1 |
| Model Factor (MF) | 1.40 |
| Point of InfLection based on Real Data best fit ## | 28 Mar 2020 = Day-13 |
| Model Validity with Real Data from Day-1 to | 26 Mar 2020 = Day-11 |
| Infections at Day-30 (14 Apr 2020) – PREDICTED* | 95,166 |
| Infections on 14 Apr 2020 (Real Measured Data)* | NOT YET THERE! |
| Infections at Day-45 (29 April 2020) – PREDICTED# | 1,530,911 |
* & # Real Measured Data for COVID-19 infections in Australia will be lower than Predicted Data as per the model for Day-30. This is because the model deviates from real world data after 26 Mar 2020. The tested cases are only a subset of the population. Therefore, to obtain the infections in the entire population, we can extrapolate the tested data to the whole population by a multiple.
## InfLection is at Day=13, which falls on 28 Mar 2020. As of this day the rate of inFection growth has definitely slowed down. The infLection point for Australia arrives much earlier than for Italy, Spain and Germany.
China
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 22 Jan 2020 = Day-1 |
| Model Factor (MF) | 0.1 |
| Point of InfLection based on Real Data best fit ## | 12 Feb 2020 = Day-22 |
| Model Validity with Real Data from Day-1 to | 10 Feb 2020 = Day-20 |
| Infections at Day-30 (20 Feb 2020) – PREDICTED* | 386,705 |
| Infections on 20 Feb 2020 (Real Measured Data)* | 75,077 |
| Infections at Day-45 (06 Mar 2020) – PREDICTED# | 6,220,851 |
| Infections on 06 Mar 2020 (Real Measured Data)# | 80,690 |
| Start of Asymptote (Day-46) – Real Data | 07 Mar 2020 |
| Current Asymptote Value – Real Data as on 26 Mar 2020 (Day-65) | 81,782 |
* The Day-30 Predicted Data is higher than the Real Measured Data by a factor of 5.
# The Day-45 Predicted Data is orders of magnitude higher than the Real Measured Data.
## InfLection is at Day=22, which falls on 12 Feb 2020. As of this day the rate of inFection growth has definitely slowed down. The infLection point for China is in the same ballpark as for Italy (Day-30), Spain (Day-27), and Germany (Day-28).
Global estimates of infections as on 26 Mar 2020 was 566,269.
China’s reported infection data for 26 Mar 2020 is 81,792. Out of this, Hubei Province (of which Wuhan is the Capital) has 67,801 reported infections – which is 83% of China’s number. Though infections had spread to 31 other Provinces, they account for only 17% of infections. China’s numbers begin to asymptote around 17 Feb 2020 (Day-27) at reported 72,434 cumulative infections.
China must have been brilliant at curtailing a spread that began its exponential growth phase on 22 Jan 2020, had a point of inflection on 09 Feb 2020 (Day-19) and started the asymptote on 17 Feb 2020 (Day-27). Additionally, they supposedly curtailed the spread in other Provinces, yet left gaps in the control that let COVID-19 spread to 199 Countries and Territories. This is very difficult to accept!
Another way to look at it is that China was done with the worst in 27 Days. If you see the country models above, none of the major countries are anywhere near the beginning of an asymptote as on 26 Mar 2020 – USA (Day-20), Italy (Day-30), UK (Day-17), France (Day-22), Spain (Day-21), Germany (Day-22), Canada (Day-11), Australia (Day-11).
Finally, China has a population of about 1,400 million, while the above mentioned countries have populations ranging from about 25 million (Australia) to about 330 million (USA). So, we would expect China to have been much worse.
Therefore I will leave the reader to conclude whether we should trust the Prediction or the Real Measured Data for China. My personal opinion is that China’s reported data is suspect.
South Korea
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 22 Feb 2020 = Day-1 |
| Model Factor (MF) | 1.0 |
| Point of InfLection based on Real Data best fit ## | 15 Mar 2020 = Day-23 |
| Model Validity with Real Data from Day-1 to | 05 Mar 2020 = Day-13 |
| Infections at Day-30 (22 Mar 2020) – PREDICTED* | 186,496 |
| Infections on 22 Mar 2020 (Real Measured Data)* | 8,961 |
| Infections at Day-45 (06 Apr 2020) – PREDICTED# | 3,485,646 |
| Infections on 06 Apr 2020 (Real Measured Data)# | NOT YET THERE |
| Current Asymptote Value – Real Data as on 26 Mar 2020 (Day-34) | 9,137 |
* The Day-30 Predicted Data and Real Measured Data vary by a huge factor. Since the model is anyway NOT valid beyond Day-13, Predicted Data should be discarded.
# The Predicted Data for Day-45 is a very big number, compared to the Real Measured Data of 9,137 as on 26 Mar 2020. Again, since the model is anyway NOT valid beyond Day-13, Predicted Data should be discarded.
## InfLection is at Day=23, which falls on 15 Mar 2020. As of this day the rate of inFection growth has definitely slowed down. The infLection point for South Korea is in the same ballpark as for Italy (Day-30), Spain (Day-27), Germany (Day-28) and China (Day-22).
However, in the case of South Korea, we should probably accept that the Inflection at Day-23 (15 Mar 2020) is real, for the very reason that it succeeded in stopping COVID-19 in its tracks. Rather than repeat what is already out there, please see the following article. Further, post Day-23, the curve has begun to asymptote, though it has not completely reached zero slope.
South Korea is the only country so far, where the virus was curtailed by targeted lockdowns through the use of technology and the use of best practices via TRACING, TESTING and QUARANTINING. It is also the only country that achieved this without total lockdown.
South Korea is a democratic country. It has a population of about 51 million. It has a testing ratio of 7,502 Tests/Million, which is higher than that for even Italy (6,533 Tests/Million). This data is getting higher and higher each day. Its data and publication methods have been transparent and at par with international norms for reporting.
It is quite evident that South Korea had reached a true point of inflection (second derivative equal to zero) and thereby curtailed COVID-19.
India
| Item | Value |
|---|---|
| Day-1 of Exponential Growth Phase | 22 Mar 2020 = Day-1 |
| Model Factor (MF) | TBD |
| Point of InfLection based on Real Data best fit | NOT YET |
| Day-X: End Date for Model Validity | TBD |
| Model Validity with Real Data from Day-1 to Day-X | NEED MORE DATA |
| Infections at Day-30 (20 Apr 2020) – PREDICTED* | TBD |
| Infections on 20 Apr 2020 (Real Measured Data)* | NOT YET THERE! |
| Infections at Day-45 (05 May 2020) – PREDICTED# | TBD |
* # The data for India from 22 Mar 2020 to 05 Apr 2020 is spurious and far removed from the truth, because of insufficient testing in the country. This gives an artificially low number for inFections. The graph, with MF=0.04, is the Italian trend for inFections. The India data is 56% lower than Italy for the same time period of exponential growth, even though the Indian population is 22.56 times the Italian population.
In my previous article I had estimated that 1.44 million people will be infected with COVID-19 by 19 April 2020, but that was based on simple scaling of the Italian Raw Data for population difference between India and Italy.
The amount of actual testing in India is very low, and this may limit the quality of a mathematical model, since the factor MF will not be well known. India is currently doing only 66 Tests/Million people, as on 05 Apr 2020. That is totally inadequate for capturing the reality on the ground. In comparison, the testing in other countries is way higher – 10,896 (Italy); 8,920 (South Korea); 5,421 (US) and 2,895 (UK). This is the reason why it was possible for me to establish a mathematical exponential predictive equation for these countries, but not for India.
As such, until the actual testing data improves in India, it may be difficult to predict the infections for the next 15 to 30 days. The only alternative is to predict the number as reported in my previous article.
Methodology – Model Equation and Real World Infection Data
The model has been developed based on the Italian trend for infection spread of COVID-19 (Coronavirus 2019). In a previous article, I had established the equation for the exponential growth model for Italy. This equation was modified for other countries based on a scaling Model Factor (MF) and their population, via a Population Ratio (PR) defined as the “population of a country” divided by “population of Italy”. I’ve used the population data from https://worldpopulationreview.com. In a nutshell the equation takes the form:
Y(t) = MF * PR * 631.06 * exp (0.1852*t)
Y(t) is the “Cumulative Number of Infections” on day “t”. The model equation works in the exponential growth phase for the spread of the virus infection. The start of the exponential growth phase is taken when the cumulative number of infections has reached about 400, based on actual published test raw data, since it works well with the equation for many countries. When this happens, I’ve designated it as Day-1 of the exponential growth phase and t is set to 1 (t=1). For succeeding days, “t” is incremented by one.
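The model equation is straightforward to evaluate in code. The sketch below implements Y(t) exactly as written above; the function names and the doubling-time helper are additions for illustration. The doubling time follows from setting exp(0.1852 * T) = 2.

```python
import math

A = 631.06   # scale constant from the article's fit to the Italian data
K = 0.1852   # daily exponential growth rate from the same fit


def predicted_infections(t, mf=1.0, pr=1.0):
    """Y(t) = MF * PR * 631.06 * exp(0.1852 * t), with t = 1 on Day-1."""
    return mf * pr * A * math.exp(K * t)


def doubling_time_days():
    """Days for Y(t) to double: ln(2) / 0.1852."""
    return math.log(2) / K


# Italy itself corresponds to MF = 1 and PR = 1.
print(round(predicted_infections(30)))  # compare with Italy's Day-30 prediction
print(round(doubling_time_days(), 2))
```

With MF = 1 and PR = 1 the Day-30 and Day-45 values agree with the predicted figures listed for Italy (163,309 and 2,627,126) to within rounding.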
It is my intent to update the above graph for India from 22 Mar 2020 to 22 Apr 2020. The graph above will be replaced on a daily basis. In the above graph t=1 is Day-1, which is set at 22 Mar 2020. Likewise, t=2 and so on increment from the set point date of 22 Mar 2020. Those who don’t like math can just look at the graph and see the trends.
The model comes closest to the data at MF = 0.02, which means the data follows a trend with values lower than those for Italy. However, we should be expecting an MF of about 1, since India should scale to Italy based on the Population Ratio (PR). So, an MF = 0.02 is unreliable and should not be used as a possible model.
The real time data for the exponential growth phase for infections had started on 22 Mar 2020. However, as seen from the graph, the real data does not fit the model, for any value of practical MF. As such, it is very likely that the Real Data published for India by JHU is not of practical use and also does not represent a good sample of the population.
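The article does not spell out how MF is tuned against the real data. One possible approach, swapped in here purely as an assumption, is a least-squares estimate: for a model of the form y = MF * b(t), the best MF is sum(y * b) / sum(b * b). The sketch below checks this on synthetic data only; the function names are hypothetical.

```python
import math

PR_INDIA = 22.56673  # population ratio India / Italy, as given in the article


def base_curve(t, pr):
    """Model value for MF = 1: PR * 631.06 * exp(0.1852 * t)."""
    return pr * 631.06 * math.exp(0.1852 * t)


def fit_mf(observed, pr):
    """Least-squares MF, with observed[i] taken on day t = i + 1."""
    base = [base_curve(t, pr) for t in range(1, len(observed) + 1)]
    return sum(y * b for y, b in zip(observed, base)) / sum(b * b for b in base)


# Synthetic check (not real data): observations generated with MF = 0.04
# should give back MF = 0.04.
synthetic = [0.04 * base_curve(t, PR_INDIA) for t in range(1, 16)]
print(round(fit_mf(synthetic, PR_INDIA), 4))
```

Because the curve grows exponentially, an unweighted fit is dominated by the latest days; fitting on the logarithm of the counts would be a reasonable alternative.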
A key conclusion that can be made as on 05 Apr 2020 is that the reported data for inFections in India is totally wrong, in its entirety.
In my previous article, I had established the equation for the exponential growth model for Italy. The population ratio (PR) is 22.56673 for India. Multiplying the exponential growth model with PR should take into account the population differences. I’ve further multiplied it with a Model Factor (MF), which is a number greater than 0, but more likely about 1 for India. This factor fine tunes the scaling and at some value it will have a good match to the real world infection data. When MF is 1/PR, the curve is an identical match to the exponential best fit for Italy.
As such, for India, the modified exponential equation is:
y(t) = MF * PR * 631.06 * exp (0.1852*t)
i.e.
y(t) = MF * 22.56673 * 631.06 * exp (0.1852*t)
To predict the Infection Cases in India, select an MF between 0 and 1, plug in a value of “t”, where “t” is the day count starting from 22 Mar 2020 (t=1) for India. The computed value of y(t) is the predicted Infection Cases on day “t”.
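As an illustration of the procedure, the following sketch evaluates y(t) for India for a few values of MF and maps each t to its calendar date. The chosen MF values and day counts are examples only, and the function name is hypothetical. It also checks the statement that MF = 1/PR reproduces the Italian curve.

```python
import math
from datetime import date, timedelta

PR_INDIA = 22.56673
DAY_ONE = date(2020, 3, 22)  # t = 1 for India


def india_prediction(t, mf):
    """y(t) = MF * 22.56673 * 631.06 * exp(0.1852 * t)."""
    return mf * PR_INDIA * 631.06 * math.exp(0.1852 * t)


for mf in (0.02, 0.04, 1.0):
    for t in (10, 20, 30):
        when = DAY_ONE + timedelta(days=t - 1)
        print(f"MF={mf:<4} Day-{t} ({when:%d %b %Y}): {india_prediction(t, mf):,.0f}")

# With MF = 1/PR the population scaling cancels and the Italian curve is recovered.
italy_day30 = india_prediction(30, 1 / PR_INDIA)
```

The spread between the MF = 0.02 and MF = 1.0 rows shows how strongly the prediction depends on a factor that cannot yet be pinned down from the Indian testing data.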