Computing new covid-19 cases after correction for daily variation in testing capacity

Here’s how to interpret the day-to-day variation in officially reported new infections, taking into account both the large fluctuation in the number of daily tests from one day to the next and the progressive increase (trend, moving average) in daily tests to date. I calculated the correlation between variable X (daily tests on day n / daily tests on day n-1) and variable Y (daily confirmed infections on day n / daily confirmed infections on day n-1). From this I deduced that if the daily number of tests increases by a factor of y from one day to the next (so that the number of actual infected people is practically unchanged), then the daily number of detected (confirmed) infections increases by a factor of $y^{1/2}$. For example, if the number of new real infections (all of them, not only those detected) stays the same and I perform twice as many tests, I detect $\sqrt{2}$ times as many cases. NOW WE DERIVE THE FORMULA for the factor k by which the real new daily infections change (k is a positive real number), given the factor y by which the daily number of tests changes and the factor x by which the official daily number of infections changes. Bear in mind that x and y are also positive real numbers, i.e. they can be less than 1 (a decrease).
If real infections change by a factor of k and tests by a factor of y, the number of tests per real infection changes by y/k, so by the square-root rule above the detected cases change by $k\,(y/k)^{1/2} = x$. Then $k^{1/2}\,y^{1/2} = x$, so $k^{1/2} = x / y^{1/2}$, and therefore $k = x^2 / y$.
Let’s take an example: if the official daily infections change by a factor of 0.5 (that is, they halve) while the daily tests double (y = 2), then THE ACTUAL NEW DAILY CASES change by a factor of $0.5^2 / 2 = 0.125$, that is, they DECREASE by a factor of 8!
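
The formula can be checked numerically with a minimal Python sketch. The function names `real_case_growth` and `detected_case_growth` are made up for illustration, and the sketch simply assumes the square-root relationship between tests and detected cases described above.

```python
import math


def real_case_growth(x: float, y: float) -> float:
    """Return k = x**2 / y, the factor by which real daily infections change.

    x: factor by which official daily infections change (day n / day n-1)
    y: factor by which daily tests change (day n / day n-1)
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive ratios")
    return x ** 2 / y


def detected_case_growth(k: float, y: float) -> float:
    """Forward relation assumed in the text: x = k * (y / k) ** 0.5."""
    return k * math.sqrt(y / k)


# Worked example from the text: official cases halve while tests double.
k = real_case_growth(0.5, 2.0)
print(k)                             # 0.125, i.e. an 8-fold decrease
print(detected_case_growth(k, 2.0))  # 0.5, recovering the official ratio
```

Feeding k back into the forward relation returns the original x, which confirms that the algebra is self-consistent.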

CFR and the positivity rate as indicators of under-reporting of Covid-19 cases

There is certainly a significant amount of underreporting of coronavirus cases. This is due to people with severe symptoms being more likely to be tested, and to the presence of a substantial asymptomatic fraction (around 50% according to some estimates, but likely lower after accounting for the false positive rate). Underreporting should show up as an official CFR (case fatality rate) that is higher than the true CFR (the CFR that would be observed if everyone were tested). The rationale is that if only (or disproportionately) very sick people are tested, the CFR will be higher because very sick people are more likely to die than people with mild or no symptoms. Another indicator of underreporting is the positivity rate (% of tests that come up positive). If all individuals of two populations are tested, the positivity rates are true reflections of the % of infected individuals. However, since authorities only test a sample of the population, the positivity rate depends on the criteria that people must meet in order to be tested (e.g. number and type of symptoms, contacts with confirmed cases). Hence, if both CFR and positivity rate are affected by testing criteria (how much testing is restricted to people with severe symptoms), there should be a positive correlation between them. A positive correlation would provide evidence for high CFR being an indicator of underreporting.

So far, 361,060 tests have been carried out in Italy. This is not a small number compared to other countries, but it is not large enough considering the number of infected individuals. Large-scale testing is badly needed, considering the high positivity rate.

Covid-19 test statistics (http://www.salute.gov.it/nuovocoronavirus).

The CFR goes from 2.8% in Sicily to over 14% in Lombardia. This is a big difference. There is a lot of variability in the positivity rate as well, ranging from 6% in Calabria to 40% in Lombardia.

There is a positive correlation between CFR and positivity rate across 21 regions (r= 0.76).

This correlation supports the hypothesis that there is severe underreporting of cases.

The CFR in Italy is 10%. If we assume a true CFR of around 1% (data from the Diamond Princess), then there are about 10 times as many cases in Italy as the confirmed ones.
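
The arithmetic behind this estimate can be written as a small Python sketch. The function name `underreporting_factor` is hypothetical, and the calculation assumes that deaths are fully counted and that the 1% figure is the true CFR; both are assumptions, not measured quantities.

```python
def underreporting_factor(official_cfr: float, assumed_true_cfr: float) -> float:
    """Ratio of true to confirmed cases implied by two CFR values.

    Assumes deaths are counted correctly, so that
    deaths = official_cfr * confirmed_cases = assumed_true_cfr * true_cases.
    """
    if official_cfr <= 0 or assumed_true_cfr <= 0:
        raise ValueError("CFR values must be positive")
    return official_cfr / assumed_true_cfr


# Figures from the text: 10% official CFR in Italy, ~1% assumed true CFR.
italy_factor = underreporting_factor(0.10, 0.01)
print(round(italy_factor, 1))  # 10.0
```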

Age could be a factor mediating the difference in CFR. In Italy covid-19 infects the elderly more than young people, as shown by a population-wide study in Vo’ (95% response rate). I don’t have data broken down by age for the Diamond Princess, but the CFR should be corrected for age. Even after accounting for the different age profiles of infected individuals in Italy and on the Diamond Princess, there is likely a CFR difference between the two populations. I believe this difference is a result of under-reporting.

At the world-wide level, the correlation between CFR and positivity rate is r = 0.47, but the two variables are skewed, due to the highly heterogeneous nature of the samples.

What is the true incidence of Coronavirus? An empirical model using evacuees from Wuhan

Many people think that the official estimates of total infections from nCov-2019 are lower than the true incidence because mild infections are not counted. Some go further and attribute this to a cover-up by the Chinese government, trying to hide the real scale of the epidemic. I came up with the idea of using infection counts from evacuees (people repatriated by other countries) to have the most reliable data.

Using estimates of pathogen prevalence following phylodynamic methods (which measure how quickly observed genomes share ancestry to estimate the rate of exponential growth), Trevor Bedford estimates a median total incidence on 8 Feb of 55,800 total infections since the start of the epidemic, with a 95% uncertainty interval of between 17,500 and 194,400 total infections. This approach estimates an infection-to-case reporting rate of between 18% and 100%.

MRC’s Neil Ferguson estimated that official figures were underreporting the real number of total infections by a factor of 10.

I adopted a rather straightforward approach, using data on evacuees from Wuhan to other countries. These are groups of non-Chinese citizens living in Wuhan who were repatriated to their home countries during the last few weeks following the coronavirus outbreak. Most countries placed these individuals in strict quarantine and monitored them for flu-like symptoms. The suspicious cases were tested for coronavirus.

An advantage of using infection data from evacuees is that all the individuals were screened for the virus, at least at a symptomatological level, and the individuals showing symptoms were tested in the lab for the virus. This will result in a complete report of infections, which includes also mild cases. Moreover, fears of a cover-up by China are dissipated because the tests are carried out in other countries.

The official number of infected individuals in Wuhan was obtained from the CSSE interactive map for the day the evacuees boarded the flight back to their home countries. Dividing the number of infected individuals by the Wuhan population (about 11 million) gives the official incidence of the virus. This figure was compared to the incidence among the evacuees (number infected / number of people evacuated) for each country. Dividing the incidence among the evacuees by the official incidence in Wuhan gives the “underestimation factor”; its reciprocal is the reporting rate. The reporting rate is estimated at 17.3%, or an “underestimation factor” of 5.78x, lower than Neil Ferguson’s estimate and close to the lower bound (18%) of Trevor Bedford’s prediction.

According to CSSE, today’s confirmed case count is 43,126. Applying the 5.78x underestimation factor, I estimate the true total infections to be about 250,000 as of February 11th.
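
As a quick check of this figure, here is a minimal Python sketch that converts between the underestimation factor and the reporting rate and scales the confirmed count. The variable names are illustrative only; the inputs are the numbers quoted in the text.

```python
confirmed_cases = 43_126       # CSSE confirmed cases quoted in the text
underestimation_factor = 5.78  # estimate quoted in the text

# The reporting rate is the reciprocal of the underestimation factor.
reporting_rate = 1 / underestimation_factor

# Scale the confirmed count up to an estimate of total infections.
estimated_total = confirmed_cases * underestimation_factor

print(f"reporting rate: {reporting_rate:.1%}")     # 17.3%
print(f"estimated total: {estimated_total:,.0f}")  # 249,268, i.e. roughly 250,000
```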

Table 1. Infections among Wuhan evacuees.

| | Japan | Italy | Germany | France | USA | S. Korea | France 2 | USA 2 | Taiwan | Canada |
|---|---|---|---|---|---|---|---|---|---|---|
| Infected | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 0 |
| Total evacuees | 206 | 54 | 124 | 179 | 195 | 700 | 254 | 600 | 247 | 174 |
| Date | Jan 29 | Feb 3 | Feb 1 | Jan 31 | Jan 29 | Jan 31 | Feb 2 | Feb 5 | Feb 3 | Feb 7 |
| Wuhan official infected | 6930 | 17730 | 12870 | 10800 | 6930 | 10800 | 15480 | 24660 | 17730 | 30600 |
| Wuhan total pop | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 | 11100000 |
| Wuhan official infected rate | 0.0006243243243 | 0.001597297297 | 0.001159459459 | 0.000972972973 | 0.0006243243243 | 0.000972972973 | 0.001394594595 | 0.002221621622 | 0.001597297297 | 0.002756756757 |
| Repatriates infected rate | 0.02427184466 | 0.01851851852 | 0.01612903226 | 0 | 0.005128205128 | 0.002857142857 | 0.003937007874 | 0.001666666667 | 0.004048582996 | 0 |
| Underestimation coefficient | 38.87698062 | 11.59365796 | 13.91082036 | 0 | 8.214008214 | 2.936507937 | 2.823048282 | 0.7502027575 | 2.534645869 | 0 |
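
The per-group coefficients in Table 1 can be recomputed from the raw counts with a short Python sketch. The names `EVACUEE_GROUPS` and `underestimation_coefficient` are made up for illustration. The sketch stops at the per-group values, because the table does not show how they are combined into the overall estimate.

```python
WUHAN_POPULATION = 11_100_000  # population figure used in Table 1

# (group, infected evacuees, total evacuees, official Wuhan cases on flight date)
EVACUEE_GROUPS = [
    ("Japan", 5, 206, 6930),
    ("Italy", 1, 54, 17730),
    ("Germany", 2, 124, 12870),
    ("France", 0, 179, 10800),
    ("USA", 1, 195, 6930),
    ("S. Korea", 2, 700, 10800),
    ("France 2", 1, 254, 15480),
    ("USA 2", 1, 600, 24660),
    ("Taiwan", 1, 247, 17730),
    ("Canada", 0, 174, 30600),
]


def underestimation_coefficient(infected, evacuated, official_cases,
                                population=WUHAN_POPULATION):
    """Incidence among evacuees divided by the official incidence in Wuhan."""
    evacuee_rate = infected / evacuated
    official_rate = official_cases / population
    return evacuee_rate / official_rate


coefficients = {
    name: underestimation_coefficient(infected, evacuated, official)
    for name, infected, evacuated, official in EVACUEE_GROUPS
}

for name, value in coefficients.items():
    print(f"{name:10s} {value:8.3f}")
```

Groups with zero infected evacuees (France, Canada) give a coefficient of 0, which shows how noisy the individual values are with such small counts.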

The updated infections among Wuhan evacuees from different countries can be seen here.

A reply to the Big Four


My work, and the latest paper in particular, has recently attracted some criticism from a small group of geneticists who chose to disobey one of the most basic and universally accepted principles of scientific research: to provide references to the work you are criticizing. Despite this deficiency, I was invited by friends and colleagues to write a reply to their veiled attack on my work. Instead of playing their game of dismissing ideas using slogans and vulgar language (e.g. “pseudoscience”, “racist”, “cottage-industry”) in order to appear “woke”, I decided to empirically test their claims using the most updated publicly available datasets and simple statistical techniques. After reading their post and this, the reader can decide who the real pseudoscientists are.

This sentence in particular caught my attention because I had given it careful consideration before their piece came out: 

 “The genetic variants that are most strongly associated with IQ in Europeans are no more population-specific than any other trait. To put it bluntly, the same genetic variants associated with purportedly higher IQ in Europeans are also present in Africans, and have not emerged, or been obviously selected for, in recent evolutionary history outside Africa.”

This contention vaguely implies that population differences can only be driven by mutations that are present in some populations but not in others. This black and white account is intentionally misleading because it contradicts the scientific consensus that differences in complex traits between populations are due to different frequencies of common alleles  (Berg and Coop, 2014). Only non-complex traits, those that are governed by a few genes of large effect, obey this rule: for example, skin pigmentation or lactose tolerance, for which Europeans have evolved specific mutations. However, the hereditarian hypothesis of racial differences does not rely on the assumption that skin pigmentation and lactose tolerance are genetically linked to IQ. These may or may not be inter-related, and a potential relationship would be mediated by population structure (i.e. ancestry). It’s curious how the authors continuously highlight population structure as a problem in GWAS studies but they forget about its role as a possible explanation for the link between IQ and skin pigmentation.

GWA studies pick genetic variants related to intelligence and education, not to skin pigmentation, and my studies show a relationship between these variants and geographic origin, not with their skin pigmentation. Skin pigmentation and IQ can co-vary across populations without being located close to one another (“linked”) on the chromosomes, simply as a result of external factors such as drift or common selective pressures (i.e. climate).

I proceed to empirically test their claim that there are no population (European) specific SNPs that increase cognitive abilities.

Population-specific SNPs are assumed to have arisen recently in one part of the world without sufficient time to spread across the globe. Alternatively, a population may have lost the minor allele of a polymorphism during (pre)historic population bottlenecks. Either way, the minor allele frequency (MAF) is expected to be low in those populations in which the polymorphism is present. We can assume that SNPs with MAF < 0.01 among West Africans are enriched with alleles that originated after the out-of-Africa migrations. I chose this threshold to allow for some back-flow from Eurasia into West Africa. It is a liberal threshold: the real extent of back-flow was likely lower, so not all of these SNPs are recent. However, the SNPs that we put into the “ancient” group will be less likely to be false positives, that is, recent SNPs that are misassigned as ancient. Update: I also ran this analysis using a more stringent threshold, under which only the SNPs that are absent (MAF = 0) from West Africans were classified as recent (see Appendix). Results were practically the same as for the main analysis reported below.

I used summary data from Lee et al.’s GWAS (2018) on educational attainment and cognition (EA-MTAG).

There were four main findings:

  1. In the 1000 Genomes dataset, most of the SNPs are found at MAF>0.01 among Yorubans (West Africans from Nigeria), and these SNPs are almost certainly old (6734 old vs 2267 young SNPs). 

This suggests that most of the genetic variants associated (in European-based GWAS) with higher IQ and education are also present among Africans. However, a substantial proportion (25%) of them are likely recent, that is, they emerged in recent evolutionary history outside of Africa. The proportion is lower (15.5%) when only the SNPs that are completely absent from West Africans are counted, but this is still very far from Birney et al.’s claim that there are no education-increasing alleles that are specific to Europeans.

  2. The average Beta coefficients for the two categories were not significantly different (t = 0.722, p = 0.469). Average Beta old: -7.3*10^-5; Average Beta young: 0.00017.

Recent SNPs tend to have slightly more positive effect but this trend is not significant. This is the case also when SNPs that are entirely absent from Africans are used.

  3. I also computed the number of alleles with positive and negative effect in the two groups. The proportions are not significantly different (Table 1), as shown via Fisher’s exact test (odds ratio: 0.978, 95% CI = 0.8882281-1.0769980).

In other words, the recent SNPs are not enriched with alleles whose direction of effect is positive.

Table 1. SNPs by “age” and effect on Education


| | Positive | Negative |
|---|---|---|
| Ancient | 3346 | 3388 |
| Recent | 1139 | 1128 |

  4. I also computed polygenic scores (PGS) from old and young SNPs for YRI and CEU, using the EA_MTAG GWAS significant SNPs (P<5*10^-8). The two-way interaction between population and age is not significant, as shown by ANOVA (Table 2).

Table 2. ANOVA table (Response: PGS)


Sum SqDFF valueP
Population0.961010.150.001
Age2.232323.451,32*10^-6
Population:Age0.0110.090.769

When they are computed using ancient or recent SNPs, polygenic scores exhibit a very similar difference between West Africans (Yorubans) and Europeans (White Americans). The lack of a significant interaction can be seen in Figure 1.

Figure 1. Polygenic scores by population and SNP age.

In summary, we can refute the authors’ claim that there are no IQ/EDU-increasing alleles that are unique to non-Africans. In fact, 25% of the SNPs probably originated after the out-of-Africa exodus. However, the presence of population-specific alleles is not required to make some populations smarter than others. Selection can act on standing variation (that is, alleles that existed in a population before an environmental change causing selection pressure took place) and produce allele frequency shifts at ancient SNPs that are shared across continental groups (Lee & Coop, 2017). We don’t know if Europeans have more alleles unique to them that increase intelligence, because we won’t know how many intelligence-increasing alleles are unique to Africans until GWAS of education and intelligence are carried out on them. In fact, we cannot assume that the alleles that are found by European GWAS and are absent from Africans are causing Europeans to have a higher IQ, because Africans could have as many yet-to-be-found alleles that increase the IQ of African individuals.

However, what we do know is that even among the SNPs that are common to both Africans and Europeans and are putatively ancient, there is a polygenic score difference of comparable (i.e. not significantly different) magnitude, and that Europeans have a significantly higher polygenic score (table 2 and figure 1). An important research question is how many of these uniquely European mutations arose in the Sapiens Sapiens lineage or were introgressed from Neanderthals.

To be sure, my work does not focus solely on the White-Black difference in intelligence, but it uses populations from all over the world. In figure 2, I show polygenic scores for the GWAS significant SNPs of Education and cognitive abilities (EA-MTAG):


Figure 2.

The authors then use the Flynn Effect as their nail in the coffin for the hereditarian hypothesis. Their contention is that since IQ increased in European countries during the 20th century, and this increase cannot be due to genetic changes, the differences in IQ between countries can be explained by environmental factors. Specifically, if Africa were to acquire Western standards of living, its IQ scores would increase to contemporary European levels.

I have a few objections: 1) No one is disputing that environmental conditions can affect average population IQ. I am certain that better nutrition and health-care will increase the IQ of developing countries. However, this increase will not necessarily be large enough to completely close the current gap. In my last paper (Piffer, 2019), I estimated the effect of environment on IQ, and found that some developing countries score lower (negative residuals) than would be predicted by their polygenic score. I found that environmental factors (e.g. child mortality, Human Development Index, total protein intake) explain about 35% of the variance in IQ between countries, and the other 65% is explained by the polygenic score. Conversely, when I ran the same model on height, the environmental factors explained the lion’s share of height differences between countries (70-80%), and polygenic scores explained the rest. Since there has been a dramatic increase in the average height of people living in Western countries during the 20th century, comparable to the Flynn Effect in its magnitude, shall we conclude that all human populations have the same genetic potential to reach similar final heights? Shall we believe that northern Europeans are not genetically taller than the Vietnamese? Applying this fallacious logic, we reach absurd conclusions.

Moreover, this would contradict evidence from recent studies (Chen et al., 2019) that found signatures of selection on height among European populations, after accounting for population stratification. Since the importance of environmental variables across countries for IQ is the same or less than that on height, it is likely that genetics plays a role in differences between countries in cognitive abilities as well.

In fact, the presence of an environmental effect does not automatically rule out genetics as an explanation.

The authors also argue that predictions from a population cannot be extended to genetically distant ones. This is partly true, but there are already methods to overcome this problem. Many trans-ethnic GWAS to date have found that a large share of SNPs have a direction of effect that is consistent across populations (Akiyama et al., 2019), and by using SNPs with consistent effect direction, phenotypic trans-ethnic prediction can be achieved (Chen et al., 2019). Similarly, methods to detect polygenic selection can leverage this information, and they provide another way to control for population stratification.

Appendix

Here I report the analysis done using a more stringent criterion for population-specific alleles, that is, they have to be completely absent from West Africans (MAF=0). The results are practically the same as with the more liberal threshold. GWAS significant SNPs (N=3527) were used.

Average frequency Beta old: -0.0001250309; Beta young= 0.0013089328

T- test: Beta young vs old: t = -1.6955, df = 576.96, p-value = 0.09053

Table 1b. SNPs by “age” and effect on Education


| | Positive | Negative |
|---|---|---|
| Ancient | 1363 | 1368 |
| Recent | 265 | 241 |

Fisher’s exact test (95% C.I. O.R.= 0.735 – 1.084)
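
For readers who want to check the 2×2 tables, the following Python sketch computes the simple cross-product odds ratio and the share of “recent” SNPs from the counts in Table 1 and Table 1b. The function names are hypothetical. Note that this is the plain sample odds ratio: statistical packages that run Fisher’s exact test may report a conditional maximum-likelihood estimate instead, which can differ slightly, and the sketch does not compute the confidence interval.

```python
def sample_odds_ratio(ancient_pos, ancient_neg, recent_pos, recent_neg):
    """Cross-product odds ratio of a 2x2 table (rows: ancient, recent)."""
    return (ancient_pos * recent_neg) / (ancient_neg * recent_pos)


def share_recent(ancient_pos, ancient_neg, recent_pos, recent_neg):
    """Fraction of all SNPs in the table that are classified as recent."""
    recent = recent_pos + recent_neg
    total = recent + ancient_pos + ancient_neg
    return recent / total


# Counts copied from the tables: (ancient +, ancient -, recent +, recent -)
table_1 = (3346, 3388, 1139, 1128)   # MAF < 0.01 threshold
table_1b = (1363, 1368, 265, 241)    # MAF = 0 threshold

for label, table in [("Table 1", table_1), ("Table 1b", table_1b)]:
    print(label,
          f"odds ratio = {sample_odds_ratio(*table):.3f},",
          f"share recent = {share_recent(*table):.1%}")
```

For Table 1 the cross-product odds ratio is about 0.978, in line with the value reported in the main text, and for Table 1b it falls inside the interval given above.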

Table 2b. ANOVA table (Response: PGS)


| | Sum Sq | DF | F value | P |
|---|---|---|---|---|
| Population | 0.48 | 1 | 5.579 | 0.0182 |
| Age | 3.18 | 1 | 36.856 | 1.343*10^-9 |
| Population:Age | 0 | 1 | 0.032 | 0.856 |

Fig. 1b. Polygenic scores by population and SNP age.

References:

Akiyama, M., Ishigaki, K., Sakaue, S. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat Commun 10, 4393 (2019) doi:10.1038/s41467-019-12276-5

Berg, J.J.; Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 2014, 10, e1004412. 

Minhui Chen, Carlo Sidore, Masato Akiyama, Kazuyoshi Ishigaki, Yoichiro Kamatani, David Schlessinger, Francesco Cucca, Yukinori Okada, Charleston W. K. Chiang. Evidence of polygenic adaptation at height-associated loci in mainland Europeans and Sardinians. bioRxiv 776377; doi: https://doi.org/10.1101/776377

Piffer, D. (2019). Evidence for Recent Polygenic Selection on Educational Attainment and Intelligence Inferred from Gwas Hits: A Replication of Previous Findings Using Recent Data. Psych, 1(1), 55–75. doi:10.3390/psych1010005

Lee, J.J.; Wedow, R.; Okbay, A.; Kong, E.; Maghzian, O.; Zacher, M.; Nguyen-Viet, T.A.; Bowers, P.; Sidorenko, J.; Karlsson Linnér, R.; et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 2018, 50, 1112–1121. 

Lee, K.M., & Coop, G. (2017). Distinguishing Among Modes of Convergent Adaptation Using Population Genomic Data. GENETICS, 207, 1591-1619; https://doi.org/10.1534/genetics.117.300417
