Thursday 28 May 2020

When 'dependent' expert reports might be more informative than independent ones

Whether it's Government ministers deciding if it is safe to end Covid-19 lockdown, journal editors deciding if a research paper is worthy of publication, or just consumers deciding which kettle is best value, we have to rely on evaluating information from multiple 'experts' who may or may not agree on their conclusions. In determining which conclusion is most probable we have to take account of not just which experts we trust most, but also the extent to which the experts may or may not have collaborated. This problem is especially pertinent in intelligence analysis work, and was addressed as part of a recent project funded by IARPA (Intelligence Advanced Research Projects Activity)*. While it is always assumed intuitively that 'independence' among experts is advantageous, it turns out that this is not always the case, as shown in a paper by Pilditch et al (researchers at UCL, Queen Mary and Birkbeck) just accepted for publication in Cognition.

Consider the following scenario:

A plane has crashed, and you must determine whether it was sabotage. You await the crash site reports from two investigators, Bailey and Campbell. They have separately assessed the various pieces of wreckage before leaving to write up their conclusions. Both investigators are equally accurate in their conclusions, seldom making mistakes. Now consider two alternative cases:

i. Bailey provides a report in which she concludes the plane was sabotaged, but she has also seen Campbell’s report, in which Campbell likewise concluded that the plane was sabotaged.

ii. Bailey provides a report in which she concludes the plane was sabotaged, based on her assessment alone. Campbell then separately provides a report (based on his assessment alone), likewise concluding that the plane was sabotaged.

Here, i) is a case of corroborating reports with a directional dependence from Campbell to Bailey (i.e., Bailey has seen Campbell’s report, thus Bailey’s report may depend upon Campbell’s, but not vice-versa), and ii) is a case of corroborating reports coming from independent sources. Given the two reports in each case, it would be right to conclude that more support for the conclusion that the plane was sabotaged is provided in the independent case.  However, now consider the same scenario, with two slight alterations:

1. Bailey reports to you the plane was sabotaged, having seen Campbell’s report, but you do not know what Campbell concluded (case i), versus you only know Bailey’s independent conclusion of sabotage (case ii).

2. Bailey reports to you the plane was sabotaged, having seen Campbell’s report, but you know that Campbell concluded the opposite (case i), versus you only know that Bailey and Campbell have independently provided contradictory conclusions (case ii).

1) is an instance of partial information, and 2) an instance of contradicting information. In both these instances, it is less clear whether case i) or ii) provides more support for the sabotage hypothesis.

The paper demonstrates that, for partial or contradicting information, the dependent case (i) is in fact superior, at least given reasonable assumptions. That is, there is a dependency advantage: a report provides more evidential support for the hypothesis when it results from a structural dependency (i) than when it is independent (ii).

Full reference:
Pilditch, T., Hahn, U., Fenton, N. E., & Lagnado, D. A. (2020). "Dependencies in evidential reports: The case for informational advantages". Cognition, to appear.  Accepted version (pdf)

*The research was supported in part by The Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), under Contract [2017-16122000003]. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. The research was also supported in part by the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS. The authors acknowledge and Agena Ltd for software support.


Tuesday 26 May 2020

Covid-19: Infection rates are higher, fatality rates lower than widely reported

A new study by Queen Mary researchers using a Bayesian Network (BN) analysis of Covid-19 data reveals higher infection prevalence rates and lower fatality rates than have been widely reported.

Lead author Prof Martin Neil says:
 "Widely reported statistics on Covid-19 across the globe fail to take account of both the uncertainty of the data and possible explanations for this uncertainty. This study uses a Bayesian Network (BN) model to estimate the Covid-19 infection prevalence rate (IPR) and infection fatality rate (IFR) for different countries and regions, where relevant data are available. This combines multiple sources of data in a single model."
The results show that Chelsea, Mass., USA and Gangelt, Germany have relatively higher infection prevalence rates (IPR) than Santa Clara, USA; Kobe, Japan; and England and Wales. In all cases the infection prevalence is significantly higher than what has been widely reported, with much higher community infection rates in all locations. For Santa Clara and Chelsea, both in the USA, the most likely IFR values are 0.3-0.4%. Kobe, Japan is very unusual in comparison, with values an order of magnitude lower than the others, at 0.001%. The IFR for Spain is centred around 1%. England and Wales lie between Spain and the USA/German values, with an IFR around 0.8%.

There remains some uncertainty around these estimates but an IFR greater than 1% looks remote for all regions/countries. Neil says:
 "We use a Bayesian technique called 'virtual evidence' to test the sensitivity of the IFR to two significant sources of uncertainty: survey quality and uncertainty about Covid-19 death counts. In response the adjusted estimates for IFR are most likely to be in the range 0.3%-0.5%."
The full paper:
Neil, M., Fenton, N., Osman, M., & McLachlan, S. (2020). "Bayesian Network Analysis of Covid-19 data reveals higher Infection Prevalence Rates and lower Fatality Rates than widely reported". MedRxiv, 2020.05.25.20112466. 


Friday 15 May 2020

Why most studies into COVID19 risk factors may be producing flawed conclusions - and how to fix the problem


In a new paper we extend the recent work by Griffith et al which highlights how ‘collider bias’ in studies of COVID19 undermines our understanding of the disease risk and severity. This is typically caused by the data being restricted to people who have undergone COVID19 testing, among whom healthcare workers are over-represented. For example, collider bias caused by smokers being under-represented in the dataset may (at least partly) explain recent empirical results that suggest smoking reduces the risk of COVID19.

The new paper makes more explicit use of graphical causal models to interpret observed data. We show that the Griffith et al smoking example can be clarified and improved using Bayesian network models with realistic data and assumptions. We show that there is an even more fundamental problem for risk factors like ‘stress’ which, unlike smoking, is more rather than less prevalent among healthcare workers; in this case, because of a combination of collider bias from the biased dataset and the fact that ‘healthcare worker’ is a confounding variable, it is likely that studies will wrongly conclude that stress reduces rather than increases the risk of COVID19.  Indeed, exactly this has been claimed for hypertension and the same data could even - bizarrely - find factors like 'being in close contact with COVID19 patients' reducing the risk of COVID19. To avoid such erroneous conclusions, any analysis of observational data must take account of the underlying causal structure including colliders and confounders. If analysts fail to do this explicitly then any conclusions they make about the effect of specific risk factors on COVID19 are likely to be flawed.
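The 'stress' argument above can be illustrated with a small exact calculation. This is only a sketch under made-up assumptions, not the Bayesian network from the paper: every probability below (the share of healthcare workers, the stress and infection rates, the testing rates, and the 1.2 multiplier for stress) is a hypothetical value chosen to show the direction of the effect.

```python
from itertools import product

# All numbers below are hypothetical, chosen only for illustration.
P_HCW = 0.10                               # P(healthcare worker)
P_STRESS = {True: 0.60, False: 0.20}       # P(stress | healthcare worker?)
BASE_COVID = {True: 0.10, False: 0.05}     # P(covid | healthcare worker?, no stress)
STRESS_MULTIPLIER = 1.2                    # assumed: stress truly raises risk by 20%


def p_tested(hcw, covid):
    """Assumed testing policy: healthcare workers are tested routinely;
    everyone else is mostly tested only when they have the disease."""
    if hcw:
        return 0.90
    return 0.50 if covid else 0.02


def covid_rate_by_stress(tested_only):
    """Return P(covid | stress) for stress in {True, False}, either in the
    whole population or restricted to the people who were tested."""
    mass = {True: [0.0, 0.0], False: [0.0, 0.0]}  # stress -> [covid mass, total mass]
    for hcw, stress, covid in product([True, False], repeat=3):
        p = P_HCW if hcw else 1 - P_HCW
        p *= P_STRESS[hcw] if stress else 1 - P_STRESS[hcw]
        p_covid = BASE_COVID[hcw] * (STRESS_MULTIPLIER if stress else 1.0)
        p *= p_covid if covid else 1 - p_covid
        if tested_only:
            p *= p_tested(hcw, covid)  # conditioning on the collider 'tested'
        mass[stress][1] += p
        if covid:
            mass[stress][0] += p
    return {stress: m[0] / m[1] for stress, m in mass.items()}


population = covid_rate_by_stress(tested_only=False)
tested = covid_rate_by_stress(tested_only=True)
print("Whole population:", population)
print("Tested only:     ", tested)
```

With these assumed numbers, stressed people have the higher infection rate in the whole population, yet the lower rate among those tested. An analysis restricted to the tested dataset would therefore point in the wrong direction, even though stress increases risk by construction.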

The paper is here:
Fenton, N E (2020)  "Why most studies into COVID19 risk factors may be producing flawed conclusions - and how to fix the problem"  (pdf is also available here)

 See also:

  Simpson's Paradox Example 1: Kidney stone


  Simpson's Paradox Example 2: Food and exercise


Friday 8 May 2020

Covid-19 risk for the black and minority ethnic community: why reports are misleading and create unjustified fear and anxiety

Widely reported stories, like one from the Guardian in May (stating that "blacks are more than four times more likely to die from Covid-19 than whites") and today's report from the BBC, are extremely concerning. However, a new report from our research group shows that the claims are misleading and may create an unjustified level of fear and anxiety among the black and minority ethnic (BAME) community.

In particular, the claims in the Guardian article come from the conclusion in a UK Office for National Statistics (ONS) report, which is misleading for three reasons:
  1. It appears to rely on old 2011 census data about the population proportions rather than on more recent estimates;
  2. It appears to be based on an ‘age standardized’ measure of risk that is very different from that used by the World Health Organisation (WHO); and
  3. It focuses on relative rather than absolute measures of risk.
All three exaggerate the risk to the BAME community. Regarding the last point, it is important to note that leading statistician and risk expert Prof David Spiegelhalter has convincingly argued that, when discussing risk, it is better to use absolute rather than relative risk differences and to express these as expected frequencies. For example, with this approach, using the ONS data (which was based on the fatalities up to 10 April) and the 2020 population estimates, we can conclude:
For every 100,000 black people under 65 we expect about 3 more to die of Covid-19 than for every 100,000 white people (5.4 compared to 2.5 respectively in total). Equivalently, a black person under 65 has a 0.0029% increased probability (about 1 in 35,000) of dying compared to a white person under 65.
Hence, we believe the ONS conclusions may be misleading from a risk assessment perspective and may serve as a poor guide to public policy.
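The arithmetic behind the absolute-risk statement can be checked in a few lines. The sketch below uses only the two under-65 rates quoted above (5.4 and 2.5 deaths per 100,000); the function and key names are our own, introduced for illustration. The ratio it prints is the crude ratio of these two rates, not the age-standardised figure behind the newspaper headline.

```python
# Covid-19 deaths per 100,000 people under 65, as quoted in the post.
PER = 100_000
BLACK_RATE = 5.4
WHITE_RATE = 2.5


def risk_summary(rate_a, rate_b, per=PER):
    """Compare two rates (events per `per` people) in absolute and relative terms."""
    absolute_difference = rate_a - rate_b           # extra deaths per `per` people
    probability_increase = absolute_difference / per
    return {
        "absolute_difference_per_100k": absolute_difference,
        "probability_increase_percent": 100 * probability_increase,
        "one_in": 1 / probability_increase,         # "about 1 in N" framing
        "relative_risk": rate_a / rate_b,           # crude ratio of the two rates
    }


summary = risk_summary(BLACK_RATE, WHITE_RATE)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The same two numbers give a relative risk a little above 2 but an absolute difference of about 3 deaths per 100,000 people, which is why the choice of framing matters so much for how alarming the result sounds.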

Full report:
It is also interesting to note that a previous ONS report on Covid-19 deaths by religion (as opposed to ethnicity) was also misleading in its conclusions.
