Wednesday, 8 July 2020

UK Covid19 death rates by religion: Jews by far the highest and atheists by far the lowest 'overall' - but what does it mean?

The most recent UK Office for National Statistics (ONS) report on Covid19 deaths by religion (covering the period 2 March - 15 May) provides the overall number of fatalities for each religious group but, curiously, provides no simple overall fatality rate by religious group. So, I have calculated it myself.

Using Table 2 of the report (which provides the total deaths per religious group) and Table 1 of the report (which provides the population proportion per religion) and assuming the UK population size is 65 million, we get the following table of deaths per 100,000 by religion:


So, looking only at these population totals, Jews (by far) and then Christians have the highest death rate with atheists (no religion)* by far the lowest.
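The calculation behind these rates is simple enough to sketch. The snippet below is a minimal illustration only: the function name and the two placeholder groups are hypothetical, and the input numbers are NOT the ONS figures. The only value taken from the post is the assumed UK population of 65 million.

```python
UK_POPULATION = 65_000_000  # population size assumed in the post


def deaths_per_100k(deaths, population_share, population=UK_POPULATION):
    """Crude deaths per 100,000 for a group that is `population_share` of the population."""
    group_size = population_share * population
    return deaths / group_size * 100_000


# Placeholder inputs for illustration only; these are NOT the ONS figures.
placeholder_groups = {
    "Group A": (1_000, 0.05),   # (deaths, share of population)
    "Group B": (300, 0.005),
}

for name, (deaths, share) in placeholder_groups.items():
    print(f"{name}: {deaths_per_100k(deaths, share):.1f} deaths per 100,000")
```

Note that a small group with few deaths in absolute terms can still have the higher rate, which is why the raw totals in the report are not enough on their own.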

Now, while there are many Black Christians who come under the “BAME” (Black And Minority Ethnic) classification, there are very few Black Jews in the UK. So the results here seem to contradict the widely accepted narrative about BAME being ‘by far’ the highest risk group.

The question is whether an obvious confounding factor like age is causing a Simpson's paradox effect here whereby - although the overall rate is highest for a particular class of people - it may be possible that a different class is highest in each age sub-category. For example, as Dana Mackenzie shows for US statistics:

 although in every age category (except ages 0-4), whites have a lower case fatality rate than non-whites, when we aggregate all of the ages, whites have a higher fatality rate. The reason is simple: whites are older.
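To see how such a reversal can arise, here is a minimal sketch with made-up numbers. The groups, age bands and counts are all hypothetical and chosen only to produce the effect; they are not taken from the ONS report or from the US statistics quoted above.

```python
# Hypothetical counts, chosen only to show the reversal: (deaths, population)
counts = {
    "Group X": {"under 65": (1, 1_000), "65 and over": (450, 9_000)},
    "Group Y": {"under 65": (18, 9_000), "65 and over": (60, 1_000)},
}


def band_rate(group, band):
    """Death rate for one group within one age band."""
    deaths, population = counts[group][band]
    return deaths / population


def overall_rate(group):
    """Death rate for one group with all age bands pooled."""
    total_deaths = sum(d for d, _ in counts[group].values())
    total_population = sum(n for _, n in counts[group].values())
    return total_deaths / total_population


for band in ("under 65", "65 and over"):
    for group in counts:
        print(f"{band:12} {group}: {band_rate(group, band):.2%}")
for group in counts:
    print(f"{'all ages':12} {group}: {overall_rate(group):.2%}")
```

In this toy data Group X has the lower rate in each age band but the higher rate overall, simply because most of Group X is in the older band.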

So, is that what we have here also, i.e. is it all explained by the fact that Jews and Christians are older?

Well, according to the statistical analysis in the ONS report it may be to a certain extent.  The report uses ‘age standardized mortality rates’ to take account of the age distribution differences and concludes that Muslims, rather than Jews, have the highest fatality risk (something which seems very surprising given the above table).

However, the report does not define how the ‘age standardized mortality rates’ are calculated and it does not provide the raw data to check the results either (just as this Barts study failed to provide the necessary raw data to check if its bold claims about higher risk for BAME people were valid). Another concerning aspect of the report is that a lot of it focuses on the under 65s. Yet the total number of fatalities in the under 65s is dwarfed by the number of fatalities in the over 65s.
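Since the report does not say how its rates are calculated, the following is only a sketch of one common textbook approach, direct standardisation, and may not match what the ONS did. Each group's age-specific rates are weighted by the age shares of a fixed standard population. The counts and the standard weights below are hypothetical.

```python
# Hypothetical age-specific counts per group: (deaths, population)
counts = {
    "Group X": {"under 65": (1, 1_000), "65 and over": (450, 9_000)},
    "Group Y": {"under 65": (18, 9_000), "65 and over": (60, 1_000)},
}

# Hypothetical standard population: share of people in each age band
standard_weights = {"under 65": 0.8, "65 and over": 0.2}


def crude_rate(group):
    """Deaths divided by population, ignoring age."""
    total_deaths = sum(d for d, _ in counts[group].values())
    total_population = sum(n for _, n in counts[group].values())
    return total_deaths / total_population


def standardised_rate(group):
    """Directly standardised rate: age-specific rates weighted by the standard population."""
    return sum(
        standard_weights[band] * deaths / population
        for band, (deaths, population) in counts[group].items()
    )


for group in counts:
    print(f"{group}: crude {crude_rate(group):.2%}, "
          f"age standardised {standardised_rate(group):.2%}")
```

With these made-up inputs the ranking of the two groups flips once age is standardised, which is the kind of change the report's conclusion implies. Without the raw age-specific data it cannot be checked.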

Our approach** to this problem is to construct causal (probabilistic) models such as the one below (this is, of course, also the approach recommended by Pearl and Mackenzie in their excellent "Book of Why").

The kind of causal model required to fully understand the impact of religion and ethnicity on Covid19 death risk (dotted nodes represent variables that cannot be directly observed)


Note that there are many factors other than just age that must be incorporated into any analysis of the observed data before making definitive conclusions about risk based on religion/ethnicity. Moreover, if we discount unknown genetic factors, then religion and ethnicity have NO impact at all on a person's Covid19 death risk once we know their age, underlying medical conditions, work/living conditions, and extent of social distancing.

Thanks to Georgina Prodhan for alerting us to the ONS report.


*It is fair to assume these are atheists because these are people who declared "no religion" as opposed to those who did not declare any religion (i.e. those who fall into the category "not stated or required")






Friday, 12 June 2020

Bayesian networks in healthcare



A paper published today in the journal Artificial Intelligence in Medicine provides a comprehensive classification of the health conditions for which Bayesian networks have been used (the paper is part of a much bigger scoping review).

Four conditions - cardiac, cancer, psychological and lung disorders - make up over two-thirds of the research. The paper identifies differences in the approaches used by authors between each of these four primary health conditions and contributes to our understanding of how and what Bayesian networks are being considered for in healthcare.

Full details and download:


Full details and download of the full scoping review preprint:

Thursday, 28 May 2020

When 'dependent' expert reports might be more informative than independent ones


Whether it's Government ministers deciding if it is safe to end Covid-19 lockdown, journal editors deciding if a research paper is worthy of publication, or just consumers deciding which kettle is best value, we have to rely on evaluating information from multiple 'experts' who may or may not agree on their conclusion. In determining which conclusion is most probable we have to take account of not just which experts we trust most, but also the extent to which the experts may or may not have collaborated. This problem is especially pertinent in intelligence analysis work, and was addressed as part of a recent project funded by IARPA (Intelligence Advanced Research Projects Activity)*. While it is always assumed intuitively that 'independence' among experts is advantageous, it turns out that this is not always the case, as shown in a paper by Pilditch et al (researchers at UCL, Queen Mary and Birkbeck) just accepted for publication in Cognition.

Consider the following scenario:

A plane has crashed, and you must determine whether it was sabotage. You await the crash site reports from two investigators, Bailey and Campbell. They have separately assessed the various pieces of wreckage before leaving to write up their conclusions. Both investigators are equally accurate in their conclusions, seldom making mistakes. Now consider two alternative cases:

i. Bailey provides a report in which she concludes the plane was sabotaged, but she has also seen Campbell’s report, in which Campbell likewise concluded that the plane was sabotaged.

ii. Bailey provides a report in which she concludes the plane was sabotaged, based on her assessment alone. Campbell then separately provides a report (based on his assessment alone), likewise concluding that the plane was sabotaged.

Here, i) is a case of corroborating reports with a directional dependence from Campbell to Bailey (i.e., Bailey has seen Campbell’s report, thus Bailey’s report may depend upon Campbell’s, but not vice-versa), and ii) is a case of corroborating reports coming from independent sources. Given the two reports in each case, it would be right to conclude that more support for the conclusion that the plane was sabotaged is provided in the independent case.  However, now consider the same scenario, with two slight alterations:

1. Bailey reports to you the plane was sabotaged, having seen Campbell’s report, but you do not know what Campbell concluded (case i), versus you only know Bailey’s independent conclusion of sabotage (case ii).

2. Bailey reports to you the plane was sabotaged, having seen Campbell’s report, but you know that Campbell concluded the opposite (case i), versus you only know that Bailey and Campbell have independently provided contradictory conclusions (case ii).

1) is an instance of partial information, and 2) an instance of contradicting information. In both these instances, it is less clear whether case i) or ii) provides more support for the sabotage hypothesis.

The paper demonstrates that, for partial or contradicting information, the dependent case (i) is in fact superior, at least given reasonable assumptions. In other words, there is a dependency advantage: more evidential support is provided to the hypothesis when a report is the result of a structural dependency (i) than when it is independent (ii).
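For the baseline corroboration case described earlier (both reports known and both concluding sabotage), a minimal Bayes' rule sketch shows why independence wins there. It does not attempt to reproduce the paper's model or the dependency advantage. The prior, the accuracy value and the extreme assumption that a dependent Bailey simply copies Campbell's conclusion are all hypothetical choices made for illustration.

```python
def posterior_sabotage(prior, accuracy, n_independent_reports):
    """P(sabotage | n independent reports all concluding sabotage), via Bayes' rule.

    `accuracy` is the assumed probability that an investigator's conclusion is correct.
    """
    if_sabotage = prior * accuracy ** n_independent_reports
    if_not = (1 - prior) * (1 - accuracy) ** n_independent_reports
    return if_sabotage / (if_sabotage + if_not)


prior, accuracy = 0.5, 0.9  # hypothetical values

# Case ii: two genuinely independent corroborating reports.
independent = posterior_sabotage(prior, accuracy, 2)

# Case i under an extreme assumption: Bailey just repeats Campbell's conclusion,
# so the two reports carry the evidential weight of only one.
copied = posterior_sabotage(prior, accuracy, 1)

print(f"independent reports: {independent:.3f}")
print(f"copied report:       {copied:.3f}")
```

Real dependence is rarely this extreme, and the paper's point is precisely that, once information is partial or contradictory, a more realistic dependency can carry more evidential support than independence.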

Full reference:
Pilditch, T., Hahn, U., Fenton, N. E., & Lagnado, D. A. (2020). "Dependencies in evidential reports: The case for informational advantages". Cognition, to appear.  Accepted version (pdf)


*The research was supported in part by The Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), under Contract [2017-16122000003]. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. The research was also supported in part by the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS. The authors acknowledge and Agena Ltd for software support.

 

Tuesday, 26 May 2020

Covid-19: Infection rates are higher, fatality rates lower than widely reported



A new study by Queen Mary researchers using a Bayesian Network (BN) analysis of Covid-19 data reveals higher infection prevalence rates and lower fatality rates than have been widely reported.

Lead author Prof Martin Neil says:
 "Widely reported statistics on Covid-19 across the globe fail to take account of both the uncertainty of the data and possible explanations for this uncertainty. This study uses a Bayesian Network (BN) model to estimate the Covid-19 infection prevalence rate (IPR) and infection fatality rate (IFR) for different countries and regions, where relevant data are available. This combines multiple sources of data in a single model."
The results show that Chelsea, Mass., USA and Gangelt, Germany have relatively higher infection prevalence rates (IPR) than Santa Clara, USA; Kobe, Japan; and England and Wales. In all cases the infection prevalence is significantly higher than what has been widely reported, with much higher community infection rates in all locations. For Santa Clara and Chelsea, both in the USA, the most likely IFR values are 0.3-0.4%. Kobe, Japan is very unusual in comparison with the others, with values an order of magnitude less than the others, at 0.001%. The IFR for Spain is centred around 1%. England and Wales lie between Spain and the USA/German values with an IFR around 0.8%.

There remains some uncertainty around these estimates but an IFR greater than 1% looks remote for all regions/countries. Neil says:
 "We use a Bayesian technique called 'virtual evidence' to test the sensitivity of the IFR to two significant sources of uncertainty: survey quality and uncertainty about Covid-19 death counts. In response the adjusted estimates for IFR are most likely to be in the range 0.3%-0.5%."
The full paper:
Neil, M., Fenton, N., Osman, M., & McLachlan, S. (2020). "Bayesian Network Analysis of Covid-19 data reveals higher Infection Prevalence Rates and lower Fatality Rates than widely reported". MedRxiv, 2020.05.25.20112466. https://doi.org/10.1101/2020.05.25.20112466 
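The link between the two headline quantities can be sketched with simple arithmetic: for a fixed number of deaths, a higher infection prevalence rate implies a lower infection fatality rate. This is not the Bayesian network model from the paper, which also handles the uncertainty in each input. The function name and all of the numbers below are hypothetical.

```python
def infection_fatality_rate(deaths, prevalence, population):
    """IFR = deaths divided by the estimated number of people infected."""
    infections = prevalence * population
    return deaths / infections


# Hypothetical region: the same death count under three assumed prevalence rates.
population = 1_000_000
deaths = 400

for prevalence in (0.05, 0.10, 0.20):
    ifr = infection_fatality_rate(deaths, prevalence, population)
    print(f"IPR {prevalence:.0%} -> IFR {ifr:.2%}")
```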



Friday, 15 May 2020

Why most studies into COVID19 risk factors may be producing flawed conclusions - and how to fix the problem

 

In a new paper we extend the recent work by Griffith et al which highlights how ‘collider bias’ in studies of COVID19 undermines our understanding of the disease risk and severity. This is typically caused by the data being restricted to people who have undergone COVID19 testing, among whom healthcare workers are over-represented. For example, collider bias caused by smokers being under-represented in the dataset may (at least partly) explain recent empirical results that suggest smoking reduces the risk of COVID19.

The new paper makes more explicit use of graphical causal models to interpret observed data. We show that the Griffith et al smoking example can be clarified and improved using Bayesian network models with realistic data and assumptions. We show that there is an even more fundamental problem for risk factors like ‘stress’ which, unlike smoking, is more rather than less prevalent among healthcare workers; in this case, because of a combination of collider bias from the biased dataset and the fact that ‘healthcare worker’ is a confounding variable, it is likely that studies will wrongly conclude that stress reduces rather than increases the risk of COVID19.  Indeed, exactly this has been claimed for hypertension and the same data could even - bizarrely - find factors like 'being in close contact with COVID19 patients' reducing the risk of COVID19. To avoid such erroneous conclusions, any analysis of observational data must take account of the underlying causal structure including colliders and confounders. If analysts fail to do this explicitly then any conclusions they make about the effect of specific risk factors on COVID19 are likely to be flawed.
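The basic mechanism of collider bias can be shown by exact enumeration on a three-variable toy model. This is a generic sketch, not the Bayesian network from the paper: a binary risk factor and Covid19 are independent in the whole population, but both affect whether a person ends up in the tested dataset. All probabilities are hypothetical.

```python
from itertools import product

P_FACTOR = 0.2   # P(risk factor present), hypothetical
P_COVID = 0.1    # P(Covid19), assumed the same with or without the factor
P_TESTED = {     # P(tested | factor, covid), hypothetical
    (0, 0): 0.05,
    (1, 0): 0.20,
    (0, 1): 0.50,
    (1, 1): 0.50,
}


def joint(factor, covid, tested):
    """Joint probability of one (factor, covid, tested) combination."""
    p = (P_FACTOR if factor else 1 - P_FACTOR) * (P_COVID if covid else 1 - P_COVID)
    p_t = P_TESTED[(factor, covid)]
    return p * (p_t if tested else 1 - p_t)


def p_covid_given(factor, tested_only):
    """P(covid | factor), either in the whole population or among the tested only."""
    tested_values = (1,) if tested_only else (0, 1)
    numerator = sum(joint(factor, 1, t) for t in tested_values)
    denominator = sum(joint(factor, c, t) for c, t in product((0, 1), tested_values))
    return numerator / denominator


for factor in (0, 1):
    print(f"factor={factor}: whole population {p_covid_given(factor, False):.3f}, "
          f"tested only {p_covid_given(factor, True):.3f}")
```

In the whole population the factor makes no difference to the Covid19 probability, yet among the tested it appears strongly protective. Nothing causal has changed; only the selection into the dataset has.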

The paper is here:
Fenton, N E (2020)  "Why most studies into COVID19 risk factors may be producing flawed conclusions - and how to fix the problem" arxiv.org/abs/2005.08608  (pdf is also available here)

 See also:

  Simpson's Paradox Example 1: Kidney stone

 

  Simpson's Paradox Example 2: Food and exercise

 

Sunday, 26 April 2020

The Deer Hunter: A lesson in the basics of risk and probability assessment


I was recently watching a re-run of the classic 1978 Michael Cimino film “The Deer Hunter”. It contains one of the most iconic scenes in cinema history involving a ‘game’ of Russian roulette forcibly played by two American soldiers held captive in Vietnam. Although I have seen the film several times, this scene never seems to lose its impact. As I am currently teaching a new course on Risk Assessment and Decision Making, it also occurred to me that the scene provides a rich source of examples to illustrate core concepts of probability and risk including: probability and odds, basic probability axioms, conditional probability, risk and utility, absolute versus relative risk, event trees, and Bayesian networks.

So, I have written a short paper which hopefully has something of value both for people with no background in probability/statistics and also people who do, but want to find out more:
Fenton, N. E. (2020). The Deer Hunter: A lesson in the basics of risk and probability assessment. https://doi.org/10.13140/RG.2.2.31675.98089.
(The Bayesian network models described in the appendix of the paper are in this file, which can be run using the trial version of AgenaRisk.)
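As a small taste of the concepts listed above (probability, odds and conditional probability), here is a minimal sketch. It is not taken from the paper. It assumes a six-chamber revolver, bullets in uniformly random chambers, one spin at the start and no re-spin between pulls; the function names are hypothetical.

```python
from fractions import Fraction

CHAMBERS = 6  # assumed six-chamber revolver


def p_fires_next(bullets, empty_clicks=0, chambers=CHAMBERS):
    """P(next pull fires | `empty_clicks` earlier pulls were empty, no re-spin)."""
    remaining = chambers - empty_clicks
    if remaining <= 0 or not 0 <= bullets <= remaining:
        raise ValueError("bullets must fit in the remaining chambers")
    return Fraction(bullets, remaining)


def odds_against(p):
    """Odds against an event of probability p, as the ratio (1 - p) / p."""
    return (1 - p) / p


first_pull = p_fires_next(bullets=1)
print(f"one bullet, first pull: p = {first_pull}, odds {odds_against(first_pull)} to 1 against")

third_pull = p_fires_next(bullets=1, empty_clicks=2)
print(f"one bullet, after two empty clicks: p = {third_pull}")
```

The conditional probability rises with every empty click because the bullet must be in one of the chambers that remain.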

 I have also made a video based on the paper, which includes the actual scene from the film with my narrative:






Monday, 13 April 2020

Basic training with a Bayesian network tool helps lay people solve complex problems

Researchers at UCL and Birkbeck have published an important study on the benefits of using a Bayesian Network (BN) tool to solve the kinds of complex problems that intelligence analysts are confronted with.

Example of the type of problem considered. Participants had to answer questions such as which group was most likely responsible for the attack based on various details about multiple informant sources and their accuracy

The work was part of the IARPA-funded BARD (Bayesian ARgumentation via Delphi) project, which developed a BN tool tailored for intelligence analysts.*

The study provides strong empirical evidence that if you provide basic training to use the BARD tool for constructing BNs then this improves the ability of individuals to solve complex probabilistic reasoning problems, compared to a control group receiving only generic training in probabilistic reasoning. 

The full details of the paper (which includes a link to all of the problems and data) are:


Cruz, N., Desai, S. C., Dewitt, S., Hahn, U., Lagnado, D., Liefgreen, A., Phillips, K., Pilditch, T., and  Tešić, M. (2020). "Widening Access to Bayesian Problem Solving". Frontiers in Psychology, 11, 660. https://doi.org/10.3389/fpsyg.2020.00660

*I declare an interest here: the BARD tool was developed from the AgenaRisk API.
** Again I declare an interest: I was involved with some of the training