Hugh Osmond recently put out this interesting Twitter thread:
Although the average age of death in the UK is around 82, if
a person reaches 82 then on average (i.e. without knowing any details of the
particular person) the person can expect to live about another 8 years. We know
the average age of a person dying from Covid is also 82, so it seems reasonable
to assume that a typical person dying from Covid loses 8 life years. But, as Hugh
points out, this reasoning is flawed. Hugh provides what is essentially an informal argument. In what follows we provide a formal explanation.
The question we are trying to answer is:
Knowing that an individual (let’s say it’s a man called Fred) has died from Covid at the age of 82, how much longer would Fred have lived if he had not got Covid?
This is a classic ‘counterfactual’ question. We want to know what would happen to Fred in an alternative world if something that happened in the real world (namely, getting Covid) was changed. And we want to take account of everything we can learn about Fred that will be unchanged in the counterfactual world from what happened in the real world.
Crucially, because we know that Fred died of Covid, we learn from the real world that Fred is much more likely than the average 82-year-old to have had some critical pre-existing illness. And this knowledge must be retained in the counterfactual world.
As very well explained by Judea Pearl, answering counterfactual questions requires a causal model and an inference method that are beyond what can be achieved with traditional statistical methods. In what follows we use illustrative and simplified assumptions to explain this counterfactual reasoning.
First, we need a causal model of the ‘real world’. Such a basic causal model is shown in Figure 1.
Figure 1 Basic causal model (for 82 year-olds)
This tells us that whether a person suffers a Covid death depends not just on whether the person becomes infected, but also on many other factors including whether the person has a pre-existing critical illness. Similarly, a person’s life expectancy depends on the same factors (among others).
When we assign prior probabilities to the nodes ‘Covid infection’ and ‘Pre-existing critical illness’, and conditional probabilities to the nodes ‘Covid death’ and ‘life expectancy’, then the causal model becomes a Bayesian network for which we can perform Bayesian inference.
In what follows we use the following assumed prior and conditional probabilities:
- Covid infection: 1%
- Pre-existing critical illness: 10%
- Covid death given no Covid infection: 0%
- Covid death given Covid infection and Pre-existing critical illness: 10%
- Covid death given Covid infection and no Pre-existing critical illness: 0.1%
- The full set of conditional probabilities for life expectancy is given in Table 1
Table 1 Conditional probability table for 'life expectancy'
Changing any of these probability assumptions (within reason) does not change the thrust of the argument (readers are free to try out the model and make their own changes using their favourite Bayesian network software**).
With these assumptions we get the prior marginal probabilities shown in Figure 2.
So, in the real world the median life expectancy of an average 82-year-old is 8 years.
But if we know that an individual dies from Covid then, while this obviously means that ‘Covid infection’ must be true, through Bayesian inference we also get the revised probability for ‘pre-existing critical illness’ shown in Figure 3.
Figure 3 Revised probabilities after observing Covid death
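For readers who want to check this revision by hand, the following is a minimal Python sketch of the Bayes' rule calculation. It uses only the assumed prior and conditional probabilities listed above; the variable and function names are ours and are not taken from any Bayesian network tool.

```python
# Assumed probabilities from the list in the text (for 82-year-olds).
P_INFECTION = 0.01   # prior: Covid infection
P_ILLNESS = 0.10     # prior: pre-existing critical illness
# P(Covid death | infected, illness status); death without infection is 0.
P_DEATH_IF_INFECTED = {True: 0.10, False: 0.001}


def posterior_illness_given_covid_death():
    """Return P(pre-existing critical illness | Covid death) by enumeration."""
    # A Covid death implies infection, so only the infected branch contributes.
    joint_ill = P_INFECTION * P_ILLNESS * P_DEATH_IF_INFECTED[True]
    joint_not_ill = P_INFECTION * (1 - P_ILLNESS) * P_DEATH_IF_INFECTED[False]
    return joint_ill / (joint_ill + joint_not_ill)


posterior = posterior_illness_given_covid_death()
print(f"P(illness | Covid death) = {posterior:.3f}")
```

Under these assumptions the probability of a pre-existing critical illness rises from the 10% prior to roughly 92% once we know the person died from Covid.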
We can now use this new information about the probability this individual has pre-existing critical illness in a counterfactual world where he is not infected with Covid. The counterfactual model required for this is shown in Figure 4.
Figure 4 Counterfactual world
The ‘Covid infection’ and ‘life expectancy’ nodes in the counterfactual world are simply copies of the same named nodes from the real world model. They have exactly the same prior and conditional probability tables respectively. The revised probability of pre-existing critical illness carries over from the real world. When we set ‘Covid infection’ to false in the counterfactual world, we get the revised probability distribution for life expectancy. This individual has a median life expectancy of 2 years, rather than 8 years.
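The counterfactual step amounts to mixing the life expectancy distributions for "illness" and "no illness" using the revised illness probability instead of the prior one. The Python sketch below illustrates this. The two life expectancy distributions in it are invented placeholders, not the values in Table 1; we chose them only so that the medians match the 8 and 2 years quoted in the text. All names are ours.

```python
from itertools import accumulate

# Remaining-years buckets and HYPOTHETICAL distributions for an uninfected
# 82-year-old. These are illustrative placeholders, NOT the values in Table 1.
YEARS = [1, 2, 5, 8, 12]
LIFE_GIVEN_ILLNESS = [0.30, 0.40, 0.20, 0.08, 0.02]
LIFE_GIVEN_NO_ILLNESS = [0.05, 0.10, 0.20, 0.35, 0.30]

P_ILLNESS_PRIOR = 0.10
# Posterior P(illness | Covid death), from the assumed probabilities in the text:
# 0.01*0.10*0.10 / (0.01*0.10*0.10 + 0.01*0.90*0.001)
P_ILLNESS_GIVEN_COVID_DEATH = 0.0001 / 0.000109


def mixture(p_illness):
    """Life expectancy distribution with Covid infection set to false."""
    return [
        p_illness * ill + (1 - p_illness) * well
        for ill, well in zip(LIFE_GIVEN_ILLNESS, LIFE_GIVEN_NO_ILLNESS)
    ]


def median_years(dist):
    """Return the first bucket at which cumulative probability reaches 0.5."""
    for years, cumulative in zip(YEARS, accumulate(dist)):
        if cumulative >= 0.5:
            return years
    return YEARS[-1]


print("average 82-year-old:", median_years(mixture(P_ILLNESS_PRIOR)))
print("counterfactual Fred:", median_years(mixture(P_ILLNESS_GIVEN_COVID_DEATH)))
```

The only thing that differs between the two calls is the weight placed on the illness distribution, which is exactly the information carried over from the real world.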
The inference performed in this counterfactual model is not possible using the original model alone.
29 Jan update: Here is a 5-minute video explanation of the above with an even simpler model (that can be downloaded by right clicking)