Sunday, 29 March 2020

COVID-19: the need for more random testing combined with causal modelling

We know some strawberry-flavoured sweets are contaminated. But, if wrapper colour is not a reliable indicator of the flavour of the sweet it contains, what do we learn about the proportions of strawberry and contaminated sweets if we only test sweets with red wrappers?

The current COVID-19 testing strategies - implemented to inform policy making - focus primarily on people already hospitalized with significant symptoms or on people most at risk. This seems to make sense for short-term medical reasons, but such testing is highly biased, with sub-optimal consequences. Without understanding the causal explanations for the data that such testing produces, we end up with highly misleading conclusions about infection and death rates. Starting with an analogy of testing sweets for contamination, this short paper illuminates the need for random testing combined with causal models:
Fenton N E, Osman M, Neil M, McLachlan S, "Improving the statistics and analysis of coronavirus by avoiding bias in testing and incorporating causal explanations for the data"
2 April 2020 UPDATE: A revised and edited version of this article now appears as the lead story on The Conversation
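
To make the bias concrete, here is a minimal Python sketch of the argument. It is not taken from the paper: the population size, the 5% infection rate and the symptom probabilities are all invented for illustration, and simulate_testing is a hypothetical helper name. It compares the infection rate you would estimate by testing only people with significant symptoms against the estimate from a random sample of the whole population.

```python
import random


def simulate_testing(pop_size=100_000, infection_rate=0.05,
                     p_symptoms_if_infected=0.20, p_symptoms_if_not=0.01,
                     sample_size=2_000, seed=1):
    """Compare symptom-only testing with random testing.

    All parameter values are illustrative assumptions, not real data.
    Returns (true_rate, biased_estimate, random_estimate).
    """
    rng = random.Random(seed)

    # Each person is a pair (infected, has_significant_symptoms).
    population = []
    for _ in range(pop_size):
        infected = rng.random() < infection_rate
        p_sym = p_symptoms_if_infected if infected else p_symptoms_if_not
        symptomatic = rng.random() < p_sym
        population.append((infected, symptomatic))

    true_rate = sum(inf for inf, _ in population) / pop_size

    # Biased strategy: test only people with significant symptoms.
    tested = [inf for inf, sym in population if sym]
    biased_estimate = sum(tested) / len(tested)

    # Random strategy: test a random sample, regardless of symptoms.
    sample = rng.sample(population, sample_size)
    random_estimate = sum(inf for inf, _ in sample) / sample_size

    return true_rate, biased_estimate, random_estimate


if __name__ == "__main__":
    true_rate, biased, rand_est = simulate_testing()
    print(f"true infection rate:      {true_rate:.3f}")
    print(f"symptom-only testing:     {biased:.3f}")
    print(f"random testing (n=2000):  {rand_est:.3f}")
```

With these made-up numbers, about 1,000 infected and 950 uninfected people are expected to show significant symptoms, so symptom-only testing suggests that roughly half of those tested are infected, while the random sample stays close to the assumed 5%. The size of the gap depends entirely on the assumed probabilities, which is exactly why a causal model of who gets tested is needed to interpret the data.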

Sunday, 15 March 2020

Simpson's paradox again: fixing an example from Pearl's "Book of Why" (with video)

I've written about Simpson's paradox before. Given its importance in highlighting the need for causal explanations of observed data, I've been using it to motivate students on my new module on risk assessment and decision analysis for data science.  I have put together a couple of videos with examples to explain it graphically (see below). I wanted to base one of the videos on the example of 'exercise v cholesterol' presented in the excellent "Book of Why" by Pearl and Mackenzie:



But it turns out there is a problem with this example. It assumes that in the 'data' in the real world (the left hand figure), older people are the ones who do most exercise. This is clearly not the case. At first I thought this was due to a simple ‘typo’ in that they labelled the age groups the wrong way round (i.e. the 10 – 20 – 30 – 40 – 50 age groups should be reversed). But if you reverse them you hit a different error – this time it would show that older people have lower cholesterol than young people, which is again clearly wrong. So whichever way you spin this, the example simply does not make sense in the ‘real world’ because it does not make sense for the chosen attributes.

However, the example can be 'fixed' by considering instead 'exercise v junk food consumption' because - in the real world - it is the case that older people not only exercise less than younger people but they also eat less junk food. (**See the 22 March 2020 update below.)
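
For readers who prefer numbers to pictures, here is a minimal Python sketch of how the paradox arises in the 'exercise v junk food' version. The data are entirely invented for illustration (the group centres, the within-group slope of -0.5, and the helper names slope and make_group are all my assumptions, not measurements): within every age group more exercise goes with less junk food, yet older groups both exercise less and eat less junk food, so the pooled trend points the other way.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical group centres, keyed by age:
# (typical exercise hours per week, typical junk food portions per week).
# Older groups exercise less AND eat less junk food.
group_centres = {20: (10, 10), 30: (8, 8), 40: (6, 6), 50: (4, 4), 60: (2, 2)}


def make_group(centre_exercise, centre_junk):
    """Three invented people per group; more exercise means less junk food."""
    return [(centre_exercise + d, centre_junk - 0.5 * d) for d in (-1, 0, 1)]


groups = {age: make_group(e, j) for age, (e, j) in group_centres.items()}

# Trend within each age group (x = exercise, y = junk food).
within_slopes = {age: slope(pts) for age, pts in groups.items()}

# Trend when age is ignored and everyone is pooled together.
pooled = [p for pts in groups.values() for p in pts]
overall_slope = slope(pooled)

if __name__ == "__main__":
    for age, s in within_slopes.items():
        print(f"age {age}: slope = {s:+.2f}")
    print(f"all ages pooled: slope = {overall_slope:+.2f}")
```

Every within-group slope here is negative, while the pooled slope is positive. Age is the common cause that drives both variables, so ignoring it reverses the apparent relationship.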




I have prepared a (6-minute) video using this example:


And here is another video (5-minute) explaining a more common example of Simpson's paradox:


**22 March 2020 update: It seems that in some age categories there might also be a problem with my assumption. My colleague Marko Tesic points out:
I’m just wondering about the relationship between exercise and junk food intake within each age group. I’m not sure the association is negative for each age group (although I do think that when considering the whole population this association is positive). People who exercise often eat quite a lot: the more people are active the more fuel their body needs to recover, in particular if people want to gain muscle weight (which is often the case with young people). Now, it’s not unlikely that a bunch of the food that people who exercise eat is actually junk food. The attached paper suggests exactly that. Namely, they find that many people, in particular young people, indulge in junk food after exercise. So I think that the association between exercise and junk food intake may not be negative within each age group. Rather, it’s perhaps positive for teenagers and young adults, close to no association for mature adults and negative for pensioners. This is still interesting as it’d be showing that the general population association does not hold in all age groups and that there’s a partial (rather than complete) reversal in the association.
Simone Dohle, Brian Wansink, and Lorena Zehnder (2014). Exercise and Food Compensation: Exploring Diet-related Beliefs and Behaviors of Regular Exercisers. Journal of Physical Activity and Health. doi: http://dx.doi.org/10.1123/jpah.2013-0383


Friday, 13 March 2020

In the UK football was always going to be the tipping point for Coronavirus risk mitigation


Yesterday the PM announced the importance of not cancelling major sporting events; and the Premier League announced there would certainly be no cancellations of this weekend's matches. But anybody with any football knowledge knew that - whatever the Government's risk mitigation plans were for Coronavirus - they would become irrelevant if a single high profile Premiership player or manager became infected.
As soon as it was confirmed last night that Arsenal manager Mikel Arteta had the virus, it was inevitable that a total shutdown of all professional football would start, and that is precisely what has happened. Again, anybody with any UK football knowledge knows that this is also a game-changer as far as the whole UK economy is concerned. Millions who were previously unmoved to make any changes will voluntarily go into lockdown after panic buying (see the above immediate response).
Note that nothing much changed when it was announced that the Government's own Health Minister got the virus a few days ago. But one key football person getting it was the single trigger for mass change.

I can only assume that the Government risk experts/advisors did not include a single person with football knowledge...

P.S. From a purely selfish perspective, as a Spurs fan, I am delighted that - by the time the Premiership resumes - we might have some of these players fit again.