Sunday, 15 March 2020

Simpson's paradox again: fixing an example from Pearl's "Book of Why" (with video)

I've written about Simpson's paradox before. Because it shows so clearly why observed data need a causal explanation, I've been using it to motivate students in my new module on risk assessment and decision analysis for data science. I have put together a couple of videos with examples that explain it graphically (see below). I wanted to base one of the videos on the 'exercise v cholesterol' example presented in the excellent "Book of Why" by Pearl and Mackenzie:



But it turns out there is a problem with this example. It assumes that, in the real-world 'data' (the left-hand figure), older people are the ones who do the most exercise. This is clearly not the case. At first I thought this was due to a simple ‘typo’, with the age groups labelled the wrong way round (i.e. the 10 – 20 – 30 – 40 – 50 age groups should be reversed). But if you reverse them you hit a different error: the data would then show that older people have lower cholesterol than young people, which is again clearly wrong. So whichever way you spin it, the example does not make sense in the ‘real world’ for the chosen attributes.

However, the example can be 'fixed' by considering 'exercise v junk food consumption' instead because, in the real world, older people not only exercise less than younger people but also eat less junk food. (**See the 22 March 2020 update below.)
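The mechanism behind this 'fix' can be made concrete with a toy calculation. The minimal Python sketch below uses entirely invented numbers (an assumption for illustration, not real survey data): five age groups labelled 10 to 50, where each older group exercises less and eats less junk food on average, while within every group more exercise goes with less junk food. A least-squares slope of junk food on exercise fitted to the pooled data comes out positive, even though the slope inside every age group is negative.

```python
"""Toy illustration of Simpson's paradox for 'exercise v junk food'.

All numbers are invented for illustration; they are not real data.
"""
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def make_data():
    """Return (age, exercise, junk_food) rows for five hypothetical age groups.

    Made-up assumptions: each older group exercises 2 units less and eats
    4 units less junk food on average; within a group, each extra unit of
    exercise goes with 0.5 units less junk food.
    """
    rows = []
    for g, age in enumerate([10, 20, 30, 40, 50]):
        mean_exercise = 10 - 2 * g
        mean_junk = 20 - 4 * g
        for d in (-1, 0, 1):  # three hypothetical people per age group
            rows.append((age, mean_exercise + d, mean_junk - 0.5 * d))
    return rows


rows = make_data()

# Pooled association, ignoring age.
overall = slope([r[1] for r in rows], [r[2] for r in rows])

# Association within each age group.
by_age = defaultdict(lambda: ([], []))
for age, ex, junk in rows:
    by_age[age][0].append(ex)
    by_age[age][1].append(junk)
within = {age: slope(xs, ys) for age, (xs, ys) in sorted(by_age.items())}

print(f"overall slope: {overall:+.2f}")
for age, s in within.items():
    print(f"age {age}: slope {s:+.2f}")
```

With these made-up numbers the script prints an overall slope of +1.81 and a slope of -0.50 for every age group. Age plays the role of the common cause: it drives both exercise and junk food consumption, so pooling the groups reverses the sign of the association.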




I have prepared a (6-minute) video using this example:


And here is another (5-minute) video explaining a more common example of Simpson's paradox:


**22 March 2020 update: It seems that in some age categories there might also be a problem with my assumption. My colleague Marko Tesic points out:
I’m just wondering about the relationship between exercise and junk food intake within each age group. I’m not sure the association is negative for each age group (although I do think that when considering the whole population this association is positive). People who exercise often eat quite a lot: the more people are active the more fuel their body needs to recover, in particular if people want to gain muscle weight (which is often the case with young people). Now, it’s not unlikely that a bunch of the food that people who exercise eat is actually junk food. The attached paper suggests exactly that. Namely, they find that many people, in particular young people, indulge in junk food after exercise. So I think that the association between exercise and junk food intake may not be negative within each age group. Rather, it’s perhaps positive for teenagers and young adults, close to no association for mature adults and negative for pensioners. This is still interesting as it’d be showing that the general population association does not hold in all age groups and that there’s a partial (rather than complete) reversal in the association.
Simone Dohle, Brian Wansink, and Lorena Zehnder (2014). Exercise and Food Compensation: Exploring Diet-related Beliefs and Behaviors of Regular Exercisers. Journal of Physical Activity and Health. doi: http://dx.doi.org/10.1123/jpah.2013-0383
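Marko's partial-reversal scenario can be sketched the same way. Again the numbers are invented: the hypothetical groups 'young', 'mature' and 'pensioner', and their within-group slopes of +0.5, 0 and -0.5, are chosen only to mirror his description, not taken from any data set. Older groups still exercise less and eat less junk food on average, so the pooled association stays positive, but only the pensioner group shows a reversal.

```python
"""Toy 'partial reversal' of Simpson's paradox (hypothetical numbers only)."""


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Hypothetical groups: (mean exercise, mean junk food, within-group slope).
# Older groups exercise less and eat less junk food on average; the
# within-group slopes follow a positive / none / negative pattern.
GROUPS = {
    "young": (8, 16, 0.5),
    "mature": (5, 10, 0.0),
    "pensioner": (2, 4, -0.5),
}

# Three hypothetical people per group, spread around the group means.
data = {}
for name, (mean_ex, mean_junk, s) in GROUPS.items():
    exercise = [mean_ex + d for d in (-1, 0, 1)]
    junk = [mean_junk + s * d for d in (-1, 0, 1)]
    data[name] = (exercise, junk)

# Pooled association, ignoring group membership.
all_ex = [x for ex, _ in data.values() for x in ex]
all_junk = [y for _, junk in data.values() for y in junk]
overall = slope(all_ex, all_junk)

# Association within each group.
within = {name: slope(ex, junk) for name, (ex, junk) in data.items()}

print(f"overall slope: {overall:+.2f}")
for name, s in within.items():
    print(f"{name}: slope {s:+.2f}")
```

In this toy setup the pooled slope is positive, the young group agrees with it, the mature group shows no association and only pensioners show the opposite sign: the 'partial (rather than complete) reversal' Marko describes.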




Comments:

  1. Excellent point! Thank you! Is the junk food / exercise data real? Or a mock up? Is it possible to download the data?