Friday 25 January 2019

Magda Osman: fighting mainstream opinions on 'nudge' techniques and communicating uncertainty

Dr Magda Osman is our colleague at Queen Mary University of London and a world-leading expert on experimental psychology, especially the psychology of agency and control. She has questioned the effectiveness of 'nudge' persuasion techniques for improving individual and societal well-being. She is a co-PI in our project CAUSAL-DYNAMICS, which is concerned with modelling dynamic decision-making from a causal perspective. We recently published a joint paper (which got a lot of publicity) describing our experiments testing how far people go in trusting experts. As Magda is currently also part-seconded to the Food Standards Agency, she is often sought out for her views on the use of nudges in several food-related policy issues. Two recent experiences show the scale of the difficulties in getting her message across. Last week she appeared near the end of the Channel 4 programme 'How to lose weight well' (the full programme is here - Magda appears at 43 minutes).

That 30-second clip above (which might be blocked by Channel 4) was all that remained of an interview that lasted an hour. Magda says:
I was specifically invited on the program – the brief was that there was a very speculative technique now available via the NHS, that is claimed to use Nudge methods to help lose weight (because, the rationale is that it is designed to target people's unconscious processes subliminally – this is grossly inaccurate for reasons, first, because subliminal means below the threshold of conscious attention, but people’s attention is consciously directed to the messages being played to them via this NHS weight loss technique, and second, nudge has nothing to do with targeting the unconscious subliminally).

They were aware that I have critical views on nudge and on work to do with the unconscious and so they wanted an expert to discuss the issues and why there might be some reason to doubt the findings and the claims made by the NHS hypnotherapeutic technique which proposes that people don’t need to use any willpower to lose weight, their unconscious will do all the work because the technique will rewire their conscious thoughts.

I spent an hour being interviewed, and several of the questions concerned topics such as, ‘why is it that people say they have lost weight using this technique?’, and to speculate why it might be that people on the programme that would trial the technique might also lose weight, even if I’m suggesting that the method itself is unlikely to be effective because the evidence for it working is weak, and the theoretical basis for it is flawed? My answers to these questions were that there are statistical reasons for why it is that some people will show that they have lost weight as a result of the technique, but that has nothing to do with the technique itself. It is more to do with understanding random fluctuations in behaviour in samples that are tested. Also, the psychological factor is that, once people tell other people that they want to lose weight, and that they are going on a programme on national television (where they are filmed before and after the method), this places a high incentive on them to try to lose weight. Actually losing weight then may have nothing to do with the technique itself, but more to do with the willingness, motivation and commitment people will put in to do the mundane things that are absolutely necessary to lose weight, which is eat less fatty food, eat more healthily, and exercise more.
So, I spent an hour discussing these things, giving very clear and cogent reasons and examples (which they had specifically asked for) to demonstrate why it is that the method they were asking me to talk about is problematic, and should be considered with a huge degree of scepticism.
But what happened on the show was a set up. They filmed people motivated to take part in the trial of the method, they did not present many details about the method, or the patchy and problematic evidence base for it, then they bring me on and edit my interview so that I am shown to pooh-pooh the technique, and then they bring on people as testimonials of the technique’s success, and point out that 'the expert has got it wrong'
Obviously I didn’t get it wrong, because it was what I had predicted, but the piece was edited in a way to show that the value of one or two people’s experience is of equal or more weight than the value of 15 years worth of study in a field of work, which entails summarising thousands of data points.
This does nothing to help people understand core issues to do with sampling, statistical inference, the value of a good causal understanding of evidence, the value of expertise, and the need for scepticism.
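Magda's statistical point, that some people will appear to lose weight even if the technique does nothing, can be illustrated with a small simulation. This is only a sketch: the spread of 1.5 kg, the sample size and the 2 kg threshold are made-up assumptions, not figures from the programme or from any study.

```python
import random

def simulate_weight_changes(n_people, sd_kg=1.5, seed=0):
    """Before/after weight changes (kg) for a technique with ZERO true effect.

    Each change is pure random fluctuation around 0; sd_kg is an assumed value.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_kg) for _ in range(n_people)]

changes = simulate_weight_changes(1000)
losers = [c for c in changes if c < 0]          # anyone who weighs less afterwards
big_losers = [c for c in changes if c <= -2.0]  # lost 2 kg or more by chance alone
fraction_lost = len(losers) / len(changes)

print(f"{fraction_lost:.0%} lost some weight; {len(big_losers)} lost 2 kg or more")
```

Even with no effect at all, roughly half of the simulated participants weigh less afterwards, and a handful show a loss large enough to make a convincing testimonial.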

On top of that experience, Magda and more of my colleagues were invited to submit a workshop to the International Conference on Uncertainty in Risk Analysis 2019, sponsored by the European Food Safety Authority (EFSA) and the German Federal Institute for Risk Assessment (BfR). Their workshop was one of five invited. Yet the message of their workshop was not exactly what the EFSA wanted to hear about their safety standards. Consequently, Magda was told that, unlike the other four workshops, theirs would be relegated to a tiny side room that could only accommodate the workshop speakers, i.e. it was essentially no longer to be part of the programme. I'm not sure if it was deliberately meant to rub salt into the wounds, but this is how Magda's workshop is currently advertised on the conference website...

Thursday 17 January 2019

Manhunt: the Levi Bellfield case from a probabilistic perspective

The ITV 3-part series "Manhunt" starring Martin Clunes tells the story of the search for the killer of Amelie Delagrange, who was murdered in Twickenham in 2004. It is based on the book by Colin Sutton (who was the detective in charge of the case) and dramatises his fight to find Amelie's killer, Levi Bellfield, who was also charged with the murder of Marsha McDonnell and with three attempted murders. The trial of Bellfield for these five crimes took place at the Old Bailey in 2008. He was convicted of the two murders and one of the three attempted murders (he was also later charged with and convicted of the murder of Milly Dowler).

I declare a particular interest here because, between 2007 and 2008, I (along with colleague Martin Neil) acted as an expert consultant to the Defence team in the case against Bellfield for the five crimes. We were initially asked to provide a statistical analysis relating to the number of car number plates that were consistent with the grainy CCTV image of a car at the scene of the McDonnell murder. We were subsequently asked to identify probabilistic issues relating to all aspects of the evidence, producing reports totalling several hundred pages. Although these reports are not public, some material we subsequently wrote that mentions the case can be found in this publication. Having watched the programme I think it is worth making the following points:

The CCTV image of the car at the scene of the McDonnell murder: none of the letters or numbers on the number plate were clearly visible. A number of image experts provided (contradictory) conclusions about which characters could be ruled out in each position, so there was much uncertainty about how many number plates needed to be investigated. Additionally, at least two of the experts had been subject to confirmation bias because, instead of being presented with the grainy CCTV image and asked to say what the number plate could be, they were shown Bellfield's actual number plate and asked if the image was a possible match (as a result our colleague Itiel Dror was co-opted as an expert witness in the area of confirmation bias). The prosecution claimed to have 'eliminated' all possible vehicles with 'matching' number plates other than Bellfield's. This was important because, if true, it represented the most solid piece of evidence against Bellfield in the entire case. However, taking account of the uncertainty in the image experts' assertions, we concluded that potentially thousands of additional vehicles would need to be eliminated.
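To see why contradictory expert opinions inflate the number of plates so quickly, here is a minimal sketch. The character sets below are entirely made up for illustration (they are not the experts' actual conclusions, and not Bellfield's plate); the point is only that the number of character combinations is the product of the number of candidates in each position, and that a character can be ruled out only if every expert rules it out.

```python
from math import prod

def count_candidate_plates(candidates_per_position):
    """Number of character combinations consistent with one opinion."""
    return prod(len(chars) for chars in candidates_per_position)

def union_of_opinions(opinions):
    """Keep a character in a position if ANY expert considers it possible."""
    return ["".join(sorted(set().union(*position))) for position in zip(*opinions)]

# Hypothetical opinions for a 7-character plate: one string of allowed
# characters per position.
expert_1 = ["A", "BC", "1", "23", "D", "EFG", "HJ"]
expert_2 = ["AB", "CDE", "13", "45", "DEF", "GHJK", "JKLM"]
combined = union_of_opinions([expert_1, expert_2])

print(count_candidate_plates(expert_1))   # 24
print(count_candidate_plates(expert_2))   # 1152
print(count_candidate_plates(combined))   # 5760
```

These counts are upper bounds on combinations of characters rather than on registered vehicles, but they show how a list of a few dozen plates under one opinion can grow into thousands once the disagreement between experts is taken into account.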

Lack of hard evidence: The dramatisation was correct in showing that, although there was much circumstantial evidence linking Bellfield to the murder of Delagrange, there was no direct evidence in the form of either forensic evidence or eyewitnesses to the crime. Hence, DCI Sutton's strategy was to link Bellfield to a number of other 'similar' crimes that had taken place within the same area. The programme focused on two of the four for which he was charged, namely the McDonnell murder and one other attempted murder, for which the programme used a made-up name "Sarah" (the credits make clear that some names were deliberately changed). The "Sarah" case actually refers to Kate Sheedy who was deliberately run over with a car. The other two cases of attempted murder (which I will refer to as R and D) were not covered. Again (as the dramatisation suggested) there was no direct evidence linking Bellfield to either the McDonnell or "Sarah" attacks, but much circumstantial evidence. By providing circumstantial evidence linking Bellfield to five crimes which were claimed to be 'very similar', DCI Sutton was able to ensure that Bellfield was charged with the Delagrange murder.

Linking of the five 'very similar' crimes: This linkage became the thrust of the prosecution case against Bellfield. What the prosecution essentially argued was that the crimes were so similar, and the circumstantial evidence against Bellfield so compelling in each case, that (in the words of the prosecuting barrister) "the chances that these offences were committed by anyone other than Bellfield are so fanciful that you can reject them". But in reality there was no great 'similarity' between the crimes: even in the dramatisation DCI Sutton states somewhat ironically (about the Amelie, Marsha, and "Sarah" attacks) that "they all involved striking the victim with a blunt instrument - as we can consider a car a blunt instrument". Much of the defence case was based around exposing the probabilistic and logical fallacies arising from assumptions of similarity (although interestingly it was much later that we formalized some of these issues). With regard to the whole issue of 'cross admissibility', in one report I wrote the following generic statement:
The cross admissibility argument is based on the following valid probabilistic reasoning:

· Suppose Crime A and Crime B are so similar that there is a very high probability they have been committed by the same person.

· If there is evidence to support the hypothesis that the defendant is guilty of Crime A then this automatically significantly increases the probability of him being guilty of Crime B, even without any evidence of Crime B.

48. In other words what is happening here is that the probability of guilt in Crime A, together with the evidence of similarity between the two crimes, makes it allowable to conclude that the probability of guilt in crime B has increased. This is indeed provably correct, but what the prosecution claims is something subtly different, namely:

It is perfectly allowable to use the probability of guilt in Crime A, as evidence for Crime B.

49. This subtle difference leads to a fallacy in the following scenario that is relevant to this case.

· Suppose that there are three crimes, A, B and C. Suppose that the evidence that crimes B and C are similar is strong. Then, as above, any evidence that indicates guilt in the case of crime B will, because of the evidence of similarity, impact on the probability of guilt for crime C. However, suppose that we have not yet heard any evidence on crimes B and C and suppose that there is no evidence that Crime A is similar to either Crime B or C.

· If there is strong evidence supporting probability of guilt in crime A, then, contrary to the prosecution claim, this evidence does not impact on the probability of guilt for either crime B or crime C and hence should not be used as evidence as suggested in point 48 above.

· In fact in this scenario the evidence concerning crime A should, in relation to crimes B and C, be treated just the same as ‘previous conviction’ information in normal trials.

50. Given that the judge has allowed ‘cross admissibility’ of all 5 cases the danger identified in point 49 presents an opportunity for strategic exploitation by the prosecution. Specifically, the opportunistic strategy is to focus on an offence in which there is most hard evidence, even if that is the least serious offence and even if it bears the least similarity to the others. The prosecution can then argue that evidence of guilt in that case can be taken as evidence of guilt in the more serious cases. The jury would not necessarily be aware of the underlying fallacy. 
With hindsight, point 50 is especially pertinent because, in contrast to the Delagrange, McDonnell and "Sarah" cases, there actually was some direct evidence linking Bellfield to the R and D attacks (neither of which resulted in serious injury to the victims), and there were few similarities between these and the other three cases. The cross admissibility ruling and the (in my view incorrect) assumption of similarity allowed the jury to use evidence in the R and D attacks as evidence in the other cases. Interestingly, the jury did not find Bellfield guilty of either the R or the D attack.
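The difference between the valid reasoning in point 48 and the fallacy in point 49 can be sketched numerically. This is a deliberately simplified model, not the analysis from our reports: it assumes that either the same person committed both crimes (in which case guilt in A carries over to B) or the crimes are unrelated (in which case only the prior for B applies). The function name and all numbers are hypothetical.

```python
def prob_guilty_b(p_guilty_a, p_same_offender, prior_guilty_b):
    """P(defendant committed B) under a simple two-case model.

    p_guilty_a      -- probability of guilt in Crime A given its evidence
    p_same_offender -- probability (from similarity evidence) that one
                       person committed both crimes
    prior_guilty_b  -- probability of guilt in B before any B evidence
    """
    return p_same_offender * p_guilty_a + (1 - p_same_offender) * prior_guilty_b

# Strong similarity: evidence on A legitimately raises the probability for B.
print(prob_guilty_b(p_guilty_a=0.8, p_same_offender=0.9, prior_guilty_b=0.01))

# No evidence of similarity: evidence on A leaves B at its prior.
print(prob_guilty_b(p_guilty_a=0.8, p_same_offender=0.0, prior_guilty_b=0.01))
```

In this sketch all of the transfer from Crime A to Crime B runs through the similarity term; with no evidence of similarity, even overwhelming evidence on A changes nothing for B.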

Multiple probabilistic fallacies: In one of my summary reports I said (about the prosecution case generally): "There are several important instances of well known probabilistic fallacies (and also well known logical fallacies) that consistently exaggerate the impact of the evidence in favour of the prosecution case". In addition to the cross admissibility 'fallacy' we found examples of the following in the prosecution opening statement:
  • Prosecutor's fallacy
  • Base rate neglect fallacy
  • Dependent evidence fallacy
  • Logically dependent evidence fallacy
  • Conjunction fallacy
  • Confirmation bias fallacy
  • Previous convictions fallacy
  • Coincidence fallacy
  • Minimal utility evidence fallacy
  • Lack of hard evidence fallacy
  • “Crimewatch UK” fallacy
These fallacies are all covered in our book and some (in the context of the Bellfield case) are covered in this paper.

And finally: In one scene in the programme DCI Sutton pointed out that he, Bellfield and Bellfield's lawyer all had one thing in common - being Spurs fans. Count me in on that one too...


Monday 14 January 2019

New research published in IEEE Transactions makes building accurate Bayesian networks easier

(This is an update of a previous posting)
One of the biggest practical challenges in building Bayesian network (BN) models for decision support and risk assessment is to define the probability tables for nodes with multiple parents. Consider the following example:
In any given week a terrorist organisation may or may not carry out an attack. There are several independent cells in this organisation for which it may be possible in any week to determine heightened activity. If it is known that there is no heightened activity in any of the cells, then an attack is unlikely. However, for any cell if it is known there is heightened activity then there is a chance an attack will take place. The more cells known to have heightened activity the more likely an attack is.
In the case where there are three terrorist cells, it seems reasonable to assume the BN structure here:

To define the probability table for the node "Attack carried out" we have to define probability values for each possible combination of the states of the parent nodes, i.e., for all the entries of the following table.

That is 16 values (although, since the columns must sum to one we only really have to define 8).
When data are sparse - as in examples like this - we must rely on judgment from domain experts to elicit these values. Even for a very small example like this, such elicitation is known to be highly error-prone. When there are more parents (imagine there are 20 different terrorist cells) or more states other than "False" and "True", then it becomes practically infeasible. Numerous methods have been proposed to simplify the problem of eliciting such probability tables. One of the most popular methods - “noisy-OR” - approximates the required relationship in many real-world situations like the above example. BN tools like AgenaRisk implement the noisy-OR function, making it easy to define even very large probability tables. However, it turns out that in situations where the child node (in the example this is the node "Attack carried out") is observed to be "False", the noisy-OR function fails to properly capture the real-world implications. It is this weakness that is both clarified and resolved in the following two new papers published in IEEE Transactions on Knowledge and Data Engineering (both are open access so you can download the full pdf).

The first paper shows that by changing a single column of the probability table generated from the noisy-OR function (namely the last column where all parents are "True") most (but not all) of the deficiencies in noisy-OR are resolved. The second paper shows how the problem is resolved by defining the nodes as 'ranked nodes' and using the weighted average function in AgenaRisk.

Hence, while the first paper provides a simple approximate solution, the second provides a 'complete solution' but requires software like AgenaRisk for its implementation.
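For readers unfamiliar with noisy-OR, the following sketch generates the full probability table for the three-cell example from just four numbers. It uses the textbook noisy-OR formula with a 'leak' term; it is not AgenaRisk's implementation and does not include the corrections proposed in either paper. The per-cell probabilities and the leak value are invented for illustration.

```python
from itertools import product

def noisy_or_table(link_probs, leak=0.0):
    """P(Attack = True) for every combination of parent states.

    link_probs[i] -- assumed probability that heightened activity in cell i
                     alone leads to an attack
    leak          -- assumed probability of an attack with no heightened
                     activity in any cell
    """
    table = {}
    for states in product([False, True], repeat=len(link_probs)):
        p_no_attack = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_no_attack *= 1.0 - p  # each active cell independently fails to cause an attack
        table[states] = 1.0 - p_no_attack
    return table

table = noisy_or_table([0.1, 0.2, 0.3], leak=0.01)
for states, p_attack in table.items():
    print(states, round(p_attack, 5))
```

Four elicited numbers replace the 8 independent entries of the full table, and with 20 cells the saving is 21 numbers instead of over a million, which is why the method is so popular.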

Acknowledgements: The research was supported by the European Research Council under project ERC-2013-AdG339182 (BAYES_KNOWLEDGE); the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS; the Intelligence Advanced Research Projects Activity (IARPA), through the BARD project (Bayesian Reasoning via Delphi) of the CREATE programme under Contract [2017-16122000003]; and Agena Ltd for software support. We also acknowledge the helpful recommendations and comments of Judea Pearl, and the valuable contributions of David Lagnado (UCL) and Nicole Cruz (Birkbeck).

Wednesday 2 January 2019

New paper shows how and why important evidence is ignored in medicine, forensics and the law

Consider the following problem:
There is a diagnostic screening test for a particular serious disease which has a 90% chance of testing positive if the patient has the disease. However, this test also has a 90% chance of testing positive for a common benign condition. As the test cannot distinguish between whether or not the person has the serious or benign condition, can we disregard the evidence of the positive test result?  
An important new paper by Toby Pilditch and colleagues (at UCL and Queen Mary) published today in the journal Psychological Science demonstrates that people assume that such evidence can be disregarded. Specifically, they assume that - as it is equally predicted by two competing hypotheses (in this case serious disease versus benign) - it offers no support for either hypothesis. However, this assumption is wrong. It only holds when the 'competing' hypotheses are mutually exclusive and exhaustive (i.e. exactly one is true). In the above example, if both the serious disease and the benign condition are equally likely (say, a  5% chance) in a random member of the population then the positive test result increases the probability of BOTH the serious disease and the benign condition to about 25% (assuming a 10% false positive rate for the test). The paper shows that this reasoning error is due to a 'zero-sum' perspective on evidence, wherein people wrongly assume that evidence which supports one causal hypothesis must disconfirm its competitor. Across three experiments the paper demonstrates this error is robust to intervention and generalizes across several different contexts. The paper also rules out several alternative explanations of the bias.
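The 25% figure can be checked with a short calculation. The sketch below makes two assumptions that go beyond what is stated above: the serious disease and the benign condition occur independently of each other, and the test is positive with probability 90% whenever at least one of them is present (including when both are).

```python
from itertools import product

def posteriors_given_positive(prior_serious=0.05, prior_benign=0.05,
                              p_pos_if_condition=0.9, false_positive=0.1):
    """Return (P(serious | positive), P(benign | positive))."""
    p_pos = p_pos_and_serious = p_pos_and_benign = 0.0
    for serious, benign in product([False, True], repeat=2):
        # Prior probability of this combination (independence assumed).
        p_state = ((prior_serious if serious else 1 - prior_serious)
                   * (prior_benign if benign else 1 - prior_benign))
        # Chance of a positive test in this combination.
        p_pos_given = p_pos_if_condition if (serious or benign) else false_positive
        joint = p_state * p_pos_given
        p_pos += joint
        if serious:
            p_pos_and_serious += joint
        if benign:
            p_pos_and_benign += joint
    return p_pos_and_serious / p_pos, p_pos_and_benign / p_pos

p_serious, p_benign = posteriors_given_positive()
print(f"P(serious | positive) = {p_serious:.3f}")
print(f"P(benign  | positive) = {p_benign:.3f}")
```

Both posteriors come out at about 0.25, five times the 5% prior, so a positive result that is 'equally predicted' by the two conditions is still strong evidence for each of them.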

The implications of this work are profound, as the fallacy is made in many critical areas of decision-making including law and forensics as well as medicine. For example, in 2001 Barry George was convicted of the shooting of Jill Dando, a TV celebrity, outside her flat in broad daylight. The main evidence against him was a single particle of firearm discharge residue (FDR) found in his coat pocket. In 2007 the Appeal Court concluded that the FDR evidence was not ‘probative’ in favour of guilt, because, contrary to what had been suggested in the original trial, it was equally likely to have arisen due to poor police procedures (such as the coat being exposed to FDR during police handling) as from him having fired the gun that killed Dando. Hence, his conviction was quashed and a re-trial ordered, in which Barry George was set free. However, the appeal court argument assumed that if a piece of evidence (the FDR in the coat pocket) is equally probable under two alternative hypotheses (Barry George fired gun vs poor police handling of evidence) then it cannot support either of these hypotheses. But it is not necessarily the case that exactly one of these two hypotheses is true; it is possible that Barry George fired the gun and there was poor police handling of the evidence; and also that neither was true (e.g., the FDR particle came from elsewhere). Therefore, rather than being neutral, the FDR evidence may have been probative against Barry George (albeit weakly). The FDR evidence does not discriminate ‘Barry George fired the gun’ from ‘poor police handling of evidence’, but it does discriminate ‘Barry George fired the gun’ from ‘Barry George did not fire the gun’: it is the latter hypothesis pair that was the target in this criminal investigation.

I have personally been involved in cases where defence evidence has also been wrongly deemed irrelevant because of the zero-sum fallacy. In particular, this happens when DNA from a crime scene does NOT match the defendant. The defence lawyer argues that this supports the hypothesis that the defendant was not at the crime scene. However, the prosecution and forensic experts argue (wrongly) that the lack of a match can be disregarded as this is equally likely to be the result of failure to collect a sufficient relevant sample of DNA from the crime scene.

The research was based upon work undertaken in the BARD project which was concerned with improving intelligence analysis with uncertain evidence using Bayesian networks. It was supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), under Contract [2017-16122000003].

The full reference:
Pilditch, T., Fenton, N. E., & Lagnado, D. A. (2019). "The zero-sum fallacy in evidence evaluation". Psychological Science,
Pdf of the accepted version.