Thursday, 17 January 2019

Manhunt: the Levi Bellfield case from a probabilistic perspective

The ITV three-part series "Manhunt", starring Martin Clunes, tells the story of the search for the killer of Amelie Delagrange, who was murdered in Twickenham in 2004. It is based on the book by Colin Sutton (who was the detective in charge of the case) and dramatises his fight to find Amelie's killer, Levi Bellfield, who was also charged with the murder of Marsha McDonnell and three other attempted murders. The trial of Bellfield for these five crimes took place at the Old Bailey in 2008. He was convicted of the two murders and one of the three attempted murders (he was also later charged with and convicted of the murder of Milly Dowler).

I declare a particular interest here because, between 2007 and 2008, I (along with colleague Martin Neil) acted as an expert consultant to the Defence team in the case against Bellfield for the five crimes. We were initially asked to provide a statistical analysis relating to the number of car number plates that were consistent with the grainy CCTV image of a car at the scene of the McDonnell murder. We were subsequently asked to identify probabilistic issues relating to all aspects of the evidence, producing reports totalling several hundred pages. Although these reports are not public, some material we subsequently wrote that mentions the case can be found in this publication. Having watched the programme I think it is worth making the following points:

The CCTV image of the car at the scene of the McDonnell murder: none of the letters or numbers on the number plate were clearly visible. A number of image experts provided (contradictory) conclusions about which characters could be ruled out in each position, so there was much uncertainty about how many number plates needed to be investigated. Additionally, at least two of the experts had been subject to confirmation bias: instead of being presented with the grainy CCTV image and asked to say what the number plate could be, they were shown Bellfield's actual number plate and asked if the image was a possible match (as a result our colleague Itiel Dror was co-opted as an expert witness in the area of confirmation bias). The prosecution claimed to have 'eliminated' all possible vehicles with 'matching' number plates other than Bellfield's. This was important because, if true, it represented the most solid piece of evidence against Bellfield in the entire case. However, taking account of the uncertainty of the image expert assertions, we concluded that potentially thousands of additional vehicles would need to be eliminated.
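To see why disagreement between image experts matters so much, here is a minimal Python sketch of the counting argument. The candidate character sets below are entirely hypothetical (they are not the case data, and the function name is my own); the point is only that the number of plates consistent with an image is the product of the number of characters that cannot be ruled out in each position, so a little extra doubt per position multiplies quickly.

```python
from math import prod  # available from Python 3.8

def count_candidate_plates(candidates_per_position):
    """Count character strings consistent with per-position candidate sets.

    Each element is a string holding the characters that could NOT be
    ruled out in that position. The result is an upper bound on the number
    of real vehicles, since not every string is an issued registration.
    """
    return prod(len(chars) for chars in candidates_per_position)

# Hypothetical candidate sets for a 7-character plate (NOT the case data).
confident_expert = ["Y", "5", "7", "LI1", "DO0", "CG", "A"]
cautious_expert = ["YV", "56S", "71", "LI1T", "DO0Q", "CGO", "AH"]

print(count_candidate_plates(confident_expert))  # 1*1*1*3*3*2*1 = 18
print(count_candidate_plates(cautious_expert))   # 2*3*2*4*4*3*2 = 1152
```

In this made-up example, allowing just one or two extra possible characters per position turns a list of 18 candidate plates into more than a thousand.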

Lack of hard evidence: The dramatisation was correct in showing that, although there was much circumstantial evidence linking Bellfield to the murder of Delagrange, there was no direct evidence in the form of either forensic evidence or eyewitnesses to the crime. Hence, DCI Sutton's strategy was to link Bellfield to a number of  other 'similar' crimes that had taken place within the same area. The programme focused on two of the four for which he was charged, namely the McDonnell murder and one other attempted murder, for which the programme used a made-up name "Sarah" (the credits make clear that some names were deliberately changed). The "Sarah" case actually refers to Kate Sheedy who was deliberately run over with a car. The other two cases of attempted murder (which I will refer to as R and D) were not covered. Again (as the dramatization suggested) there was no direct evidence linking Bellfield to either the McDonnell or "Sarah" attacks, but much circumstantial evidence. By providing circumstantial evidence linking Bellfield to five crimes which were claimed to be 'very similar', DCI Sutton was able to ensure that Bellfield was charged with the Delagrange murder.

Linking of the five 'very similar' crimes: This linkage became the thrust of the prosecution case against Bellfield. What the prosecution essentially argued was that the crimes were so similar, and the circumstantial evidence against Bellfield so compelling in each case, that (in the words of the prosecuting barrister) "the chances that these offences were committed by anyone other than Bellfield are so fanciful that you can reject them". But in reality there was no great 'similarity' between the crimes: even in the dramatisation DCI Sutton states somewhat ironically (about the Amelie, Marsha, and "Sarah" attacks) that "they all involved striking the victim with a blunt instrument - as we can consider a car a blunt instrument". Much of the defence case was based around exposing the probabilistic and logical fallacies arising from assumptions of similarity (although interestingly it was much later that we formalized some of these issues). With regard to the whole issue of 'cross admissibility', in one report I wrote the following generic statement:
The cross admissibility argument is based on the following valid probabilistic reasoning:

· Suppose Crime A and Crime B are so similar that there is a very high probability they have been committed by the same person.

· If there is evidence to support the hypothesis that the defendant is guilty of Crime A then this automatically significantly increases the probability of him being guilty of Crime B, even without any evidence of Crime B.

48. In other words what is happening here is that the probability of guilt in Crime A, together with the evidence of similarity between the two crimes, makes it allowable to conclude that the probability of guilt in crime B has increased. This is indeed provably correct, but what the prosecution claims is something subtly different, namely:

It is perfectly allowable to use the probability of guilt in Crime A, as evidence for Crime B.

49. This subtle difference leads to a fallacy in the following scenario that is relevant to this case.

· Suppose that there are three crimes A, B and C. Suppose that the evidence that crimes B and C are similar is strong. Then as above, any evidence that indicates guilt in the case of crime B will, because of the evidence of similarity, impact on the probability of guilt for crime C. However, suppose that we have not yet heard any evidence on crimes B and C and suppose that there is no evidence that Crime A is similar to either Crime B or C.

· If there is strong evidence supporting probability of guilt in crime A, then, contrary to the prosecution claim, this evidence does not impact on the probability of guilt for either crimes B or C and hence should not be used as evidence as suggested in point 48 above.

· In fact in this scenario the evidence concerning crime A should, in relation to crimes B and C, be treated just the same as ‘previous conviction’ information in normal trials.

50. Given that the judge has allowed ‘cross admissibility’ of all 5 cases the danger identified in point 49 presents an opportunity for strategic exploitation by the prosecution. Specifically, the opportunistic strategy is to focus on an offence in which there is most hard evidence, even if that is the least serious offence and even if it bears the least similarity to the others. The prosecution can then argue that evidence of guilt in that case can be taken as evidence of guilt in the more serious cases. The jury would not necessarily be aware of the underlying fallacy. 
With hindsight point 50 is especially pertinent because, in contrast to the Delagrange, McDonnell and "Sarah" cases, there actually was some direct evidence linking Bellfield to the R and D attacks (neither of which resulted in serious injury to the victims) and there were few similarities between these and the other three cases. The jury were allowed by the cross admissibility ruling and (in my view incorrect) assumption of similarity to use evidence in the R and D attacks as evidence in the other cases. Interestingly, the jury did not find Bellfield guilty of either of the R or D attacks.
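The reasoning in points 48 and 49 of the quoted report can be illustrated with a small Python sketch. The model is a deliberate simplification of my own for this post (it is not taken from the case reports, and all numbers are hypothetical): with probability p_same_offender a single person committed both crimes, in which case guilt in one crime coincides with guilt in the other; otherwise the crimes are unrelated and the probability of guilt in the second crime stays at its prior.

```python
def prob_guilty_b(p_guilty_a, p_same_offender, prior_guilty_b):
    """P(defendant committed crime B) under a simple two-crime model.

    Assumed model (hypothetical, for illustration only):
    - with probability p_same_offender one person committed both crimes,
      so guilt in B coincides with guilt in A;
    - otherwise the crimes are unrelated and guilt in B keeps its prior.
    The similarity of the crimes is assumed independent of guilt in A.
    """
    return p_same_offender * p_guilty_a + (1 - p_same_offender) * prior_guilty_b

prior = 0.01           # hypothetical prior probability of guilt in each crime
strong_evidence = 0.95  # hypothetical P(guilty of A) after hearing evidence on A

for p_same in (0.9, 0.5, 0.0):
    before = prob_guilty_b(prior, p_same, prior)
    after = prob_guilty_b(strong_evidence, p_same, prior)
    print(f"P(same offender)={p_same:.1f}: "
          f"P(guilty of B) {before:.3f} -> {after:.3f}")
```

When the crimes are very likely to share an offender, evidence about crime A legitimately raises the probability of guilt in crime B. When there is no evidence of similarity (p_same_offender is 0 in this sketch), the evidence about crime A leaves the probability for crime B exactly where it started, which is the situation described in point 49.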

Multiple probabilistic fallacies: In one of my summary reports I said (about the prosecution case generally): "There are several important instances of well known probabilistic fallacies (and also well known logical fallacies) that consistently exaggerate the impact of the evidence in favour of the prosecution case". In addition to the cross admissibility 'fallacy' we found examples of the following in the prosecution opening statement:
  • Prosecutor's fallacy
  • Base rate neglect fallacy
  • Dependent evidence fallacy
  • Logically dependent evidence fallacy
  • Conjunction fallacy
  • Confirmation bias fallacy
  • Previous convictions fallacy
  • Coincidence fallacy
  • Minimal utility evidence fallacy
  • Lack of hard evidence fallacy
  • “Crimewatch UK” fallacy
These fallacies are all covered in our book and some (in the context of the Bellfield case) are covered in this paper.

And finally: In one scene in the programme DCI Sutton pointed out that he, Bellfield and Bellfield's lawyer all had one thing in common - being Spurs fans. Count me in on that one too...


Monday, 14 January 2019

New research published in IEEE Transactions makes building accurate Bayesian networks easier

(This is an update of a previous posting)
One of the biggest practical challenges in building Bayesian network (BN) models for decision support and risk assessment is to define the probability tables for nodes with multiple parents. Consider the following example:
In any given week a terrorist organisation may or may not carry out an attack. There are several independent cells in this organisation for which it may be possible in any week to determine heightened activity. If it is known that there is no heightened activity in any of the cells, then an attack is unlikely. However, for any cell if it is known there is heightened activity then there is a chance an attack will take place. The more cells known to have heightened activity the more likely an attack is.
In the case where there are three terrorist cells, it seems reasonable to assume the BN structure here:

To define the probability table for the node "Attack carried out" we have to define probability values for each possible combination of the states of the parent nodes, i.e., for all the entries of the following table.

That is 16 values (although, since the columns must sum to one we only really have to define 8).
When data are sparse - as in examples like this - we must rely on judgment from domain experts to elicit these values. Even for a very small example like this, such elicitation is known to be highly error-prone. When there are more parents (imagine there are 20 different terrorist cells) or more states other than "False" and "True", then it becomes practically infeasible.  Numerous methods have been proposed to simplify the problem of eliciting such probability tables. One of the most popular methods - “noisy-OR”- approximates the required relationship in many real-world situations like the above example. BN tools like AgenaRisk implement the noisy-OR function making it easy to define even very large probability tables. However, it turns out that in situations where the child node (in the example this is the node "Attack carried out") is observed to be "False", the noisy-OR function fails to properly capture the real world implications. It is this weakness that is both clarified and resolved in the following two new papers published in IEEE Transactions on Knowledge and Data Engineering (both are open access so you can download the full pdf).

The first paper shows that by changing a single column of the probability table generated from the noisy-OR function (namely the last column where all parents are "True") most (but not all) of the deficiencies in noisy-OR are resolved. The second paper shows how the problem is resolved by defining the nodes as 'ranked nodes' and using the weighted average function in AgenaRisk.

Hence, while the first paper provides a simple approximate solution, the second provides a 'complete solution' but requires software like AgenaRisk for its implementation.
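For readers who have not met noisy-OR before, the following Python sketch generates the standard (leaky) noisy-OR probability table for the three-cell example. It is only the textbook noisy-OR function, not the corrected versions proposed in the two papers, and the parameter values are hypothetical. Each cell i is given a single number p_i (the chance that heightened activity in that cell alone leads to an attack) plus a 'leak' probability for an attack when no cell shows heightened activity, so the expert supplies 4 numbers instead of 8.

```python
from itertools import product

def noisy_or_cpt(link_probs, leak=0.0):
    """Return {parent_states: P(child=True)} for a leaky noisy-OR node.

    P(child=False | parents) = (1 - leak) * product of (1 - p_i)
    over the parents i that are True.
    """
    cpt = {}
    for states in product([False, True], repeat=len(link_probs)):
        p_false = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_false *= 1.0 - p
        cpt[states] = 1.0 - p_false
    return cpt

# Hypothetical parameters for cells 1-3 and the leak (not from the papers).
cpt = noisy_or_cpt([0.2, 0.3, 0.4], leak=0.01)

print("cell1  cell2  cell3   P(False)  P(True)")
for states, p_true in cpt.items():
    cells = "  ".join(f"{str(s):5}" for s in states)
    print(f"{cells}   {1 - p_true:.4f}    {p_true:.4f}")
```

The printed table has 8 columns' worth of entries (16 values in total), all derived from the 4 elicited numbers, and the probability of an attack increases as more cells show heightened activity.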

Acknowledgements: The research was supported by the European Research Council under project ERC-2013-AdG339182 (BAYES_KNOWLEDGE); the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS; the Intelligence Advanced Research Projects Activity (IARPA), through the BARD project (Bayesian Reasoning via Delphi) of the CREATE programme, under Contract [2017-16122000003]; and Agena Ltd for software support. We also acknowledge the helpful recommendations and comments of Judea Pearl, and the valuable contributions of David Lagnado (UCL) and Nicole Cruz (Birkbeck).

Wednesday, 2 January 2019

New paper shows how and why important evidence is ignored in medicine, forensics and the law

Consider the following problem:
There is a diagnostic screening test for a particular serious disease which has a 90% chance of testing positive if the patient has the disease. However, this test also has a 90% chance of testing positive for a common benign condition. As the test cannot distinguish between whether or not the person has the serious or benign condition, can we disregard the evidence of the positive test result?  
An important new paper by Toby Pilditch and colleagues (at UCL and Queen Mary) published today in the journal Psychological Science demonstrates that people assume that such evidence can be disregarded. Specifically, they assume that - as it is equally predicted by two competing hypotheses (in this case serious disease versus benign) - it offers no support for either hypothesis. However, this assumption is wrong. It only holds when the 'competing' hypotheses are mutually exclusive and exhaustive (i.e. exactly one is true). In the above example, if both the serious disease and the benign condition are equally likely (say, a  5% chance) in a random member of the population then the positive test result increases the probability of BOTH the serious disease and the benign condition to about 25% (assuming a 10% false positive rate for the test). The paper shows that this reasoning error is due to a 'zero-sum' perspective on evidence, wherein people wrongly assume that evidence which supports one causal hypothesis must disconfirm its competitor. Across three experiments the paper demonstrates this error is robust to intervention and generalizes across several different contexts. The paper also rules out several alternative explanations of the bias.
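The figure of about 25% can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the model used in the paper: the serious disease and the benign condition are taken to be independent, each with a 5% prior, and the test is assumed to be positive with probability 90% if either (or both) is present and 10% if neither is. The function name is my own.

```python
from itertools import product

def posterior_serious_given_positive(prior_serious, prior_benign,
                                     p_pos_if_any, p_pos_if_none):
    """P(serious disease | positive test) when the two conditions are
    independent and are NOT mutually exclusive or exhaustive."""
    p_positive = 0.0
    p_serious_and_positive = 0.0
    for serious, benign in product([True, False], repeat=2):
        p_state = ((prior_serious if serious else 1 - prior_serious)
                   * (prior_benign if benign else 1 - prior_benign))
        p_pos = p_pos_if_any if (serious or benign) else p_pos_if_none
        p_positive += p_state * p_pos
        if serious:
            p_serious_and_positive += p_state * p_pos
    return p_serious_and_positive / p_positive

posterior = posterior_serious_given_positive(0.05, 0.05, 0.9, 0.1)
print(f"P(serious | positive) = {posterior:.3f}")

# Contrast: if exactly one of the two hypotheses must be true (50% each),
# equally likely evidence really is neutral.
exclusive_posterior = (0.5 * 0.9) / (0.5 * 0.9 + 0.5 * 0.9)
print(f"Mutually exclusive and exhaustive case: {exclusive_posterior:.3f}")
```

By symmetry the same posterior applies to the benign condition, so the positive result raises both probabilities from 5% to roughly 25%. Only in the second case, where the hypotheses are mutually exclusive and exhaustive, does the evidence leave the probabilities unchanged.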

The implications of this work are profound, as the fallacy is made in many critical areas of decision-making including law and forensics as well as medicine. For example, in 2001 Barry George was convicted of the shooting of Jill Dando, a TV celebrity, outside her flat in broad daylight. The main evidence against him was a single particle of firearm discharge residue (FDR) found in his coat pocket. In 2007 the Appeal Court concluded that the FDR evidence was not ‘probative’ in favour of guilt, because, contrary to what had been suggested in the original trial, it was equally likely to have arisen due to poor police procedures (such as the coat being exposed to FDR during police handling) as from him having fired the gun that killed Dando. Hence, his conviction was quashed and a re-trial ordered, in which Barry George was set free. However, the appeal court argument assumed that if a piece of evidence (the FDR in the coat pocket) is equally probable under two alternative hypotheses (Barry George fired gun vs poor police handling of evidence) then it cannot support either of these hypotheses. But it is not necessarily the case that exactly one of these two hypotheses is true; it is possible that Barry George fired the gun and there was poor police handling of the evidence; and also that neither was true (e.g., the FDR particle came from elsewhere). Therefore, rather than being neutral, the FDR evidence may have been probative against Barry George (albeit weakly). The FDR evidence does not discriminate ‘Barry George fired the gun’ versus ‘poor police handling of evidence’, but it does discriminate ‘Barry George fired the gun’ from ‘Barry George did not fire the gun’: it is the latter hypothesis pair that was the target in this criminal investigation.

I have personally been involved in cases where defence evidence has also been wrongly deemed irrelevant because of the zero-sum fallacy. In particular, this happens when DNA from a crime scene does NOT match the defendant. The defence lawyer argues that this supports the hypothesis that the defendant was not at the crime scene. However, the prosecution and forensic experts argue (wrongly) that the lack of a match can be disregarded as this is equally likely to be the result of failure to collect a sufficient relevant sample of DNA from the crime scene.

The research was based upon work undertaken in the BARD project which was concerned with improving intelligence analysis with uncertain evidence using Bayesian networks. It was supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), under Contract [2017-16122000003].

The full reference:
Pilditch, T., Fenton, N. E., & Lagnado, D. A. (2019). "The zero-sum fallacy in evidence evaluation". Psychological Science,
Pdf of the accepted version.

Monday, 24 December 2018

Bayesian network approach to Drug Economics Decision Making

This is an update of a short paper I first produced in 2014.

Consider the following problem:
A relatively cheap drug (drug A) has been used for many years to treat patients with disease X. The drug is considered quite successful since data reveals that 85% of patients using it have a ‘good outcome’ which means they survive for at least 2 years. The drug is cheap and the overall “financial benefit” of the drug (which assumes a ‘good outcome’ is worth $5000 and is defined as this figure minus the cost) has a mean of $4985.

There is an alternative drug (drug B) that a number of specialists in disease X strongly recommend. However, the data reveals that only 65% of patients using drug B survive for at least 2 years. Moreover, this drug is expensive. The overall “financial benefit” of the drug has a mean of just $2777.
On seeing the data the Health Authority recommends a ban against the use of drug B. Is this a rational decision?

The answer turns out to be no. This short paper explains this using a simple Bayesian network model that you can run (by downloading the free copy of AgenaRisk). Moreover, you can also compute the optimal decision automatically using the Hybrid Influence Diagram tool in AgenaRisk.

Fenton N.E. (2018) "A Bayesian Network and Influence Diagram for a simple example of Drug Economics Decision Making",  DOI:

Thursday, 20 December 2018

Review of “The Book of Why" by Pearl and Mackenzie

Judea Pearl and Dana Mackenzie: “The Book of Why: The New Science of Cause and Effect”, Basic Books, 2018. ISBN: 9780465097609
We have finally completed a detailed review of this important and outstanding book - the review will hopefully be published in the journal Artificial Intelligence. But a preprint of the full review is now available.

Some excerpts from the review:
  • Judea Pearl, a Turing Award prize winner, is a true giant of the field of computer science and artificial intelligence. The Turing award is the highest distinction in computer science; i.e., the Nobel Prize of computing. To say that his new book with Dana Mackenzie is timely is, in our view, an understatement. Coming from somebody of his stature and being written for a general audience (unlike his previous books), means that the concerns we have held about both the limitations of solely data driven approaches to artificial intelligence (AI) and the need for a causal approach, will finally reach a very broad audience.
  • According to Pearl, the state of the art in AI today is merely a ‘souped-up’ version of what machines could already do a generation ago: find hidden regularities in a large set of data. “All the impressive achievements of deep learning amount to just curve fitting”, he said recently. 
  • In Chapter 1, the core message about the need for causal models is underpinned by what Pearl calls “The Ladder of Causation”, which is then used to orient the ideas presented throughout the book. Pearl’s ladder of causation suggests that there are three steps to achieving true AI. .... Pearl also characterises these three steps on the ladder as 1) ‘seeing’; 2) ‘doing’; and 3) ‘imagining’. 
  • One of the reasons ‘deep learning’ has been so successful is that many problems can be solved by optimisation alone without the need to even consider advancing to rungs in the ladder of causation beyond the first. These problems include machine vision and machine listening, natural language processing, robot navigation, as well as other problems that fall within the areas of clustering, pattern recognition and anomaly detection. Big data in these cases is clearly very important and the advances being made using deep learning are undoubtedly impressive, but Pearl convincingly argues that they are not AI.
  • There is much excellent material in this book but, for us, the two key messages are: 1) “True AI” cannot be achieved by data and curve fitting alone, since causal representation of the underlying problems is also required to answer “what-if” questions, and 2) Randomized control trials are not the only ‘valid’ method for determining causal effects.
Norman Fenton, Martin Neil, and Anthony Constantinou, 20 December, 2018

For the full review see:
Review of: Judea Pearl and Dana Mackenzie: “The Book of Why: The New Science of Cause and Effect”, Basic Books, 2018 DOI:, by Norman Fenton, Martin Neil, and Anthony Constantinou

Wednesday, 5 December 2018

The case of the Kandinsky painting and Bayes' theorem

During World War 2 many thousands of pieces of valuable artwork were stolen from Jewish families by the Nazis and their collaborators in countries they occupied. The 2015 film The Woman in Gold (with Helen Mirren) told the story of one such painting by Klimt and the family's long fight to regain ownership. There have been many similar stories and the latest one concerns the "Painting with Houses" (Bild mit Hausern) by Wassily Kandinsky as described in today's article in the Guardian and in this New York Times article. I have become personally involved in this case as an expert consultant - on Bayes' theorem, not art.

"Painting with Houses" (Bild mit Hausern) by Wassily Kandinsky (1909)
The painting is in the collection of the Stedelijk Museum in Amsterdam, but before the war it was owned by the Lewenstein family of Amsterdam having been bought by Emanuel Lewenstein who was an art collector.  For works like this of “possibly problematic provenance” in Holland, there is a Dutch Restitution Committee (DRC) that is empowered to make binding decisions about ownership.  In October 2018 the DRC surprisingly determined that it was 'not obliged to restitute the painting' to the Lewenstein family.

James Palmer of Mondex Corporation (Canada), who represents the Lewenstein heirs, believes that the ruling was both logically and probabilistically flawed and that it was designed, from the very beginning, to refuse to restitute the painting to the Lewenstein family. Knowing that Bayes' theorem could be used where only subjective probabilities were available, James contacted me to provide an analysis of the DRC decision. Here is my short report. I used a causal Bayesian network model to determine that the DRC argument is extremely unlikely to be valid. Specifically, with very basic assumptions that I suspect will turn out to be favourable to the DRC, the probability that their claim is 'true' is about 3%. Hence, the decision unfairly robs the Lewenstein heirs of what is rightfully theirs. My involvement in the case is described in an article in the leading Dutch newspaper NRC:

From the article about the case in the Dutch newspaper NRC

Fenton, N. E. "The case of the Kandinsky painting and Bayes' theorem", Nov 2018, DOI: 10.13140/RG.2.2.29551.48804


Thursday, 29 November 2018

AI for healthcare requires ‘smart data’ rather than ‘big data’

Norman Fenton gave a talk titled AI for healthcare requires ‘smart data’ rather than ‘big data’ to medics at the Royal London Hospital on 27 November. He explained the background and context for the PAMBAYESIAN project.

Norman's Powerpoint presentation