Probability and Risk: 2019

Wednesday 11 December 2019

Problems with DNA mixed profile evidence: the case of Florencio Jose Dominguez

I have written many times before about the potential problems when using the likelihood ratio (LR) as a measure of probative value of evidence. The problems are especially acute when the evidence consists of a tiny sample of DNA for which there are at least two people contributing - often referred to as a low template mixed DNA profile. Over the last year I have been working with lawyer Matthew Speradelozzi on a case in San Diego that challenged the use of new statistical analyses for such a mixed profile. The case was settled Friday when Florencio Jose Dominguez (who was sentenced to 50 years to life for a 2008 murder) was released after pleading guilty to a reduced charge.

The major controversy involves what is called probabilistic genotyping software (in this case STRmix from ESR) that claims to be able to analyse low template mixtures and determine the most likely contributing profiles by taking account of information like the relative peak heights at loci on the electropherogram (epg), which is the graph that DNA analysts use to decide which components (alleles) are present in a sample. The DNA analysts first determine the number of contributors to the mixture and then provide a LR that compares the probability of the evidence assuming the suspect is one of the contributors against the probability of the evidence assuming that none of the contributors is related to the suspect. While the probabilistic genotyping software can be effective if the ‘sizes’ of the different contributors are very different, it is much less effective when they are not (as with Dominguez, who was claimed to be one of at least two unknown contributors of a similar ‘size’). Moreover, in contrast to single profile DNA cases, where the only residual uncertainty is whether a person other than the suspect has the same matching DNA profile, it is possible for all the genotypes of the suspect’s DNA profile to appear at each locus of a DNA mixture, even though none of the contributors has that DNA profile. In fact, in the absence of other evidence, it is possible to have a very high LR for the hypothesis ‘suspect is included in the mixture’ even though the posterior probability that the suspect is included is very low. Yet, in such cases a forensic expert will generally still report a high LR as ‘strong support for the suspect being a contributor’, which is potentially highly misleading. We have submitted a paper describing this and many other issues relating to the reliability of probabilistic genotyping software and will report on it here in due course.
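To see how a very high LR can coexist with a very low posterior probability, here is a minimal sketch of the odds form of Bayes' theorem (posterior odds = LR × prior odds). The function name and the numbers are illustrative assumptions only; they are not taken from the Dominguez case or from any STRmix output.

```python
def posterior_from_lr(prior: float, lr: float) -> float:
    """Posterior P(H | E) from a prior P(H) and LR = P(E | H) / P(E | not H)."""
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must be in [0, 1)")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: an LR of 1,000 and a prior of 1 in a million
# (e.g. no other evidence linking the suspect to the sample).
hypothetical_lr = 1_000.0
hypothetical_prior = 1e-6

posterior = posterior_from_lr(hypothetical_prior, hypothetical_lr)
print(f"LR = {hypothetical_lr:,.0f}, prior = {hypothetical_prior:.0e}, "
      f"posterior = {posterior:.4%}")
```

With these made-up inputs the posterior is only about 0.1%, even though an LR of 1,000 would typically be reported as strong support.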

ESR have issued their own statement.

See also

https://www.sandiegouniontribune.com/news/courts/story/2019-12-06/murder-case-that-highlighted-dna-analysis-controversy-ends-with-plea-to-reduced-charge-release

Friday 6 December 2019

Simpson's paradox again

Deepai.org have a post about our paper on Simpson's paradox (we wrote this in 2015 but only just uploaded it to arXiv). The full paper is here.

The paradox is covered extensively in both “The Book of Why” by Pearl and Mackenzie (see my review) and also David Spiegelhalter’s “The Art of Statistics: How to Learn from Data” (see my review). Spiegelhalter's book contains a particularly good example of Cambridge University admissions data:

Overall the acceptance rate was higher for men than women, but in each subject the rate was higher for women than men. This is explained by the observation that women were more likely to apply for those subjects where the overall acceptance rates were lower. In other words the relevant causal model is this one:
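Separately from the causal model, the reversal itself is easy to reproduce numerically. The sketch below uses entirely made-up application counts for two hypothetical subjects (these are not the Cambridge figures from the book): women have the higher acceptance rate within each subject, yet the lower rate overall, because most of them apply to the subject with the lower acceptance rate.

```python
# Hypothetical data: subject -> group -> (number applied, number accepted).
# These counts are invented purely to illustrate the paradox.
applications = {
    "Subject A": {"men": (800, 480), "women": (200, 130)},  # high acceptance
    "Subject B": {"men": (200, 40), "women": (800, 200)},   # low acceptance
}


def subject_rate(subject: str, group: str) -> float:
    """Acceptance rate for one group within one subject."""
    applied, accepted = applications[subject][group]
    return accepted / applied


def overall_rate(group: str) -> float:
    """Acceptance rate for one group pooled over all subjects."""
    applied = sum(applications[s][group][0] for s in applications)
    accepted = sum(applications[s][group][1] for s in applications)
    return accepted / applied


for subject in applications:
    print(subject,
          f"men {subject_rate(subject, 'men'):.0%}",
          f"women {subject_rate(subject, 'women'):.0%}")
print("Overall",
      f"men {overall_rate('men'):.0%}",
      f"women {overall_rate('women'):.0%}")
```

In this invented data women are ahead in both subjects (65% vs 60% and 25% vs 20%) but behind overall (33% vs 52%).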

Monday 25 November 2019

Bayesian networks for cybersecurity risk

Our new paper describing a Bayesian network approach to cybersecurity (with lead author PhD student Jiali Wang) has been published in Computers & Security.

The print version will appear Feb 2020, but the online version is available now: https://doi.org/10.1016/j.cose.2019.101659

An open access pre-publication version is also available for download.

Lead author: Jiali Wang

Monday 7 October 2019

Bayesian networks research on treating injured soldiers gains DoD funding

The research which this new US DoD funding supports is the continuation of a long term collaboration between the RIM (Risk and Information Management) Group at Queen Mary (with William Marsh taking the lead) and the Trauma Sciences Centre led by surgeon Col Nigel Tai.

The underlying AI decision support is provided by causal Bayesian Networks. Two of the previous models can be accessed and run online at www.traumamodels.com

Institute of Applied Data Science seminar: "Why machine learning from big data fails"

On 3 October Norman Fenton gave a seminar: "Why machine learning from big data fails – and what to do about it", at the Institute for Applied Data Science, Queen Mary University. Here are the powerpoint slides for his presentation.

Tuesday 24 September 2019

Naked Statistical Evidence

Consider the hypothetical scenario:

All 100 prisoners in a prison participate in a riot, and 99 of them participate in attacking and killing a guard (the other returned to his cell briefly after the riot). With the guard dead, all 100 prisoners then escape. The next day one of the prisoners is captured and charged with participating in the murder of the guard. While admitting to participating in the riot the prisoner claims that he was the one who was not involved in attacking the guard. In the absence of any other evidence there is 99% probability the prisoner is guilty. Is this sufficient to convict?

Christian Dahlman

The latest episode of the evidence podcast "Excited Utterance" has an excellent interview with our colleague Christian Dahlman of Lund University about this kind of "naked statistical evidence", available on itunes, and also here:

https://www.excitedutterancepodcast.com/listen

Christian contrasts the above kind of naked statistical evidence with forensic evidence, such as a footprint found at a crime scene whose pattern 'matches' that of a shoe worn by the suspect. Whereas the causal link between the statistical evidence and guilt goes from the former to the latter, the causal link between the forensic evidence and guilt goes from the latter to the former:

This difference is central to the recent paper about the 'opportunity prior' that we co-authored with Christian. The fact that the suspect was at the prison means that he had the 'opportunity' to participate in the killing and that the prior probability for guilt given the naked statistical evidence is 99%.

Christian talks about his latest paper, and at the end of the interview (24:50), he defends the Bayesian approach to legal evidence against attacks from some legal scholars (this is something we also did in our recent paper on countering the ‘probabilistic paradoxes in legal reasoning’ with Bayesian networks).

References:

Dahlman, C. (2019). "Naked Statistical Evidence and Incentives for Lawful Conduct", https://www.researchgate.net/publication/336011753_Naked_Statistical_Evidence_and_Incentives_for_Lawful_Conduct
Fenton, N. E., Lagnado, D. A., Dahlman, C., & Neil, M. (2019). "The Opportunity Prior: A proof-based prior for criminal cases", Law, Probability and Risk, DOI 10.1093/lpr/mgz007. Full paper from OUP. See also blog post
de Zoete, J., Fenton, N. E., Noguchi, T., & Lagnado, D. A. (2019). "Countering the ‘probabilistic paradoxes in legal reasoning’ with Bayesian networks". Science & Justice 59 (4), 367-379 10.1016/j.scijus.2019.03.003 The pre-publication version (pdf) See also blog post.

Sunday 8 September 2019

Book Review: Pat Wiltshire’s “Traces: The memoirs of a forensic scientist and criminal investigator”

Gripping, scientifically rigorous and moving memoir of the world’s leading forensic palynologist.

The quote on the back cover of this book says: “Nature will invariably give up her secrets to those of us who know where to look”. Pat Wiltshire, a truly ‘one of a kind’ forensic ecologist, is probably the most qualified person in the world when it comes to knowing where to look.

This book is both a (popular) science book and a personal life story. The science is a thorough introduction to multiple aspects of ecology (and notably palynology – the study of pollen and spores from plants and fungi) as well as a detailed description of the processes of forensic investigation and analysis. The personal story fully reveals how Pat became the person she is, including her motivations, regrets, and loves. The science and the memoirs are interwoven throughout the book and what links much of the narrative are the accounts of Pat’s forensic investigations that provide fascinating insights into a number of different crimes (including murders and rapes) that Pat has helped shed light on. There are also eight pages of colour photographs of Pat at most stages of her life in the middle of the book.

See the full review (on ResearchGate):
Book Review: Pat Wiltshire’s “Traces: The memoirs of a forensic scientist and criminal investigator”

Full pdf also available here: https://www.eecs.qmul.ac.uk/~norman/papers/Traces_Review.pdf

Note: There are different UK and US versions (with different titles). The US version has different grammar and no photographs but, unlike the UK version, its audio version is narrated by Pat.

UK: “Traces: The memoirs of a forensic scientist and criminal investigator” Bonnier Books UK, 2019

USA: "Nature of Life and Death", Putnam House G P Putnam's Sons 2019

Tuesday 20 August 2019

Book Review: David Spiegelhalter’s “The Art of Statistics: How to Learn from Data”

A superb, timely overview of the benefits and limitations of statistics in the era of big data and machine learning

David Spiegelhalter has gained a deserved reputation as a masterful communicator of statistics and risk through his media work and writings. I believe this timely book is the best introduction to the benefits and limitations of statistics that I have seen and is David’s most important work yet in public communication. Any of the minor concerns explained in the review that I have about the book (including the understated role of causal models and the role of the likelihood ratio in courts) are the inevitable result of having to be selective about which more detailed material has to be left out to satisfy both the page and audience constraints.

In summary, this book is a must have for a) anybody who wants to better understand statistics and risk; b) anybody involved in the communication of statistics and risk; and c) anybody undertaking a course in data science and machine learning.

Here is the link to the full review:

Book Review: David Spiegelhalter’s “The Art of Statistics: How to Learn from Data” Pelican Books, 2019

See also:

Postscript: One of the slight concerns I discuss in the review relates to the example of the use of statistics in identifying unusually poor hospital treatment outcomes. This is an example that I thought was crying out for a causal model along the lines of this:

Thursday 4 July 2019

Challenging claims that probability theory is incompatible with legal reasoning

The published version of our paper "Resolving the so-called 'probabilistic paradoxes in legal reasoning' with Bayesian networks" is available for free download courtesy of Elsevier until 16 Aug. This is the link: https://authors.elsevier.com/c/1ZIQf4q6IcgUdA

The previous blog posting about this article is here.

The full citation:

de Zoete, J., Fenton, N. E., Noguchi, T., & Lagnado, D. A. (2019). "Countering the ‘probabilistic paradoxes in legal reasoning’ with Bayesian networks". Science & Justice 59 (4), 367-379, 10.1016/j.scijus.2019.03.003

Friday 14 June 2019

Review of clinical practice guidelines for gestational diabetes

Gestational diabetes is the most common metabolic disorder of pregnancy, and it is important that well-written clinical practice guidelines (CPGs) are used to optimise healthcare delivery and improve patient outcomes. This paper published today in BMJ Open is a review of such hospital-based CPGs. Seven CPGs met the criteria for inclusion in the review. Only two of these were considered to be of acceptable quality (one was from the Canadian Diabetic Association and other from the Auckland DHB, New Zealand).

Full reference citation:

Daley, B., Hitman, G., Fenton, N.E., & McLachlan, S. (2019). "Assessment of the methodological quality of local clinical practice guidelines on the identification and management of gestational diabetes". BMJ Open, 9(6), e027285. https://doi.org/10.1136/bmjopen-2018-027285. Full paper (pdf)

The work was funded by EPSRC as part of the PAMBAYESIAN project

Wednesday 22 May 2019

Defining the dreaded 'prior probability of guilt' - a new paper that does just that

One of the greatest impediments to the use of probabilistic reasoning in legal arguments is the difficulty in agreeing on an appropriate prior probability that the defendant is guilty. The 'innocent until proven guilty' assumption technically means a prior probability of 0 - a figure that (by Bayesian reasoning) can never be overturned no matter how much evidence follows. Some have suggested the logical equivalent of 1/N where N is the number of people in the world. But this probability is clearly too low as N includes too many who could not physically have committed the crime. On the other hand the often suggested prior 0.5 is too high as it stacks the odds too much against the defendant.

Therefore, even strong supporters of a Bayesian approach seem to think they can and must ignore the need to consider a prior probability of guilt (indeed it is this thinking that explains the prominence of the 'likelihood ratio' approach discussed so often on this blog).

This new paper, published online in the OUP journal Law, Probability and Risk (and which extends a previous paper presented at the 2017 International Conference on Artificial Intelligence and the Law), shows that, in a large class of cases, it is possible to arrive at a realistic prior that is also as consistent as possible with the legal notion of ‘innocent until proven guilty’. The approach is based first on identifying the 'smallest' time and location from the actual crime scene within which the defendant was definitely present and then estimating the number of people - other than the suspect - who were also within this time/area. If there were n people in total, then before any other evidence is considered each person, including the suspect, has an equal prior probability 1/n of having carried out the crime.
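The two points made above (a prior of 0 can never be overturned, whereas a prior of 1/n can) can be checked with a few lines of arithmetic. This is only a sketch under assumed numbers: the function names, the value n = 100 and the likelihood ratio of 10,000 are hypothetical, and the paper itself embeds the prior in a Bayesian network rather than a single update like this.

```python
def opportunity_prior(n_people: int) -> float:
    """Equal prior 1/n for each of the n people within the time/area."""
    if n_people < 1:
        raise ValueError("n_people must be at least 1")
    return 1.0 / n_people


def bayes_update(prior: float, lr: float) -> float:
    """Posterior probability of guilt after evidence with likelihood ratio lr."""
    if prior >= 1.0:
        return 1.0
    posterior_odds = lr * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


# A prior of exactly 0 stays at 0 however strong the evidence is.
print(bayes_update(0.0, 1e12))

# Hypothetical case: 100 people had the opportunity, evidence has LR = 10,000.
prior = opportunity_prior(100)
print(prior, bayes_update(prior, 10_000.0))
```

With these assumed values the prior of 0.01 is updated to a posterior of about 99%, while the zero prior remains at zero.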

The method applies to cases where we assume a crime has definitely taken place and that it was committed by one person against one other person (e.g. murder, assault, robbery). The work considers both the practical and legal implications of the approach and demonstrates how the prior probability is naturally incorporated into a generic Bayesian network model that allows us to integrate other evidence about the case.

Full details:

Fenton, N. E., Lagnado, D. A., Dahlman, C., & Neil, M. (2019). "The Opportunity Prior: A proof-based prior for criminal cases", Law, Probability and Risk, DOI 10.1093/lpr/mgz007

Monday 13 May 2019

When 'absence of forensic evidence' is not 'neutral'

It is widely accepted that ‘evidence of absence’ (such as an alibi confirming that the defendant was not at the crime scene) is not the same as ‘absence of evidence’ (such as where there is no evidence about whether or not the defendant was at the crime scene).

However, for forensic evidence, there is often confusion about these concepts. If DNA found at the crime scene does not match the defendant is that ‘evidence of absence’ or ‘absence of evidence’? It depends, of course, on the circumstances. If there is a high probability that the DNA found must have come from the person who committed the crime then this is clearly ‘evidence of absence’ - the fact that it does not match the defendant is highly probative in favour of the defence. On the other hand if the only DNA found at the crime scene is actually unrelated to the person who committed the crime, then this is clearly ‘absence of evidence’ – the fact that it does not match the defendant is no more probative for the defence than for the prosecution (so the evidence is ‘neutral’). The problem is that lawyers and forensic scientists often wrongly assume that ‘evidence of absence’ is ‘neutral’.
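The dependence on circumstances can be made concrete with a deliberately simplified calculation (this is not the Bayesian network from the report). Assume q is the probability that the crime-scene DNA came from the perpetrator and r is a random match probability; both symbols, and the assumption that a perpetrator's own DNA always matches, are introduced here purely for illustration.

```python
import math


def defence_lr_for_non_match(q: float, r: float = 1e-9) -> float:
    """LR in favour of the defence, P(no match | innocent) / P(no match | guilty).

    q: assumed probability that the crime-scene DNA came from the perpetrator.
    r: assumed probability that an unrelated person's DNA matches by chance.
    """
    # If the defendant is guilty, a non-match requires that the DNA did NOT
    # come from the perpetrator, and that the unrelated source does not
    # match the defendant by chance.
    p_no_match_given_guilty = (1.0 - q) * (1.0 - r)
    # If the defendant is innocent, the DNA came from someone else either way.
    p_no_match_given_innocent = 1.0 - r
    if p_no_match_given_guilty == 0.0:
        return math.inf
    return p_no_match_given_innocent / p_no_match_given_guilty


for q in (0.0, 0.5, 0.99):
    print(f"q = {q:.2f}: LR for defence = {defence_lr_for_non_match(q):.1f}")
```

Under these assumptions the LR is 1 (neutral) only when q = 0, and it grows as 1/(1 - q), so a non-match becomes strongly probative for the defence as q approaches 1.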

The full report (5 pages) includes a 'proof' (using a simple Bayesian network model) of how the experts get it wrong in a real example.

Fenton, N. E. (2019). When “absence of forensic evidence” is not “neutral.” https://doi.org/10.13140/RG.2.2.14517.73440

The Bayesian network model is available here. It can be run in the trial version of AgenaRisk

Wednesday 1 May 2019

House of Lords Report on Forensic Science and the Criminal Justice System

The House of Lords Report published today contains the following quote from me that was part of The Alan Turing Institute submission:

I said a lot more in the Turing submission about the use of probability and statistics in evidence, including concerns about low template DNA evidence and the possibility of using Bayesian networks to properly assess the overall impact of multiple pieces of related evidence.

Two other Queen Mary colleagues (Amber Marks and Ian Walden) also contributed to the Turing submission.

For full details see:

House of Lords, The Science and Technology Select Committee "Forensic science and the criminal justice system: a blueprint for change" HL Paper 333, 1 May 2019, https://t.co/M6utVY8Z0b
The Alan Turing Institute "Response to the House of Lords inquiry: Forensic Science in Criminal Justice", 13 September 2018, https://t.co/OBNeceVqhu
Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, 2016 (June), pp 51-77 http://dx.doi.org/10.1146/annurev-statistics-041715-033428. (This is cited in both of the above reports. See also blog posting about this article).

Sunday 31 March 2019

Modelling competing legal arguments using Bayesian networks

We have previously always tried to capture all of the competing hypotheses and evidence in a legal case in a single coherent Bayesian network model. But our new paper explains why this may not always be sensible and how to deal with it by using "competing" models. The full published version can be read here.

This work arose out of the highly successful Isaac Newton Institute Cambridge Programme on Probability and Statistics in Forensic Science.

Full reference:

Neil, M., Fenton, N. E., Lagnado, D. A. & Gill, R. (2019), "Modelling competing legal arguments using Bayesian Model Comparison and Averaging". Artificial Intelligence and Law https://doi.org/10.1007/s10506-019-09250-3. The full published version can be read here

Saturday 16 March 2019

Hannah Fry’s “Hello World” and the Example of Algorithm Bias

“Hello World” is an excellent book by Hannah Fry that provides lay explanations about both the potential and threats of AI and machine learning algorithms in the modern world. It is filled with many excellent examples, and one that is especially important is in Chapter 3 (“Justice”) about the use of algorithms in the criminal justice system. The example demonstrates the extremely important point that there is an inevitable trade-off between ‘accuracy’ and ‘fairness’ when it comes to algorithms that make decisions about people.

While the overall thrust and conclusions of the example are correct the need to keep any detailed maths out of the book might leave careful readers unconvinced about whether the example really demonstrates the stated conclusions. I feel it is important to get the details right because the issue of algorithmic fairness is of increasing importance for the future of AI, yet is widely misunderstood.

I have therefore produced a short report that provides a fully worked explanation of the example. I explain what is missing from Hannah's presentation, namely any explicit calculation of the false positive rates of the algorithm. I show how Bayes' theorem (together with some other assumptions) is needed to compute the false positive rates for men and women. I also show why and how a causal model of the problem (namely a Bayesian network model) makes everything much clearer.
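As a rough illustration of the kind of calculation involved, the sketch below derives a false positive rate from a base rate, a true positive rate and a positive predictive value by rearranging Bayes' theorem. The function name and every number are hypothetical; they are not the figures used in the book or in the report.

```python
def false_positive_rate(base_rate: float, tpr: float, ppv: float) -> float:
    """False positive rate implied by Bayes' theorem.

    From PPV = tpr * p / (tpr * p + fpr * (1 - p)) it follows that
    fpr = tpr * (p / (1 - p)) * ((1 - ppv) / ppv), where p is the base rate.
    """
    return tpr * (base_rate / (1.0 - base_rate)) * ((1.0 - ppv) / ppv)


# Hypothetical setting: the algorithm is equally 'accurate' for both groups
# (same true positive rate and same positive predictive value), but the
# groups have different base rates of the outcome being predicted.
tpr, ppv = 0.75, 0.75
fpr_men = false_positive_rate(base_rate=0.40, tpr=tpr, ppv=ppv)
fpr_women = false_positive_rate(base_rate=0.10, tpr=tpr, ppv=ppv)
print(f"FPR men = {fpr_men:.1%}, FPR women = {fpr_women:.1%}")
```

With these assumed inputs the false positive rate is roughly 17% for the higher base rate group and roughly 3% for the lower one, which is the accuracy versus fairness trade-off in miniature.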

Fry, H. (2018). "Hello world: how to be human in the age of the machine". New York: W. W. Norton & Company, Inc.

My report:

Fenton, N E. (2019) "Hannah Fry’s 'Hello World' and the Example of Algorithm Bias", DOI 10.13140/RG.2.2.14339.55844

A pdf of the report is also available here

Thursday 14 March 2019

The Simonshaven murder case modelled as a Bayesian network

A paper published today in Topics in Cognitive Science is one in a series of analyses of a Dutch murder case, each using a different modelling approach. In this case a woman was murdered while out walking with her husband in a quiet recreational area near the village of Simonshaven, close to Rotterdam, in 2011. The trial court of Rotterdam convicted the victim’s husband of murder by intentionally hitting and/or kicking her in the head and strangling her. For the appeal the defence provided new evidence about other ‘similar’ murders in the area committed by a different person.

The idea to use this case to evaluate a number of different methods for modelling complex legal cases was originally proposed by Floris Bex (Utrecht), Anne Ruth Mackor (Groningen) and Henry Prakken (Utrecht). In September 2016 - as part of our Programme Probability and Statistics in Forensic Science at the Isaac Newton Institute Cambridge - a special two-day workshop was arranged in which different teams were presented with the Simonshaven evidence and had to produce a model analysis. At the time the Appeal was still to be heard. In a follow-up workshop to review the various solutions (held in London in June 2017 as part of the BAYES-KNOWLEDGE project) the participants agreed to publish their results in a special issue of a journal.

This paper describes the Bayesian Network (BN) team's solution. One of the key aims was to determine if a useful BN could be quickly constructed using the previously established idioms-based approach (this provides a generic method for translating legal cases into BNs). The BN model described was built by the authors during the course of the workshop. The total effort involved was approximately 26 hours (i.e. an average of 6 hours per author). With the basic assumptions described in the paper, the posterior probability of guilt once all the evidence is entered is 74%. The paper describes a formal evaluation of the model, using sensitivity analysis, to determine how robust the model conclusions are to key subjective prior probabilities over a full range of what may be deemed ‘reasonable’ from both defence and prosecution perspectives. The results show that the model is reasonably robust - pointing generally to a reasonably high posterior probability of guilt, but also generally below the 95% threshold expected in criminal law.

The authors acknowledge the insights of the following workshop participants: Floris Bex, Christian Dahlman, Richard Gill, Anne Ruth Mackor, Ronald Meester, Henry Prakken, Leila Schneps, Marjan Sjerps, Nadine Smit, Bart Verheij, and Jacob de Zoete.

Full reference:

Fenton, N. E., Neil, M., Yet, B., & Lagnado, D. A. (2019). "Analyzing the Simonshaven Case using Bayesian Networks". Topics in Cognitive Science, 10.1111/tops.12417. (For those without a subscription to the journal, the published version can be read here: https://rdcu.be/bqYxp)

Monday 11 March 2019

Challenging claims that probability theory is incompatible with legal reasoning

A new paper published in Science and Justice exposes why common claims that probability theory is incompatible with the law are flawed.

One of the most effective tactics that has been used by legal scholars to 'demonstrate' the 'limitations' and 'incompatibility' of probability theory (and particularly Bayes theorem) with legal reasoning is the use of puzzles like the following:

Fred is charged with a crime. A reliable eye witness testifies that someone exactly matching Fred’s appearance was seen fleeing the crime scene. But Fred is known to have an identical twin brother. So is the evidence relevant?**

The argument to suggest that this example demonstrates probability theory is incompatible with legal norms goes something like this:

Both intuitively and legally it is clear that the evidence should be considered relevant. But according to probability theory (Bayes' theorem), the evidence has 'no probative value' since it provides no change in our belief about whether Fred is more likely than his twin brother to have been at the crime scene. Hence, according to probability theory the evidence is wrongly considered inadmissible.

Specifically, such problems are intended to show that use of probability theory results in legal paradoxes. As such, these problems have been a powerful deterrent to the use of probability theory in the law.

The new paper shows that all of these puzzles only lead to ‘paradoxes’ under an artificially constrained view of probability theory and the use of the so-called likelihood ratio, in which multiple related hypotheses and pieces of evidence are squeezed into a single hypothesis variable and a single evidence variable. When the distinct relevant hypotheses and evidence are described properly in a causal model (a Bayesian network), the paradoxes vanish. Moreover, the resulting Bayesian networks provide a powerful framework for legal reasoning.

Full reference details of the paper:

de Zoete, J., Fenton, N. E., Noguchi, T., & Lagnado, D. A. (2019). "Countering the ‘probabilistic paradoxes in legal reasoning’ with Bayesian networks". Science & Justice 10.1016/j.scijus.2019.03.003.

The pre-publication version (pdf)

The models (which can be run using AgenaRisk)

Two other papers just accepted (details to follow) also demonstrate the power of Bayesian networks in legal reasoning:

Fenton, N. E., Neil, M., Yet, B., & Lagnado, D. A. (2019). "Analyzing the Simonshaven Case using Bayesian Networks". Topics in Cognitive Science, 10.1111/tops.12417. (Update: this has now been published; the published version can be read at https://rdcu.be/bqYxp)

Neil, M., Fenton, N. E., Lagnado, D. A. & Gill, R. (2019), "Modelling competing legal arguments using Bayesian Model Comparison and Averaging". To appear in Artificial Intelligence and Law. The full published version can be read here.

**This particular puzzle is easy to 'resolve'. The 'non-probative' Bayes conclusion is only correct if we assume that the only people who could possibly have committed the crime are Fred and his twin brother. In practice we have to consider the possibility that neither committed the crime. While the eye witness evidence fails to distinguish between Fred and his twin, it does increase the probability that Fred was at the crime scene relative to the hypothesis that Fred was not at the crime scene.
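A small numeric sketch of this resolution, using three mutually exclusive hypotheses about who was at the crime scene. The priors and likelihoods are invented for illustration and are not taken from the paper's Bayesian network models.

```python
def posterior_over_hypotheses(priors: dict, likelihoods: dict) -> dict:
    """Bayes' theorem over a set of mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Hypothetical priors for who was at the crime scene.
priors = {"Fred": 0.10, "twin": 0.10, "someone else": 0.80}

# Hypothetical probabilities that a reliable witness reports someone exactly
# matching Fred's appearance under each hypothesis.
likelihoods = {"Fred": 0.90, "twin": 0.90, "someone else": 0.01}

posteriors = posterior_over_hypotheses(priors, likelihoods)
for hypothesis, probability in posteriors.items():
    print(f"{hypothesis}: prior {priors[hypothesis]:.2f} -> posterior {probability:.2f}")
```

With these assumed numbers the probability that Fred was at the scene rises from 0.10 to about 0.48, so the evidence is clearly relevant, while Fred and his twin remain equally likely, which is all that the 'no probative value' argument actually shows.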

Monday 4 March 2019

Bayesian networks for critical maintenance decisions on the railway network

An important recent paper (published in the Journal of Risk and Reliability) by Haoyuan Zhang and William Marsh of Queen Mary University of London presents a Bayesian network model that can be used for maintenance decision support that is especially relevant for rail safety. The model overcomes the practical limitations of previous statistical models that have attempted to maximise asset reliability cost-effectively, by scheduling maintenance based on the likely deterioration of an asset. The model extends an existing statistical model of asset deterioration, but shows how

data on the condition of assets available from their periodic inspection can be used
failure data from related groups of asset can be combined using judgement from experts
expert knowledge of the causes of deterioration can be combined with statistical data to adjust predictions.

The model (which was developed using the AgenaRisk software) is applied to a case study of bridges on the rail network in the UK.

A full pre-publication version is available here.

The full publication details for the paper are:

Zhang, H., & Marsh, D. W. R. (2018). "Generic Bayesian network models for making maintenance decisions from available data and expert knowledge". Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 232(5), 505–523. https://doi.org/10.1177/1748006X17742765

Friday 25 January 2019

Magda Osman: fighting mainstream opinions on 'nudge' techniques and communicating uncertainty

Dr Magda Osman is our colleague at Queen Mary University of London who is a world-leading expert on experimental psychology and especially the psychology of agency and control. She has questioned the effectiveness of 'nudge' persuasion techniques for improving individual and societal well-being. She is a co-PI in our project CAUSAL-DYNAMICS which is concerned with modelling dynamic decision-making from a causal perspective. We recently published a joint paper (which got a lot of publicity) describing our experiments testing how far people go in trusting experts. As Magda is currently also part-seconded to the Food Standards Agency she is often sought out for her views on the use of nudges in several food related policy issues. Two recent experiences suggest the scale of the difficulties in getting her message across. Last week she appeared near the end of the Channel 4 Programme 'How to lose weight well' (the full programme is here - Magda appears at 43 minutes).

That 30-second clip above (which might be blocked by Channel 4) was all that remained of an interview that lasted an hour. Magda says:

I was specifically invited on the program – the brief was that there was a very speculative technique now available via the NHS, that is claimed to use Nudge methods to help lose weight (because, the rationale is that it is designed to target people's unconscious processes subliminally – this is grossly inaccurate for reasons, first, because subliminal means below the threshold of conscious attention, but people’s attention is consciously directed to the messages being played to them via this NHS weight loss technique, and second, nudge has nothing to do with targeting the unconscious subliminally).

They were aware that I have critical views on nudge and on work to do with the unconscious and so they wanted an expert to discuss the issues and why there might be some reason to doubt the findings and the claims made by the NHS hypnotherapeutic technique which proposes that people don’t need to use any willpower to lose weight, their unconscious will do all the work because the technique will rewire their conscious thoughts.

I spent an hour being interviewed, and several of the questions concerned topics such as, ‘why is it that people say they have lost weight using this technique?’, and to speculate why it might be that people on the programme that would trial the technique might also lose weight, even if I’m suggesting that the method itself is unlikely to be effective because the evidence for it working is weak, and the theoretical basis for it is flawed? My answers to these questions were that there are statistical reasons for why it is that some people will show that they have lost weight as a result of the technique, but that has nothing to do with the technique itself. It is more to do with understanding random fluctuations in behaviour in samples that are tested. Also, the psychological factor is that, once people tell other people that they want to lose weight, and that they are going on a programme on national television (where they are filmed before and after the method), this places a high incentive on them to try to lose weight. Actually losing weight then may have nothing to do with the technique itself, but more to do with the willingness, motivation and commitment people will put in to do the mundane things that are absolutely necessary to lose weight, which is eat less fatty food, eat more healthily, and exercise more.

So, I spent an hour discussing these things, giving very clear and cogent reasons and examples (which they had specifically asked for) to demonstrate why it is that the method they were asking me to talk about is problematic, and should be considered with a huge degree of scepticism.

But what happened on the show was a set-up. They filmed people motivated to take part in the trial of the method, they did not present many details about the method or the patchy and problematic evidence base for it, then they brought me on and edited my interview so that I was shown to pooh-pooh the technique, and then they brought on people as testimonials of the technique’s success and pointed out that 'the expert has got it wrong'.

Obviously I didn’t get it wrong, because it was what I had predicted, but the piece was edited in a way to show that the value of one or two people’s experience is of equal or more weight than the value of 15 years' worth of study in a field of work, which entails summarising thousands of data points.

This does nothing to help people understand core issues to do with sampling, statistical inference, the value of a good causal understanding of evidence, the value of expertise, and the need for scepticism.

On top of that experience Magda and more of my colleagues were invited to submit a workshop to the International Conference on Uncertainty in Risk Analysis 2019, sponsored by the European Food Safety Authority (EFSA) and the German Federal Institute for Risk Assessment (BfR). Their workshop was one of five invited. Yet, the message of their workshop was not exactly what the EFSA wanted to hear about their safety standards. Consequently, Magda was told that, unlike the other four workshops, theirs would be relegated to a tiny side room that could only accommodate the workshop speakers - i.e. it was essentially no longer to be part of the programme. I'm not sure if it was deliberately to rub salt into wounds, but this is how Magda's workshop is currently advertised on the conference website...

Thursday 17 January 2019

Manhunt: the Levi Bellfield case from a probabilistic perspective

The ITV 3-part Series "Manhunt" starring Martin Clunes tells the story of the search for the killer of Amelie Delagrange who was murdered in Twickenham in 2004. It is based on the book by Colin Sutton (who was the detective in charge of the case) and dramatises his fight to find Amelie's killer, Levi Bellfield, who was also charged with the murder of Marsha McDonnell and three other attempted murders. The trial of Bellfield for these five crimes took place at the Old Bailey in 2008. He was convicted of the two murders and one of the three attempted murders (he was also later charged with and convicted of the murder of Milly Dowler).

I declare a particular interest here because, between 2007 and 2008, I (along with colleague Martin Neil) acted as an expert consultant to the Defence team in the case against Bellfield for the five crimes. We were initially asked to provide a statistical analysis relating to the number of car number plates that were consistent with the grainy CCTV image of a car at the scene of the McDonnell murder. We were subsequently asked to identify probabilistic issues relating to all aspects of the evidence, producing reports totalling several hundred pages. Although these reports are not public, some material we subsequently wrote that mentions the case can be found in this publication. Having watched the programme I think it is worth making the following points:

The CCTV image of the car at the scene of the McDonnell murder: none of the letters or numbers on the number plate were clearly visible. A number of image experts provided (contradictory) conclusions about which characters could be ruled out in each position, so there was much uncertainty about how many number plates needed to be investigated. Additionally, at least two of the experts had been subject to confirmation bias because, instead of being presented with the grainy CCTV image and asked to say what the number plate could be, they were shown Bellfield's actual number plate and asked if the image was a possible match (as a result our colleague Itiel Dror was co-opted as an expert witness in the area of confirmation bias). The prosecution claimed to have 'eliminated' all possible vehicles with 'matching' number plates other than Bellfield's. This was important because, if true, it represented the most solid piece of evidence against Bellfield in the entire case. However, taking account of the uncertainty of the image expert assertions, we concluded that potentially thousands of additional vehicles would need to be eliminated.

Lack of hard evidence: The dramatisation was correct in showing that, although there was much circumstantial evidence linking Bellfield to the murder of Delagrange, there was no direct evidence in the form of either forensic evidence or eyewitnesses to the crime. Hence, DCI Sutton's strategy was to link Bellfield to a number of other 'similar' crimes that had taken place within the same area. The programme focused on two of the four for which he was charged, namely the McDonnell murder and one other attempted murder, for which the programme used a made-up name "Sarah" (the credits make clear that some names were deliberately changed). The "Sarah" case actually refers to Kate Sheedy who was deliberately run over with a car. The other two cases of attempted murder (which I will refer to as R and D) were not covered. Again (as the dramatization suggested) there was no direct evidence linking Bellfield to either the McDonnell or "Sarah" attacks, but much circumstantial evidence. By providing circumstantial evidence linking Bellfield to five crimes which were claimed to be 'very similar', DCI Sutton was able to ensure that Bellfield was charged with the Delagrange murder.

Linking of the five 'very similar' crimes: This linkage became the thrust of the prosecution case against Bellfield. What the prosecution essentially argued was that the crimes were so similar, and the circumstantial evidence against Bellfield so compelling in each case, that (in the words of the prosecuting barrister) "the chances that these offences were committed by anyone other than Bellfield are so fanciful that you can reject them". But in reality there was no great 'similarity' between the crimes: even in the dramatization DCI Sutton states somewhat ironically (about the Amelie, Marsha, and "Sarah" attacks) that "they all involved striking the victim with a blunt instrument - as we can consider a car a blunt instrument". Much of the defence case was based around exposing the probabilistic and logical fallacies arising from assumptions of similarity (although interestingly it was much later that we formalized some of these issues). With regards to the whole issue of 'cross admissibility' in one report I wrote the following generic statement:

The cross admissibility argument is based on the following valid probabilistic reasoning:

· Suppose Crime A and Crime B are so similar that there is a very high probability they have been committed by the same person.

· If there is evidence to support the hypothesis that the defendant is guilty of Crime A then this automatically significantly increases the probability of him being guilty of Crime B, even without any evidence of Crime B.

48. In other words what is happening here is that the probability of guilt in Crime A, together with the evidence of similarity between the two crimes, makes it allowable to conclude that the probability of guilt in crime B has increased. This is indeed provably correct, but what the prosecution claims is something subtly different, namely:

It is perfectly allowable to use the probability of guilt in Crime A, as evidence for Crime B.

49. This subtle difference leads to a fallacy in the following scenario that is relevant to this case.

· Suppose that there are three crimes A, B and C. Suppose that the evidence that crimes B and C are similar is strong. Then as above, any evidence that indicates guilt in the case of crime B will, because of the evidence of similarity, impact on the probability of guilt for crime C. However, suppose that we have not yet heard any evidence on crimes B and C and suppose that there is no evidence that Crime A is similar to either Crime B or C.

· If there is strong evidence supporting probability of guilt in crime A, then, contrary to the prosecution claim, this evidence does not impact on the probability of guilt for either crimes B or C and hence should not be used as evidence as suggested in point 48 above.

· In fact in this scenario the evidence concerning crime A should, in relation to crimes B and C, be treated just the same as ‘previous conviction’ information in normal trials.

50. Given that the judge has allowed ‘cross admissibility’ of all 5 cases the danger identified in point 49 presents an opportunity for strategic exploitation by the prosecution. Specifically, the opportunistic strategy is to focus on an offence in which there is most hard evidence, even if that is the least serious offence and even if it bears the least similarity to the others. The prosecution can then argue that evidence of guilt in that case can be taken as evidence of guilt in the more serious cases. The jury would not necessarily be aware of the underlying fallacy.
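The distinction drawn in points 48 and 49 can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions that are not part of the original report: the function name `p_guilty_b`, the numbers, and the model itself are hypothetical. The model assumes that if the two crimes share an offender then guilt in B follows guilt in A, and otherwise guilt in B stays at its own prior, independent of A.

```python
def p_guilty_b(p_guilty_a, p_same_offender, prior_b):
    """Probability of guilt in crime B under a simple mixture model.

    Assumption (illustrative only): with probability p_same_offender the
    two crimes share an offender, so guilt in B equals guilt in A;
    otherwise guilt in B is just its prior, unaffected by crime A.
    """
    return p_same_offender * p_guilty_a + (1.0 - p_same_offender) * prior_b


# Hypothetical numbers: strong evidence of guilt in crime A,
# and a small prior probability of guilt in crime B.
p_a = 0.9
prior_b = 0.001

# Point 48: strong evidence that the crimes are similar.
with_similarity = p_guilty_b(p_a, p_same_offender=0.95, prior_b=prior_b)

# Point 49: no evidence of similarity, modelled here as p_same_offender = 0.
without_similarity = p_guilty_b(p_a, p_same_offender=0.0, prior_b=prior_b)

print(f"With strong similarity:    {with_similarity:.4f}")
print(f"With no similarity at all: {without_similarity:.4f}")
```

In this sketch the probability of guilt in crime B rises sharply only when the similarity term is large; when it is zero, the evidence about crime A leaves the probability for crime B at its prior.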

With hindsight point 50 is especially pertinent because, in contrast to the Delagrange, McDonnell and "Sarah" cases, there actually was some direct evidence linking Bellfield to the R and D attacks (neither of which resulted in serious injury to the victims) and there were few similarities between these and the other three cases. The jury were allowed by the cross admissibility ruling and the (in my view incorrect) assumption of similarity to use evidence in the R and D attacks as evidence in the other cases. Interestingly, the jury did not find Bellfield guilty of either of the R or D attacks.

Multiple probabilistic fallacies: In one of my summary reports I said (about the prosecution case generally): "There are several important instances of well known probabilistic fallacies (and also well known logical fallacies) that consistently exaggerate the impact of the evidence in favour of the prosecution case". In addition to the cross admissibility 'fallacy' we found examples of the following in the prosecution opening statement:

Prosecutor's fallacy
Base rate neglect fallacy
Dependent evidence fallacy
Logically dependent evidence fallacy
Conjunction fallacy
Confirmation bias fallacy
Previous convictions fallacy
Coincidence fallacy
Minimal utility evidence fallacy
Lack of hard evidence fallacy
“Crimewatch UK” fallacy

These fallacies are all covered in our book and some (in the context of the Bellfield case) are covered in this paper.

And finally: In one scene in the programme DCI Sutton pointed out that he, Bellfield and Bellfield's lawyer all had one thing in common - being Spurs fans. Count me in on that one too...


Monday 14 January 2019

New research published in IEEE Transactions makes building accurate Bayesian networks easier

(This is an update of a previous posting)
One of the biggest practical challenges in building Bayesian network (BN) models for decision support and risk assessment is to define the probability tables for nodes with multiple parents. Consider the following example:

In any given week a terrorist organisation may or may not carry out an attack. There are several independent cells in this organisation for which it may be possible in any week to determine heightened activity. If it is known that there is no heightened activity in any of the cells, then an attack is unlikely. However, for any cell if it is known there is heightened activity then there is a chance an attack will take place. The more cells known to have heightened activity the more likely an attack is.

In the case where there are three terrorist cells, it seems reasonable to assume the BN structure here:

To define the probability table for the node "Attack carried out" we have to define probability values for each possible combination of the states of the parent nodes, i.e., for all the entries of the following table.

That is 16 values (although, since the columns must sum to one we only really have to define 8).
When data are sparse - as in examples like this - we must rely on judgment from domain experts to elicit these values. Even for a very small example like this, such elicitation is known to be highly error-prone. When there are more parents (imagine there are 20 different terrorist cells) or more states other than "False" and "True", then it becomes practically infeasible. Numerous methods have been proposed to simplify the problem of eliciting such probability tables. One of the most popular methods - “noisy-OR”- approximates the required relationship in many real-world situations like the above example. BN tools like AgenaRisk implement the noisy-OR function making it easy to define even very large probability tables. However, it turns out that in situations where the child node (in the example this is the node "Attack carried out") is observed to be "False", the noisy-OR function fails to properly capture the real world implications. It is this weakness that is both clarified and resolved in the following two new papers published in IEEE Transactions on Knowledge and Data Engineering (both are open access so you can download the full pdf).
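To make the saving concrete, here is a minimal sketch of how a noisy-OR function fills in the whole table for the three-cell example from just one probability per parent plus an optional "leak" term. The function name `noisy_or_table`, the leak parameter and all numeric values are illustrative assumptions; this is not AgenaRisk's implementation, nor the modified function proposed in the papers.

```python
from itertools import product


def noisy_or_table(link_probs, leak=0.0):
    """Build P(child=True | parent states) for every parent combination.

    link_probs[i] is the assumed probability that parent i alone, when
    True, causes the child to be True. leak is the assumed probability
    that the child is True when no parent is True.
    """
    table = {}
    for states in product([False, True], repeat=len(link_probs)):
        # The child is False only if the leak and every active parent
        # all independently fail to trigger it.
        p_false = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_false *= 1.0 - p
        table[states] = 1.0 - p_false
    return table


# Hypothetical values for the three terrorist cells.
link_probs = [0.1, 0.2, 0.3]
table = noisy_or_table(link_probs, leak=0.01)

for states, p_true in table.items():
    print(states, f"True={p_true:.4f}", f"False={1 - p_true:.4f}")
```

With three parents this produces all 8 columns of the table from 4 elicited numbers, and with 20 parents it would need 21 numbers rather than more than a million columns.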

The first paper shows that by changing a single column of the probability table generated from the noisy-OR function (namely the last column where all parents are "True") most (but not all) of the deficiencies in noisy-OR are resolved. The second paper shows how the problem is resolved by defining the nodes as 'ranked nodes' and using the weighted average function in AgenaRisk.

Hence, while the first paper provides a simple approximate solution, the second provides a 'complete solution' but requires software like AgenaRisk for its implementation.

Acknowledgements: The research was supported by the European Research Council under project ERC-2013-AdG339182 (BAYES_KNOWLEDGE); the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS; the Intelligence Advanced Research Projects Activity (IARPA), through the BARD project (Bayesian Reasoning via Delphi) of the CREATE programme under Contract [2017-16122000003]; and Agena Ltd for software support. We also acknowledge the helpful recommendations and comments of Judea Pearl, and the valuable contributions of David Lagnado (UCL) and Nicole Cruz (Birkbeck).

Wednesday 2 January 2019

New paper shows how and why important evidence is ignored in medicine, forensics and the law

Consider the following problem:

There is a diagnostic screening test for a particular serious disease which has a 90% chance of testing positive if the patient has the disease. However, this test also has a 90% chance of testing positive for a common benign condition. As the test cannot distinguish between whether or not the person has the serious or benign condition, can we disregard the evidence of the positive test result?

An important new paper by Toby Pilditch and colleagues (at UCL and Queen Mary) published today in the journal Psychological Science demonstrates that people assume that such evidence can be disregarded. Specifically, they assume that - as it is equally predicted by two competing hypotheses (in this case serious disease versus benign) - it offers no support for either hypothesis. However, this assumption is wrong. It only holds when the 'competing' hypotheses are mutually exclusive and exhaustive (i.e. exactly one is true). In the above example, if both the serious disease and the benign condition are equally likely (say, a 5% chance) in a random member of the population then the positive test result increases the probability of BOTH the serious disease and the benign condition to about 25% (assuming a 10% false positive rate for the test). The paper shows that this reasoning error is due to a 'zero-sum' perspective on evidence, wherein people wrongly assume that evidence which supports one causal hypothesis must disconfirm its competitor. Across three experiments the paper demonstrates this error is robust to intervention and generalizes across several different contexts. The paper also rules out several alternative explanations of the bias.
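The figure of about 25% can be checked with a short calculation. The sketch below makes two assumptions that the text does not state explicitly: the serious disease and the benign condition occur independently, and the test is still 90% likely to be positive when both are present. The function name `posteriors_given_positive` is purely illustrative.

```python
from itertools import product


def posteriors_given_positive(prior_serious=0.05, prior_benign=0.05,
                              p_pos_if_condition=0.9, p_false_pos=0.1):
    """Return (P(serious | positive), P(benign | positive)).

    Assumes the two conditions are independent and that the test is
    positive with probability p_pos_if_condition whenever at least one
    of them is present, and with probability p_false_pos otherwise.
    """
    joint_pos = {}
    for serious, benign in product([False, True], repeat=2):
        prior = ((prior_serious if serious else 1 - prior_serious)
                 * (prior_benign if benign else 1 - prior_benign))
        p_pos = p_pos_if_condition if (serious or benign) else p_false_pos
        joint_pos[(serious, benign)] = prior * p_pos

    p_pos_total = sum(joint_pos.values())
    p_serious = sum(v for (s, _), v in joint_pos.items() if s) / p_pos_total
    p_benign = sum(v for (_, b), v in joint_pos.items() if b) / p_pos_total
    return p_serious, p_benign


p_serious, p_benign = posteriors_given_positive()
print(f"P(serious | positive) = {p_serious:.3f}")
print(f"P(benign  | positive) = {p_benign:.3f}")
```

Under these assumptions both posteriors come out at roughly 0.25, up from the 5% prior, so the positive result supports both hypotheses at once rather than neither.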

The implications of this work are profound, as the fallacy is made in many critical areas of decision-making including law and forensics as well as medicine. For example, in 2001 Barry George was convicted of the shooting of Jill Dando, a TV celebrity, outside her flat in broad daylight. The main evidence against him was a single particle of firearm discharge residue (FDR) found in his coat pocket. In 2007 the Appeal Court concluded that the FDR evidence was not ‘probative’ in favour of guilt, because, contrary to what had been suggested in the original trial, it was equally likely to have arisen due to poor police procedures (such as the coat being exposed to FDR during police handling) as from him having fired the gun that killed Dando. Hence, his conviction was quashed and a re-trial ordered, in which Barry George was set free. However, the appeal court argument assumed that if a piece of evidence (the FDR in the coat pocket) is equally probable under two alternative hypotheses (Barry George fired gun vs poor police handling of evidence) then it cannot support either of these hypotheses. But it is not necessarily the case that exactly one of these two hypotheses is true: it is possible that Barry George fired the gun and there was poor police handling of the evidence, and it is also possible that neither was true (e.g., the FDR particle came from elsewhere). Therefore, rather than being neutral, the FDR evidence may have been probative against Barry George (albeit weakly). The FDR evidence does not discriminate ‘Barry George fired the gun’ from ‘poor police handling of evidence’, but it does discriminate ‘Barry George fired the gun’ from ‘Barry George did not fire the gun’: it is the latter hypothesis pair that was the target in this criminal investigation.

I have personally been involved in cases where defence evidence has also been wrongly deemed irrelevant because of the zero-sum fallacy. In particular, this happens when DNA from a crime scene does NOT match the defendant. The defence lawyer argues that this supports the hypothesis that the defendant was not at the crime scene. However, the prosecution and forensic experts argue (wrongly) that the lack of a match can be disregarded as this is equally likely to be the result of failure to collect a sufficient relevant sample of DNA from the crime scene.

The research was based upon work undertaken in the BARD project which was concerned with improving intelligence analysis with uncertain evidence using Bayesian networks. It was supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), under Contract [2017-16122000003].

The full reference:

Pilditch, T., Fenton, N. E., & Lagnado, D. A. (2019). "The zero-sum fallacy in evidence evaluation". Psychological Science, http://doi.org/10.1177/0956797618818484

Pdf of the accepted version.