Saturday, 16 March 2019

Hannah Fry’s “Hello World” and the Example of Algorithm Bias




“Hello World” is an excellent book by Hannah Fry that provides lay explanations of both the potential and the threats of AI and machine learning algorithms in the modern world. It is filled with excellent examples, and an especially important one is in Chapter 3 (“Justice”), about the use of algorithms in the criminal justice system. The example demonstrates the extremely important point that there is an inevitable trade-off between ‘accuracy’ and ‘fairness’ when algorithms make decisions about people.

While the overall thrust and conclusions of the example are correct, the need to keep detailed maths out of the book might leave careful readers unconvinced that the example really demonstrates the stated conclusions. I feel it is important to get the details right because algorithmic fairness is of increasing importance for the future of AI, yet it is widely misunderstood.

I have therefore produced a short report that provides a fully worked explanation of the example. I explain what is missing from Hannah's presentation, namely an explicit calculation of the false positive rates of the algorithm. I show how Bayes' theorem (together with some other assumptions) is needed to compute the false positive rates for men and women. I also show why and how a causal model of the problem (namely a Bayesian network model) makes everything much clearer.
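To see why Bayes' theorem is needed, here is a minimal Python sketch using hypothetical numbers (they are not Fry's). It assumes an algorithm with the same true positive rate (TPR) and the same positive predictive value (PPV, the proportion of people flagged who really are positives) for men and women, but different base rates. Rearranging Bayes' theorem, PPV = TPR*p / (TPR*p + FPR*(1 - p)), gives the false positive rate (FPR) each group must then have. The base rates, TPR and PPV values are assumptions chosen only for illustration.

```python
def implied_fpr(base_rate: float, tpr: float, ppv: float) -> float:
    """False positive rate forced by Bayes' theorem for a group.

    From PPV = TPR*p / (TPR*p + FPR*(1 - p)), solving for FPR gives
    FPR = TPR*p*(1 - PPV) / (PPV*(1 - p)).
    """
    p = base_rate
    return tpr * p * (1 - ppv) / (ppv * (1 - p))


def ppv_from_rates(base_rate: float, tpr: float, fpr: float) -> float:
    """Bayes' theorem in the forward direction: P(positive | flagged)."""
    p = base_rate
    return tpr * p / (tpr * p + fpr * (1 - p))


# Hypothetical figures for illustration only (not taken from Fry's book).
TPR, PPV = 0.8, 0.5                     # identical for both groups
base_rates = {"men": 0.10, "women": 0.02}

fpr = {g: implied_fpr(p, TPR, PPV) for g, p in base_rates.items()}
for group, rate in fpr.items():
    print(f"{group}: false positive rate = {rate:.3f}")
```

With these assumed numbers men get a false positive rate of about 0.089 and women about 0.016. If the algorithm is equally accurate in this sense for both groups but the base rates differ, Bayes' theorem forces the false positive rates apart, which is the accuracy/fairness trade-off in miniature.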

Fry, H. (2018). "Hello World: How to Be Human in the Age of the Machine". New York: W. W. Norton & Company, Inc.

My report:
Fenton, N. E. (2019). "Hannah Fry’s 'Hello World' and the Example of Algorithm Bias", DOI 10.13140/RG.2.2.14339.55844
A pdf of the report is also available here


Thursday, 14 March 2019

The Simonshaven murder case modelled as a Bayesian network


A paper published today in Topics in Cognitive Science is one in a series of analyses of a Dutch murder case, each using a different modelling approach. In this case a woman was murdered while out walking with her husband in a quiet recreational area near the village of Simonshaven, close to Rotterdam, in 2011. The trial court of Rotterdam convicted the victim’s husband of murder by intentionally hitting and/or kicking her in the head and strangling her. For the appeal the defence provided new evidence about other ‘similar’ murders in the area committed by a different person.

The idea to use this case to evaluate a number of different methods for modelling complex legal cases was originally proposed by Floris Bex (Utrecht), Anne Ruth Mackor (Groningen) and Henry Prakken (Utrecht). In September 2016, as part of our Probability and Statistics in Forensic Science programme at the Isaac Newton Institute in Cambridge, a special two-day workshop was arranged in which different teams were presented with the Simonshaven evidence and had to produce a model analysis. At the time, the appeal had still to be heard. In a follow-up workshop to review the various solutions (held in London in June 2017 as part of the BAYES-KNOWLEDGE project), the participants agreed to publish their results in a special issue of a journal.

This paper describes the Bayesian network (BN) team's solution. One of the key aims was to determine whether a useful BN could be quickly constructed using the previously established idioms-based approach (which provides a generic method for translating legal cases into BNs). The BN model described was built by the authors during the course of the workshop. The total effort involved was approximately 26 hours (i.e. an average of about 6.5 hours per author). With the basic assumptions described in the paper, the posterior probability of guilt once all the evidence is entered is 74%. The paper describes a formal evaluation of the model, using sensitivity analysis, to determine how robust the model conclusions are to key subjective prior probabilities over a full range of what may be deemed ‘reasonable’ from both defence and prosecution perspectives. The results show that the model is reasonably robust, pointing generally to a reasonably high posterior probability of guilt, but also generally below the 95% threshold expected in criminal law.
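The paper's sensitivity analysis is carried out on the full BN. As a much cruder illustration of the same idea, the Python sketch below collapses all the evidence into a single likelihood ratio (a simplification the paper does not make) and shows how the posterior probability of guilt moves as the prior and that likelihood ratio vary, relative to a 95% threshold. Every number in it is hypothetical; none is taken from the paper.

```python
def posterior_guilt(prior: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


THRESHOLD = 0.95  # the criminal-law threshold mentioned in the text

# Hypothetical ranges standing in for 'defence' to 'prosecution' views.
priors = [0.05, 0.1, 0.2, 0.3, 0.5]
likelihood_ratios = [10, 20, 50]

for lr in likelihood_ratios:
    row = [posterior_guilt(p, lr) for p in priors]
    above = sum(post >= THRESHOLD for post in row)
    print(f"LR={lr:>3}: " + " ".join(f"{post:.2f}" for post in row),
          f"({above} of {len(row)} priors reach {THRESHOLD})")
```

This is only a toy: under these made-up numbers just the highest priors combined with the larger likelihood ratios reach 0.95, but the paper's 74% figure comes from its multi-node BN, not from anything like this calculation.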

The authors acknowledge the insights of the following workshop participants: Floris Bex, Christian Dahlman, Richard Gill, Anne Ruth Mackor, Ronald Meester, Henry Prakken, Leila Schneps, Marjan Sjerps, Nadine Smit, Bart Verheij, and Jacob de Zoete.


Full reference:
Fenton, N. E., Neil, M., Yet, B., & Lagnado, D. A. (2019). "Analyzing the Simonshaven Case using Bayesian Networks". Topics in Cognitive Science, DOI 10.1111/tops.12417. For those without a subscription to the journal, the published version can be read here: https://rdcu.be/bqYxp


Monday, 11 March 2019

Challenging claims that probability theory is incompatible with legal reasoning

 
A new paper published in Science and Justice exposes why common claims that probability theory is incompatible with the law are flawed.

One of the most effective tactics that have been used by legal scholars to 'demonstrate' the 'limitations' and 'incompatibility' of probability theory (and particularly Bayes' theorem) with legal reasoning is the use of puzzles like the following:

"Fred is charged with a crime. A reliable eyewitness testifies that someone exactly matching Fred’s appearance was seen fleeing the crime scene. But Fred is known to have an identical twin brother. So is the evidence relevant?"**
The argument that this example shows probability theory to be incompatible with legal norms goes something like this:
Both intuitively and legally it is clear that the evidence should be considered relevant. But according to probability theory (Bayes' theorem), the evidence has 'no probative value' since it provides no change in our belief about whether Fred is more likely than his twin brother to have been at the crime scene. Hence, according to probability theory, the evidence would wrongly be considered inadmissible.
Specifically, such problems are intended to show that the use of probability theory results in legal paradoxes. As such, these problems have been a powerful obstacle to the use of probability theory in the law.

The new paper shows that all of these puzzles only lead to ‘paradoxes’ under an artificially constrained view of probability theory and the use of the so-called likelihood ratio, in which multiple related hypotheses and pieces of evidence are squeezed into a single hypothesis variable and a single evidence variable. When the distinct relevant hypotheses and evidence are described properly in a causal model (a Bayesian network), the paradoxes vanish. Moreover, the resulting Bayesian networks provide a powerful framework for legal reasoning.

Full reference details of the paper:
de Zoete, J., Fenton, N. E., Noguchi, T., & Lagnado, D. A. (2019). "Countering the ‘probabilistic paradoxes in legal reasoning’ with Bayesian networks". Science & Justice, DOI 10.1016/j.scijus.2019.03.003
The pre-publication version (pdf)
The models (which can be run using AgenaRisk)

Two other papers just accepted (details to follow) also demonstrate the power of Bayesian networks in legal reasoning:
Fenton, N. E., Neil, M., Yet, B., & Lagnado, D. A. (2019). "Analyzing the Simonshaven Case using Bayesian Networks". Topics in Cognitive Science, DOI 10.1111/tops.12417 (Update: this has now been published; the published version can be read here: https://rdcu.be/bqYxp)
Neil, M., Fenton, N. E., Lagnado, D. A. & Gill, R. (2019). "Modelling competing legal arguments using Bayesian Model Comparison and Averaging". To appear in Artificial Intelligence and Law. Pre-publication version (pdf)

**This particular puzzle is easy to 'resolve'. The 'non-probative' Bayes conclusion is only correct if we assume that the only people who could possibly have committed the crime are Fred and his twin brother. In practice we have to consider the possibility that neither committed the crime. Although the eyewitness evidence fails to distinguish between Fred and his twin, it does increase the probability of the hypothesis that Fred was at the crime scene relative to the hypothesis that he was not.
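The footnote's argument can be checked with a few lines of Python. This is a minimal sketch with made-up priors and likelihoods (they are not from the paper), treating 'Fred', 'his twin' and 'someone else' as three mutually exclusive hypotheses about who was at the scene.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' theorem over a discrete set of mutually exclusive hypotheses."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: v / total for h, v in unnormalised.items()}


# Hypothetical numbers: who was at the crime scene?
prior = {"Fred": 0.001, "Twin": 0.001, "Someone else": 0.998}
# P(witness reports someone exactly matching Fred | hypothesis)
likelihood = {"Fred": 0.9, "Twin": 0.9, "Someone else": 0.001}

post = posterior(prior, likelihood)
print({h: round(p, 3) for h, p in post.items()})
print("Fred/twin ratio before:", prior["Fred"] / prior["Twin"],
      "after:", post["Fred"] / post["Twin"])

# The 'paradox' case: only Fred and his twin are possible culprits.
two_only = posterior({"Fred": 0.5, "Twin": 0.5}, {"Fred": 0.9, "Twin": 0.9})
print(two_only)
```

With these assumed numbers the probability that Fred was at the scene rises from 0.001 to about 0.32, while the Fred-to-twin ratio stays at 1. Only when 'someone else' is dropped from the hypotheses does the evidence leave Fred's probability unchanged, which is the supposed paradox.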

Monday, 4 March 2019

Bayesian networks for critical maintenance decisions on the railway network




An important recent paper (published in the Journal of Risk and Reliability) by Haoyuan Zhang and William Marsh of Queen Mary University of London presents a Bayesian network model for maintenance decision support that is especially relevant to rail safety. The model overcomes the practical limitations of previous statistical models that have attempted to maximise asset reliability cost-effectively by scheduling maintenance based on the likely deterioration of an asset. The model extends an existing statistical model of asset deterioration, but shows how:
  1. data on the condition of assets, available from their periodic inspection, can be used;
  2. failure data from related groups of assets can be combined using judgement from experts;
  3. expert knowledge of the causes of deterioration can be combined with statistical data to adjust predictions.
The model (which was developed using the AgenaRisk software) is applied to a case study of bridges on the rail network in the UK.
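The paper's model is a Bayesian network built in AgenaRisk, and the sketch below is not that model. It only illustrates, using the standard conjugate Gamma-Poisson update (a technique swapped in here for illustration), the general idea behind points 2 and 3: expert judgement about a failure rate can be encoded as a prior and then combined with observed failure data. The class name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaRate:
    """Gamma(shape, rate) belief about a failure rate (failures per asset-year)."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    def update(self, failures: int, asset_years: float) -> "GammaRate":
        """Conjugate Poisson update: add observed failures and exposure."""
        return GammaRate(self.shape + failures, self.rate + asset_years)


# Hypothetical expert judgement: about 0.05 failures per bridge-year,
# held with roughly the weight of 40 bridge-years of evidence.
expert_prior = GammaRate(shape=2.0, rate=40.0)

# Hypothetical failure data pooled from a related group of bridges.
posterior = expert_prior.update(failures=3, asset_years=100.0)
print(f"prior mean {expert_prior.mean:.4f}, posterior mean {posterior.mean:.4f}")
```

With these assumed numbers the expert's prior mean of 0.05 failures per bridge-year moves to about 0.036 once the pooled data are added; more data would pull the estimate further from the expert's starting point.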

A full pre-publication version is available here.

The full publication details for the paper are:
Zhang, H., & Marsh, D. W. R. (2018). "Generic Bayesian network models for making maintenance decisions from available data and expert knowledge". Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 232(5), 505–523. https://doi.org/10.1177/1748006X17742765



Friday, 25 January 2019

Magda Osman: fighting mainstream opinions on 'nudge' techniques and communicating uncertainty

Dr Magda Osman, our colleague at Queen Mary University of London, is a world-leading expert in experimental psychology, especially the psychology of agency and control. She has questioned the effectiveness of 'nudge' persuasion techniques for improving individual and societal well-being. She is a co-PI on our project CAUSAL-DYNAMICS, which is concerned with modelling dynamic decision-making from a causal perspective. We recently published a joint paper (which got a lot of publicity) describing our experiments testing how far people go in trusting experts. As Magda is currently also part-seconded to the Food Standards Agency, she is often sought out for her views on the use of nudges in several food-related policy issues. Two recent experiences illustrate the scale of the difficulties in getting her message across. Last week she appeared near the end of the Channel 4 programme 'How to lose weight well' (the full programme is here; Magda appears at 43 minutes).


That 30-second clip above (which might be blocked by Channel 4) was all that remained of an interview that lasted an hour. Magda says:
I was specifically invited on the programme. The brief was that there was a very speculative technique now available via the NHS that is claimed to use nudge methods to help people lose weight, the rationale being that it is designed to target people's unconscious processes subliminally. This is grossly inaccurate for two reasons. First, subliminal means below the threshold of conscious attention, but people’s attention is consciously directed to the messages being played to them via this NHS weight-loss technique. Second, nudge has nothing to do with targeting the unconscious subliminally.

They were aware that I have critical views on nudge and on work to do with the unconscious, and so they wanted an expert to discuss the issues and why there might be reason to doubt the findings and the claims made for the NHS hypnotherapeutic technique, which proposes that people don’t need to use any willpower to lose weight: their unconscious will do all the work because the technique will rewire their conscious thoughts.

I spent an hour being interviewed, and several of the questions concerned topics such as ‘why is it that people say they have lost weight using this technique?’. I was also asked to speculate why people on the programme who would trial the technique might lose weight, even though I was suggesting that the method itself is unlikely to be effective because the evidence for it working is weak and its theoretical basis is flawed. My answers were that there are statistical reasons why some people will appear to have lost weight as a result of the technique, but that this has nothing to do with the technique itself; it is more to do with random fluctuations in behaviour in the samples that are tested. There is also a psychological factor: once people tell other people that they want to lose weight, and that they are going on a programme on national television (where they are filmed before and after the method), this gives them a strong incentive to try to lose weight. Actually losing weight may then have nothing to do with the technique itself, and more to do with the willingness, motivation and commitment people put into the mundane things that are absolutely necessary to lose weight: eating less fatty food, eating more healthily, and exercising more.
So, I spent an hour discussing these things, giving very clear and cogent reasons and examples (which they had specifically asked for) to demonstrate why it is that the method they were asking me to talk about is problematic, and should be considered with a huge degree of scepticism.
But what happened on the show was a set-up. They filmed people motivated to take part in the trial of the method; they did not present many details about the method, or its patchy and problematic evidence base; then they brought me on and edited my interview so that I was shown to pooh-pooh the technique; and then they brought on people as testimonials of the technique’s success and pointed out that 'the expert has got it wrong'.
Obviously I didn’t get it wrong, because it was what I had predicted, but the piece was edited to suggest that one or two people’s experience carries equal or greater weight than 15 years’ worth of study in a field of work, which entails summarising thousands of data points.
This does nothing to help people understand core issues to do with sampling, statistical inference, the value of a good causal understanding of evidence, the value of expertise, and the need for scepticism.
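Magda's statistical point can be illustrated with a small simulation. The Python sketch below assumes, purely hypothetically, that the technique has no effect at all and that each participant's weight change over the trial period is random noise with a standard deviation of 1.5 kg; the participant count and noise level are made up.

```python
import random


def simulate_no_effect(n_participants: int, noise_sd_kg: float, seed: int = 1):
    """Weight change (kg) per participant when the technique has NO true effect.

    Each change is pure random fluctuation drawn from Normal(0, noise_sd_kg).
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, noise_sd_kg) for _ in range(n_participants)]


changes = simulate_no_effect(n_participants=10_000, noise_sd_kg=1.5)
lost_any = sum(c < 0 for c in changes) / len(changes)
lost_2kg = sum(c < -2.0 for c in changes) / len(changes)
print(f"lost some weight: {lost_any:.1%}; lost more than 2 kg: {lost_2kg:.1%}")
```

Even with zero true effect, roughly half the simulated participants lose some weight and several percent lose more than 2 kg, so a few on-screen success stories are exactly what chance alone would produce.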

In addition, Magda and several of my colleagues were invited to propose a workshop for the International Conference on Uncertainty in Risk Analysis 2019, sponsored by the European Food Safety Authority (EFSA) and the German Federal Institute for Risk Assessment (BfR). Theirs was one of five invited workshops. Yet the message of their workshop was not exactly what the EFSA wanted to hear about its safety standards. Consequently, Magda was told that, unlike the other four workshops, theirs would be relegated to a tiny side room that could only accommodate the workshop speakers, i.e. it was essentially no longer part of the programme. I'm not sure whether it was deliberately to rub salt into the wounds, but this is how Magda's workshop is currently advertised on the conference website...



Thursday, 17 January 2019

Manhunt: the Levi Bellfield case from a probabilistic perspective


The three-part ITV series "Manhunt", starring Martin Clunes, tells the story of the search for the killer of Amelie Delagrange, who was murdered in Twickenham in 2004. It is based on the book by Colin Sutton (the detective in charge of the case) and dramatises his fight to find Amelie's killer, Levi Bellfield, who was also charged with the murder of Marsha McDonnell and with three attempted murders. The trial of Bellfield for these five crimes took place at the Old Bailey in 2008. He was convicted of the two murders and one of the three attempted murders (he was also later charged with and convicted of the murder of Milly Dowler).

I declare a particular interest here because, in 2007-8, I (along with my colleague Martin Neil) acted as an expert consultant to the defence team in the case against Bellfield for the five crimes. We were initially asked to provide a statistical analysis of the number of car number plates that were consistent with the grainy CCTV image of a car at the scene of the McDonnell murder. We were subsequently asked to identify probabilistic issues relating to all aspects of the evidence, producing reports totalling several hundred pages. Although these reports are not public, some material we subsequently wrote that mentions the case can be found in this publication. Having watched the programme, I think it is worth making the following points:

The CCTV image of the car at the scene of the McDonnell murder: none of the letters or numbers on the number plate were clearly visible. A number of image experts provided (contradictory) conclusions about which characters could be ruled out in each position, so there was much uncertainty about how many number plates needed to be investigated. Additionally, at least two of the experts had been subject to confirmation bias because, instead of being presented with the grainy CCTV image and asked to say what the number plate could be, they were shown Bellfield's actual number plate and asked whether the image was a possible match (as a result, our colleague Itiel Dror was co-opted as an expert witness in the area of confirmation bias). The prosecution claimed to have 'eliminated' all possible vehicles with 'matching' number plates other than Bellfield's. This was important because, if true, it represented the most solid piece of evidence against Bellfield in the entire case. However, taking account of the uncertainty in the image experts' assertions, we concluded that potentially thousands of additional vehicles would need to be eliminated.
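How uncertainty about individual characters multiplies can be shown with a small hypothetical Python sketch (the readings below are invented; they are not the real experts' conclusions). Each hypothetical expert lists, for each position on a 7-character plate, the characters they cannot rule out. A cautious view keeps any character that at least one expert cannot rule out; a narrow view keeps only characters that neither expert rules out.

```python
from math import prod


def candidate_plates(readings):
    """Number of plates consistent with the per-position candidate characters."""
    return prod(len(chars) for chars in readings)


def combine(expert_a, expert_b, cautious: bool):
    """Combine two experts' per-position 'cannot rule out' character sets.

    cautious=True keeps any character that EITHER expert cannot rule out (union);
    cautious=False keeps only characters that NEITHER expert rules out (intersection).
    """
    op = set.union if cautious else set.intersection
    return [op(set(a), set(b)) for a, b in zip(expert_a, expert_b)]


# Hypothetical readings of a 7-character plate (not the real case data).
expert_a = ["WX", "VY", "05", "38", "BDP", "EF", "HN"]
expert_b = ["VWX", "VY", "056", "38", "BP", "EFL", "HMN"]

narrow = candidate_plates(combine(expert_a, expert_b, cautious=False))
cautious = candidate_plates(combine(expert_a, expert_b, cautious=True))
print("narrow view:  ", narrow)
print("cautious view:", cautious)
```

A disagreement of a single character in five of the seven positions turns 128 candidate plates into 972. Real counts would also depend on which combinations were actually registered to vehicles, which this sketch ignores.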

Lack of hard evidence: The dramatisation was correct in showing that, although there was much circumstantial evidence linking Bellfield to the murder of Delagrange, there was no direct evidence in the form of either forensic evidence or eyewitnesses to the crime. Hence, DCI Sutton's strategy was to link Bellfield to a number of other 'similar' crimes that had taken place within the same area. The programme focused on two of the four other crimes with which he was charged, namely the McDonnell murder and one of the attempted murders, for which the programme used a made-up name, "Sarah" (the credits make clear that some names were deliberately changed). The "Sarah" case actually refers to Kate Sheedy, who was deliberately run over with a car. The other two cases of attempted murder (which I will refer to as R and D) were not covered. Again (as the dramatisation suggested), there was no direct evidence linking Bellfield to either the McDonnell or "Sarah" attacks, but much circumstantial evidence. By providing circumstantial evidence linking Bellfield to five crimes which were claimed to be 'very similar', DCI Sutton was able to ensure that Bellfield was charged with the Delagrange murder.

Linking of the five 'very similar' crimes: This linkage became the thrust of the prosecution case against Bellfield. What the prosecution essentially argued was that the crimes were so similar, and the circumstantial evidence against Bellfield so compelling in each case, that (in the words of the prosecuting barrister) "the chances that these offences were committed by anyone other than Bellfield are so fanciful that you can reject them". But in reality there was no great 'similarity' between the crimes: even in the dramatisation, DCI Sutton states, somewhat ironically (about the Amelie, Marsha and "Sarah" attacks), that "they all involved striking the victim with a blunt instrument - as we can consider a car a blunt instrument". Much of the defence case was based around exposing the probabilistic and logical fallacies arising from assumptions of similarity (although, interestingly, it was much later that we formalised some of these issues). With regard to the whole issue of 'cross admissibility', in one report I wrote the following generic statement:
The cross admissibility argument is based on the following valid probabilistic reasoning:

· Suppose Crime A and Crime B are so similar that there is a very high probability they have been committed by the same person.

· If there is evidence to support the hypothesis that the defendant is guilty of Crime A then this automatically significantly increases the probability of him being guilty of Crime B, even without any evidence of Crime B.

48. In other words, what is happening here is that the probability of guilt in Crime A, together with the evidence of similarity between the two crimes, makes it allowable to conclude that the probability of guilt in Crime B has increased. This is indeed provably correct, but what the prosecution claims is something subtly different, namely:

It is perfectly allowable to use the probability of guilt in Crime A, as evidence for Crime B.

49. This subtle difference leads to a fallacy in the following scenario that is relevant to this case.

· Suppose that there are three crimes, A, B and C. Suppose that the evidence that Crimes B and C are similar is strong. Then, as above, any evidence that indicates guilt in the case of Crime B will, because of the evidence of similarity, impact on the probability of guilt for Crime C. However, suppose that we have not yet heard any evidence on Crimes B and C, and suppose that there is no evidence that Crime A is similar to either Crime B or C.

· If there is strong evidence supporting guilt in Crime A then, contrary to the prosecution claim, this evidence does not impact on the probability of guilt for either Crime B or C, and hence should not be used as evidence as suggested in point 48 above.

· In fact in this scenario the evidence concerning crime A should, in relation to crimes B and C, be treated just the same as ‘previous conviction’ information in normal trials.

50. Given that the judge has allowed ‘cross admissibility’ of all 5 cases, the danger identified in point 49 presents an opportunity for strategic exploitation by the prosecution. Specifically, the opportunistic strategy is to focus on an offence in which there is most hard evidence, even if that is the least serious offence and even if it bears the least similarity to the others. The prosecution can then argue that evidence of guilt in that case can be taken as evidence of guilt in the more serious cases. The jury would not necessarily be aware of the underlying fallacy.
With hindsight, point 50 is especially pertinent because, in contrast to the Delagrange, McDonnell and "Sarah" cases, there actually was some direct evidence linking Bellfield to the R and D attacks (neither of which resulted in serious injury to the victims), and there were few similarities between these and the other three cases. The cross admissibility ruling and the (in my view incorrect) assumption of similarity allowed the jury to use evidence in the R and D attacks as evidence in the other cases. Interestingly, the jury did not find Bellfield guilty of either the R or the D attack.
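Points 48 and 49 can be seen in miniature in the Python sketch below. It uses a deliberately simplified, hypothetical linkage model (an assumption for illustration, not the model used in the case reports): with probability p_linked, Crimes A and B share an offender; otherwise their offenders are independent. Evidence on Crime A then moves the probability of guilt for Crime B only through the linkage. The function name and all numbers are hypothetical.

```python
def p_guilty_b(p_guilty_a: float, p_linked: float, p_guilty_b_alone: float) -> float:
    """P(defendant guilty of Crime B) under a simplified, hypothetical linkage model.

    * With probability p_linked, A and B were committed by the same person,
      so guilt in B equals guilt in A.
    * Otherwise the two offenders are independent, so guilt in B keeps its
      own probability p_guilty_b_alone regardless of the evidence on A.
    """
    return p_linked * p_guilty_a + (1 - p_linked) * p_guilty_b_alone


# Strong evidence on Crime A raises P(guilty of A) from 0.1 to 0.9 (hypothetical).
for p_linked in (0.8, 0.0):
    before = p_guilty_b(0.1, p_linked, 0.1)
    after = p_guilty_b(0.9, p_linked, 0.1)
    print(f"P(linked)={p_linked}: P(guilty of B) {before:.2f} -> {after:.2f}")
```

With a strong link (p_linked = 0.8), raising P(guilty of A) from 0.1 to 0.9 raises P(guilty of B) from 0.10 to 0.74. With no evidence of similarity (p_linked = 0), it leaves P(guilty of B) at 0.10, which is the situation described in point 49.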

Multiple probabilistic fallacies: In one of my summary reports I said (about the prosecution case generally): "There are several important instances of well known probabilistic fallacies (and also well known logical fallacies) that consistently exaggerate the impact of the evidence in favour of the prosecution case". In addition to the cross admissibility 'fallacy', we found examples of the following in the prosecution opening statement:
  • Prosecutor's fallacy
  • Base rate neglect fallacy
  • Dependent evidence fallacy
  • Logically dependent evidence fallacy
  • Conjunction fallacy
  • Confirmation bias fallacy
  • Previous convictions fallacy
  • Coincidence fallacy
  • Minimal utility evidence fallacy
  • Lack of hard evidence fallacy
  • “Crimewatch UK” fallacy
These fallacies are all covered in our book and some (in the context of the Bellfield case) are covered in this paper.
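Two of the listed fallacies, the prosecutor's fallacy and base rate neglect, are easy to show numerically. In the hypothetical Python sketch below (the population size and match probability are invented for illustration), the fallacy treats the probability of the evidence given innocence as if it were the probability of innocence given the evidence, ignoring how many other people could have produced the same evidence.

```python
def bayes_posterior(prior: float, p_evidence_if_guilty: float,
                    p_evidence_if_innocent: float) -> float:
    """Bayes' theorem: P(guilty | evidence)."""
    numerator = prior * p_evidence_if_guilty
    return numerator / (numerator + (1 - prior) * p_evidence_if_innocent)


# Hypothetical: 10,001 people could have committed the crime, the evidence is
# certain to be found if the defendant is guilty, and has a 1-in-1,000 chance
# of being found on an innocent person.
prior = 1 / 10_001
random_match = 1 / 1_000

# Prosecutor's fallacy: reads P(evidence | innocent) as P(innocent | evidence).
fallacious = 1 - random_match
correct = bayes_posterior(prior, 1.0, random_match)
print(f"fallacy says {fallacious:.3f}; Bayes says {correct:.3f}")
```

The fallacious reading gives 0.999, whereas Bayes' theorem with the assumed prior gives 1/11, about 0.091: roughly ten innocent people in the pool would be expected to match as well.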

And finally: In one scene in the programme DCI Sutton pointed out that he, Bellfield and Bellfield's lawyer all had one thing in common - being Spurs fans. Count me in on that one too...






Monday, 14 January 2019

New research published in IEEE Transactions makes building accurate Bayesian networks easier

(This is an update of a previous posting)
One of the biggest practical challenges in building Bayesian network (BN) models for decision support and risk assessment is to define the probability tables for nodes with multiple parents. Consider the following example:
In any given week a terrorist organisation may or may not carry out an attack. There are several independent cells in this organisation for which it may be possible in any week to determine heightened activity. If it is known that there is no heightened activity in any of the cells, then an attack is unlikely. However, for any cell if it is known there is heightened activity then there is a chance an attack will take place. The more cells known to have heightened activity the more likely an attack is.
In the case where there are three terrorist cells, it seems reasonable to assume the BN structure here:

To define the probability table for the node "Attack carried out" we have to define probability values for each possible combination of the states of the parent nodes, i.e., for all the entries of the following table.


That is 16 values (although, since each column must sum to one, we only really have to define 8).
When data are sparse, as in examples like this, we must rely on judgment from domain experts to elicit these values. Even for a very small example like this, such elicitation is known to be highly error-prone. When there are more parents (imagine there are 20 different terrorist cells) or more states than just "False" and "True", it becomes practically infeasible. Numerous methods have been proposed to simplify the problem of eliciting such probability tables. One of the most popular, “noisy-OR”, approximates the required relationship in many real-world situations like the above example. BN tools like AgenaRisk implement the noisy-OR function, making it easy to define even very large probability tables. However, it turns out that in situations where the child node (in the example, the node "Attack carried out") is observed to be "False", the noisy-OR function fails to properly capture the real-world implications. It is this weakness that is both clarified and resolved in the following two new papers published in IEEE Transactions on Knowledge and Data Engineering (both are open access, so you can download the full pdf).
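To make the table sizes concrete, the Python sketch below counts the entries in a full probability table for binary nodes and builds the table for the three-cell example using a generic leaky noisy-OR. This is the standard textbook formulation, not AgenaRisk's implementation, and it does not include the corrections proposed in the two papers; the cause probabilities and leak value are hypothetical.

```python
from itertools import product


def cpt_size(n_parents: int, child_states: int = 2, parent_states: int = 2):
    """(entries in the full table, values that must actually be elicited)."""
    columns = parent_states ** n_parents
    return columns * child_states, columns * (child_states - 1)


def noisy_or_cpt(cause_probs, leak):
    """P(child = True | parent states) for every combination of binary parents.

    Leaky noisy-OR: each active parent i independently fails to cause the
    child with probability 1 - cause_probs[i]; the leak covers causes that
    are not modelled explicitly.
    """
    table = {}
    for states in product([False, True], repeat=len(cause_probs)):
        p_not = 1 - leak
        for active, p in zip(states, cause_probs):
            if active:
                p_not *= 1 - p
        table[states] = 1 - p_not
    return table


print(cpt_size(3))   # three cells
print(cpt_size(20))  # twenty cells

# Hypothetical parameters for the three-cell example.
cpt = noisy_or_cpt(cause_probs=[0.2, 0.3, 0.4], leak=0.01)
for states, p_attack in cpt.items():
    print(states, f"P(Attack=True) = {p_attack:.3f}")
```

For three cells the full table has 16 entries (8 to elicit) and for twenty cells it has 2,097,152 (1,048,576 to elicit), whereas the noisy-OR version needs only one probability per cell plus the leak. The generated probabilities also increase as more cells are active, matching the assumption in the example.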

The first paper shows that by changing a single column of the probability table generated from the noisy-OR function (namely the last column, where all parents are "True"), most (but not all) of the deficiencies in noisy-OR are resolved. The second paper shows how the problem is resolved by defining the nodes as 'ranked nodes' and using the weighted average function in AgenaRisk.

Hence, while the first paper provides a simple approximate solution, the second provides a 'complete solution' but requires software like AgenaRisk for its implementation.

Acknowledgements: The research was supported by the European Research Council under project ERC-2013-AdG339182 (BAYES_KNOWLEDGE); the Leverhulme Trust under Grant RPG-2016-118 CAUSAL-DYNAMICS; Intelligence Advanced Research Projects Activity (IARPA), to the BARD project (Bayesian Reasoning via Delphi) of the CREATE programme under Contract [2017-16122000003]; and Agena Ltd for software support. We also acknowledge the helpful recommendations and comments of Judea Pearl, and the valuable contributions of David Lagnado (UCL) and Nicole Cruz (Birkbeck).