Tuesday, 17 June 2014

Proving referee bias with Bayesian networks

An article in today's Huffington Post by Raj Persaud and Adrian Furnham talks about the scientific evidence that supports the idea of referee bias in football. One of the studies they describe is the recent work I did with Anthony Constantinou and Liam Pollock** where we developed a causal Bayesian network model to determine referee bias and applied it to the data from all matches played in the 2011-12 Premier League season. Here is what they say about our study:
Another recent study might just have scientifically confirmed this possible 'Ferguson Factor', entitled, 'Bayesian networks for unbiased assessment of referee bias in Association Football'. The term 'Bayesian networks', refers to a particular statistical technique deployed in this research, which mathematically analysed referee bias with respect to fouls and penalty kicks awarded during the 2011-12 English Premier League season.
The authors of the study, Anthony Constantinou, Norman Fenton and Liam Pollock found fairly strong referee bias, based on penalty kicks awarded, in favour of certain teams when playing at home.
Specifically, the two teams (Manchester City and Manchester United) who finished first and second in the league, appear to have benefited from bias that cannot be explained by other factors. For example a team may be awarded more penalties simply because it's more attacking, not just because referees are biased in its favour.

The authors from Queen Mary University of London, argue that if the home team is more in control of the ball, then, compared to opponents, it's bound to be awarded more penalties, with less yellow and red cards, compared to opponents. Greater possession leads any team being on the receiving end of more tackles. A higher proportion of these tackles are bound to be committed nearer to the opponent's goal, as greater possession also usually results in territorial advantage.
However, this study, published in the academic journal 'Psychology of Sport and Exercise', found, even allowing for these other possible factors, Manchester United with 9 penalties awarded during that season, was ranked 1st in positive referee bias, while Manchester City with 8 penalties awarded is ranked 2nd. In other words it looks like certain teams (most specifically Manchester United) benefited from referee bias in their favour during Home games, which cannot be explained by any other possible element of 'Home Advantage'. 
What makes this result particularly interesting, the authors argue, is that for most of the season, these were the only two teams fighting for the English Premiere League title. Were referees influenced by this, and it impacted on their decision-making?  Conversely the study found Arsenal, a team of similar popularity and wealth, and who finished third, benefited least of all 20 teams from referee bias at home, with respect to penalty kicks awarded. With the second largest average attendance as well as the second largest average crowd density, Arsenal were still ranked last in terms of referee bias favouring them for penalties awarded. In other words, Arsenal didn't seem to benefit much at all from the kind of referee bias that other teams were gaining from 'Home Advantage'. Psychologists might argue that temperament-wise, Sir Alex Ferguson and Arsene Wenger appear at opposite poles of the spectrum.
**  Constantinou, A. C., Fenton, N. E., & Pollock, L. (2014). "Bayesian networks for unbiased assessment of referee bias in Association Football". To appear in Psychology of Sport & Exercise. A pre-publication draft can be found here.

Our related work on using Bayesian networks to predict football results is discussed here.

Saturday, 14 June 2014

Daniel Kahneman at the Hebrew University Jerusalem

I have just returned from the workshop on "Behavioral Legal Studies - Cognition, Motivation, and Moral Judgment" at the Hebrew University in Jerusalem, Israel. I was especially interested in seeing Daniel Kahneman open the workshop with "Reflections on Psychology, Economics, and Law". Kahneman won the 2002 Nobel Prize in economics and also received the Presidential Medal of Freedom from President Obama in 2013.

Kahneman (centre) interviewed by Prof Zamir (left) and Prof Ritov (right)
Kahneman is, of course, very well known for his pioneering work with Amos Tversky and Paul Slovic (who also spoke at the workshop) on cognitive bias (which has greatly influenced our own work on probabilistic reasoning in the law) and also prospect theory (for which he won the Nobel prize). Kahneman's 2011 book "Thinking, Fast and Slow" which summarises much of his work, has sold over one and a half million copies. The book is based on the idea that, when it comes to assessment and decision-making, people are either system 1 (fast) thinkers or system 2 (slow) thinkers. The former act on instinct and often get things wrong while the latter are more likely to get things right because they think through all aspects of a problem carefully. While I think Kahneman's book is a very good read, I personally do not find the fast/slow classification of decision-makers to be especially helpful. Nevertheless, a lot of the speakers at the workshop used it to inform their own work.

Kahneman's presentation took the form of an interview, with Prof. Eyal Zamir and Prof. Ilana Ritov (both of the Law Faculty at the Hebrew University) asking the questions. Kahneman nicely summarised the main results and achievements of his career and was humble enough both to give credit to his co-researchers and to admit that some of his theories (such as on gambling choices) had subsequently been proven false.

Audience at Kahneman interview
Kahneman touched on one of the key points in his book that I find problematic, namely his rejection of what he calls 'complex algorithms'; his argument is that any assessment/decision problem that involves expert judgment should not involve many variables because you can always get just as good a result with a simple model involving no more than three variables. While I agree that any solution to a problem should be kept as simple as possible, a crude limit on the number of variables directly contradicts our Bayesian network approach, where models often necessarily involve multiple variables and relationships derived from both data and expert judgment. The important point is that the 'complex algorithms' we use are just Bayesian inference - of course if you had to do this 'by hand' then it would be disastrous, but the fact that there are widely available tools means the algorithmic complexity is completely hidden. Crucially, we have shown many times (see for example this work on evaluating forensic evidence) that the Bayesian network solution provides greater accuracy and insights than the commonly used simplistic 'solutions'.
 
Much of what Kahneman spoke about (and which was also a key theme of the workshop generally) concerned 'moral judgment' - he cited the radically different legal responses to murder and attempted murder as an example of irrational (and possibly immoral) decision-making. The problem with 'moral judgment' - and the continually repeated notion of 'what is good for society' - is that most academics have a particular view about these that they assume is both 'correct' and universally held. Hence, much of what I heard during the workshop was politicized and biased. This was also evident in Kahneman's answers to audience questions following the interview. I actually asked Kahneman what his rationale was for concluding that President Obama was a system 2 thinker. Bearing in mind that system 2 thinkers are supposed to be 'good' decision makers compared with system 1 thinkers, his response was clearly popular with many in the audience, but actually surprised me because it seemed to be purely political; he basically said something like "you only have to compare him with the previous guy (Bush) to know the difference".

Kahneman also gave his views on how conflicts (like that of Israel and its enemies) could be solved, which I found naive and possibly contradictory to his own work in psychology. His theory is that both 'sides' in a conflict are rational, but each believes it is responding to the actions of the other side - so all you need to do is to make both sides aware of this.

There was a very nice reception for invited workshop participants after Kahneman's interview, but Kahneman himself had to rush off to another meeting and he took no further part in the workshop. 

Dave Lagnado (UCL) - who we have worked with on Bayesian networks and the law - giving an excellent talk on "Spreading the blame" (he presented a framework for intuitive judgements and blame)
My trip was partially funded under ERC Grant number: 339182 (BAYES-KNOWLEDGE) and I gratefully acknowledge the ERC contribution. 

Wednesday, 30 April 2014

Statistics of Poverty

I was one of two plenary speakers at the Winchester Conference on Trust, Risk, Information and the Law yesterday (slides of my talk: "Improving Probability and Risk Assessment in the Law" are here).

The other plenary speaker was Matthew Reed (Chief Executive of the Children's Society) who spoke about "The role of trust and information in assessing risk and protecting the vulnerable". In his talk he made the very dramatic statement that
"one in every four children in the UK today lives in poverty"
He further said that the proportion had increased significantly over the last 25 years and showed no signs of improvement.

When questioned about the definition of child poverty he said he was using the Child Poverty Act 2010 definition, which defines a child as living in poverty if they live in a household whose income (which includes benefits) is less than 60% of the national median (see here).

Matthew Reed has a genuine and deep concern for the welfare of children. However, the definition is purely political and is as good an example of poor measurement and misuse of statistics as you can find. Imagine if every household was given an immediate tenfold increase in income - this would mean the very poorest households with, say, a single unemployed parent and 2 children going from £18,000 to a fabulously wealthy £180,000 per year. Despite this, one in every four children would still be 'living in poverty' because the proportion of households whose income is less than 60% of the median has not changed. If the median before was £35,000, then it is now £350,000 and everybody earning below £210,000 is, by definition, 'living in poverty'.

At the other extreme, if you could ensure that every household in the UK earns a similar amount, such as in Cuba where almost everybody earns $20 per month, then the number of children 'living in poverty' is officially zero (since the median is $240 per year and nobody earns less than $144).
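Both extremes can be checked with a few lines of Python. This is only a sketch: the income list is made up for illustration (it is not real UK data), the function name is my own, and it counts households rather than children.

```python
from statistics import median

def share_below_poverty_line(incomes, fraction=0.6):
    """Share of households whose income is below `fraction` of the median."""
    line = fraction * median(incomes)
    return sum(1 for x in incomes if x < line) / len(incomes)

# Hypothetical annual household incomes in pounds (illustration only).
incomes = [18_000, 20_000, 25_000, 30_000, 35_000, 40_000, 50_000, 70_000, 120_000]

before = share_below_poverty_line(incomes)
# Give every household ten times its income: the median scales too,
# so exactly the same households remain below the 60% line.
after = share_below_poverty_line([10 * x for x in incomes])
# Everybody on the same income ($240 per year): nobody is below 60% of the median.
equal = share_below_poverty_line([240] * 5)

print(before, after, equal)
```

Because the threshold is tied to the median, multiplying every income by the same factor leaves the measured 'poverty' rate unchanged, while equalising incomes drives it to zero whatever the income level.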

In fact, in any wealthy free-market economy whichever way you look at the definition it is loaded not only to exaggerate the number of people living in poverty but also to ensure (unless there is massive wealth redistribution to ensure every household income is close to the median level) there will always be a 'poverty' problem:
  • Households with children are much more likely to have one, rather than two, wage earners, so by definition households with children will dominate those below the median income level.
  • Over the last 20 years people have been having fewer children and having them later in life, which again means that an increasing proportion of the country's children inevitably live in households whose income is below the median (hence the 'significant increase in the proportion of children living in poverty over the last 25 years').
  • Families with large numbers of children (> 3) increasingly are in the immigrant community (Asia/Africa) whose households are disproportionately below the median income. 
Unless the plan is to stop households on below-median income from having children (also known as eugenics), the only way to achieve the stated objective of 'making child poverty history' (according to this definition) is to redistribute wealth so that no household income is less than 60% of the median (also known as communism). Judging by some of the people who have been pushing the 'poverty' definition and agenda it would seem the latter is indeed their real objective.


Friday, 14 March 2014

Bayesian network approach to Drug Economics Decision Making


Consider the following problem:
A relatively cheap drug (drug A) has been used for many years to treat patients with disease X. The drug is considered quite successful since data reveals that 85% of patients using it have a ‘good outcome’ which means they survive for at least 2 years. The drug is also quite cheap, costing on average $100 for a prolonged course. The overall “financial benefit” of the drug (which assumes a ‘good outcome’ is worth $5000 and is defined as this figure minus the cost) has a mean of $4985.

There is an alternative drug (drug B) that a number of specialists in disease X strongly recommend. However, the data reveals that only 65% of patients using drug B survive for at least 2 years (Fig. 1(b)). Moreover, the average cost of a prolonged course is $500. The overall “financial benefit” of the drug has a mean of just $2777.
On seeing the data the Health Authority recommends a ban against the use of drug B. Is this a rational decision?

The answer turns out to be no. The short paper here explains this using a simple Bayesian network model that you can run (by downloading the free copy of AgenaRisk).

Friday, 17 January 2014

More on birthday coincidences

My daughter's birthday was last week (12 January), so I had a personal interest in today's Telegraph article about a family with 4 children all having the same birthday - 12 January.

Family with 4 children - all born on 12 January
Anybody who has read our book or seen our Probability Puzzles page will be familiar with the problem of 'coincidences' being routinely exaggerated (by which I mean probabilities of apparently very unlikely events are not as low as people assume). There is the classic birthdays problem that fits into this category (in a class of 23 children the probability that at least two will share the same birthday is actually better than 50%); but of more concern is that national newspapers routinely print ludicrously exaggerated figures for 'incredible events'*. 
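For anyone who wants to check the classic birthdays figure, here is a minimal Python sketch. It assumes 365 equally likely birthdays and ignores leap years; the function name is my own.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(prob_shared_birthday(22))  # just under 50%
print(prob_shared_birthday(23))  # just over 50%
```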

So when I saw the story in today's Telegraph I did what I always do in such cases - work out how wrong the stated odds are. Fortunately, in this case the Telegraph gets it spot on: for a family with 4 children, two of whom are twins, the probability that all 4 have the same birthday is approximately 1 in 133,225. Why? Because it is simply the probability that the twins (who we can assume were born on the same day) have the same birthday as the first child, times the probability that the youngest child has the same birthday as the first child. That is 1/365 times 1/365, which is 1/133,225. It is the same, of course, as the chance of a family of three children (none of whom are twins or triplets) all having the same birthday. The Telegraph also did not make the common mistake of stating or suggesting that the 1 in 133,225 figure was the probability of this happening in the whole of the UK. In fact, since there are about 800,000 families in the UK with 4 children and since about one in every 100 births are twins, we can assume there are about 8,000 families in the UK with 4 children including a pair of twins. The chances of at least one such family having all children with the same birthday are about 1 in 17.
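The arithmetic can be reproduced as follows. This sketch uses the rough figures quoted above (8,000 qualifying families) and assumes the families are independent and all 365 birthdays equally likely.

```python
# Probability that one family (twins plus two other children) all share a birthday.
p_family = (1 / 365) ** 2

# Rough number of UK families with 4 children including a pair of twins.
n_families = 8_000

# Probability that at least one of those families has all 4 sharing a birthday.
p_at_least_one = 1 - (1 - p_family) ** n_families

print(f"one family: 1 in {1 / p_family:,.0f}")
print(f"at least one family in the UK: about 1 in {1 / p_at_least_one:.0f}")
```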



*Our book gives many examples and also explains why the newspapers routinely make the same types of errors in their calculations. For example (Chapter 4) the Sun published a story in which a mother had just given birth to her 8th child - all of whom were boys; it claimed the chance of this happening was 'less than 1 in a billion'. In fact, in any family of 8 children there is a 1 in 256 probability that all 8 will be boys. So, assuming that approximately 1000 women in the UK every year give birth to their 8th child, it follows that there is about a 98% chance that in any given year in the UK at least one mother would give birth to an 8th child with all 8 being boys.
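A quick check of the 98% figure, as a sketch that assumes boys and girls are equally likely, births are independent, and uses the rough estimate of 1000 eighth births per year:

```python
# Chance that all 8 children in a given family are boys.
p_all_boys = 0.5 ** 8          # exactly 1/256

# Assumed number of UK women giving birth to an 8th child in a year.
eighth_births_per_year = 1000

# Chance that at least one of those families consists of 8 boys.
p_some_family = 1 - (1 - p_all_boys) ** eighth_births_per_year

print(f"{p_some_family:.1%}")
```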

Wednesday, 15 January 2014

Sally Clark revisited: another key statistical oversight

The Sally Clark case was notorious for the prosecution’s misuse of statistics in respect of Sudden Infant Death Syndrome (SIDS). In particular, the claim made by Roy Meadow at the original trial – that there was “only a 1 in 73 million chance of both children being SIDS victims” – has been thoroughly, and rightly, discredited.

However, as made clear by probability experts who analysed the case, the key statistical error made was to consider the (prior) probability of SIDS without comparing it to the (prior) probability of murder of a child by a parent. The experts correctly focused on the critical need for this comparison. However, there is an oversight in the way the experts built their arguments. Specifically, the prior probability of the ‘double SIDS’ hypothesis (which we can think of as the ‘defence’ hypothesis) has been compared with the prior probability of the ‘double murder’ hypothesis (which we can think of as the ‘prosecution’ hypothesis). But, since it would have been sufficient for the prosecution to establish just one murder, the correct hypothesis to compare to ‘double SIDS’ is not ‘double murder’ but rather ‘at least one murder’. The difference can be very important. For example, based on the same assumptions used by one of the probability experts who examined the case, the prior odds in favour of the defence hypothesis over the prosecution are not 30 to 1 but rather more like 5 to 2. After medical and other evidence is taken into account this difference can be critical. The case demonstrates that, in order to use probabilities in legal arguments effectively, it is crucial to identify appropriate hypotheses.
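The effect of widening the prosecution hypothesis can be sketched in a few lines of Python. The three weights below are placeholders that I have chosen only so that the two ratios come out at the 30 to 1 and 5 to 2 quoted above; they are not the figures or the model used in the paper.

```python
from fractions import Fraction

# HYPOTHETICAL relative prior weights for how two infant deaths came about
# (placeholders for illustration, not the paper's assumptions).
w_double_sids = Fraction(30)          # both deaths were SIDS
w_double_murder = Fraction(1)         # both children were murdered
w_one_murder_one_sids = Fraction(11)  # exactly one murder, the other death SIDS

# Defence ('double SIDS') against the too-narrow 'double murder' hypothesis.
odds_vs_double_murder = w_double_sids / w_double_murder

# Defence against the appropriate hypothesis: 'at least one murder'.
w_at_least_one_murder = w_double_murder + w_one_murder_one_sids
odds_vs_at_least_one = w_double_sids / w_at_least_one_murder

print(odds_vs_double_murder)   # 30   (30 to 1)
print(odds_vs_at_least_one)    # 5/2  (5 to 2)
```

Since 'at least one murder' includes 'double murder' as a special case, its prior can never be smaller, so the prior odds in favour of the defence can only fall when the appropriate hypothesis is used.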

I have submitted a paper about this. The draft is here.

Saturday, 7 September 2013

Barry George case: new insights on the evidence

Jill Dando
Barry George

Our new paper* "When ‘neutral’ evidence still has probative value: implications from the Barry George Case" (published in the journal Science and Justice) casts doubt on the reasoning in the 2007 Appeal Court judgement that led to the quashing of Barry George's conviction for the shooting to death of TV celebrity Jill Dando.

The paper examines the transcript of the Appeal in the context of new probabilistic research about the probative value of evidence. George's successful appeal was based primarily on the argument that the prosecution's evidence about a particle of firearm discharge residue (FDR) discovered in George's coat pocket was presented in a way that may have misled the jury. Specifically, the jury in the original trial had heard that the FDR evidence was very unlikely to have been found if Barry George had not fired the gun that killed Jill Dando. Most people would interpret such an assertion as strong evidence in favour of the prosecution case. However, afterwards the same forensic expert concluded that the FDR evidence was just as unlikely to have been discovered if Barry George had fired the gun. In such a scenario the evidence is considered to be ‘neutral’ - favouring neither the prosecution nor the defence. Hence, the appeal court considered the verdict unsafe and the conviction was quashed. Following the appeal ruling, the FDR evidence was not put before the jury at George's retrial and he was acquitted. However, our paper shows that the FDR evidence may not have been neutral after all.

Formally, the probative value of evidence is captured by a simple probability formula called the likelihood ratio (LR). The LR is the probability of finding the evidence if the prosecution hypothesis is true divided by the probability of finding the evidence if the defence hypothesis is true. Intuitively, if the LR is greater than one then the evidence supports the prosecution hypothesis; if the LR is less than one it supports the defence hypothesis, and if the LR is equal to one (as in the case of the FDR evidence here) then the evidence favours neither and so is 'neutral'. Accordingly the LR is a commonly recommended method for forensic scientists to use in order to explain the probative value of evidence. However, the new research in the paper shows that the prosecution and defence hypotheses have to be formulated in a certain way in order for the LR to 'work' as expected. Otherwise it is possible, for example, to have evidence whose LR is equal to one but which still has significant probative value. Our review of the appeal transcript shows that relevant prosecution and defence hypotheses were not properly formulated and, if one were to follow the arguments recorded in the Appeal judgement verbatim, then contrary to the Appeal conclusion, the probative value of the FDR evidence may not have been neutral as was concluded, but rather still supported the prosecution**.
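One way an LR of one can coexist with probative value is when the two hypotheses being compared do not cover every possibility. The Python sketch below illustrates this with made-up numbers; the hypothesis labels, priors and likelihoods are assumptions for illustration only and are not taken from the case or the paper.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a set of mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}

# Case 1: only two hypotheses, which together cover everything.
# LR = 0.01 / 0.01 = 1, and the posterior equals the prior: truly neutral.
post_two = posterior({"Hp": 0.4, "Hd": 0.6}, {"Hp": 0.01, "Hd": 0.01})

# Case 2: the stated defence hypothesis Hd is only one alternative to Hp;
# a third possibility ("other") makes the evidence far less likely.
priors = {"Hp": 0.2, "Hd": 0.3, "other": 0.5}
likelihoods = {"Hp": 0.01, "Hd": 0.01, "other": 0.0001}
lr = likelihoods["Hp"] / likelihoods["Hd"]   # still exactly 1
post_three = posterior(priors, likelihoods)

print(post_two["Hp"])     # unchanged from the prior of 0.4
print(lr)                 # 1.0
print(post_three["Hp"])   # rises from the prior of 0.2
```

In the second case the LR for Hp against Hd is still one, yet the probability of Hp roughly doubles, because the evidence counts heavily against the third possibility that the LR ignores.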

*Full details: Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu, (2013). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002 published online 19 August 2013. For those who do not have full access to the journal, a pre-publication draft of the article can be found here.

** Although the FDR evidence may have been probative after all, we are not in a position to comment on the overall case against Barry George, which others have argued was not particularly strong. Also, it could be argued that even though the FDR evidence was not 'neutral' as assumed in the Appeal, its probative value may not have been as strongly favourable to the prosecution as implied in the original trial; this may have been sufficient in itself to cast doubt on the safety of the conviction.