Friday, 14 March 2014

Bayesian network approach to Drug Economics Decision Making


Consider the following problem:
A relatively cheap drug (drug A) has been used for many years to treat patients with disease X. The drug is considered quite successful since the data reveals that 85% of patients using it have a ‘good outcome’, meaning they survive for at least 2 years. The drug is also quite cheap, costing on average $100 for a prolonged course. The overall “financial benefit” of the drug (which assumes a ‘good outcome’ is worth $5000 and is defined as that figure minus the cost) has a mean of $4985.

There is an alternative drug (drug B) that a number of specialists in disease X strongly recommend. However, the data reveals that only 65% of patients using drug B survive for at least 2 years. Moreover, the average cost of a prolonged course is $500. The overall “financial benefit” of drug B has a mean of just $2777.
On seeing the data the Health Authority recommends a ban against the use of drug B. Is this a rational decision?

The answer turns out to be no. The short paper here explains this using a simple Bayesian network model that you can run yourself (by downloading a free copy of AgenaRisk).

Friday, 17 January 2014

More on birthday coincidences

My daughter's birthday was last week (12 January), so I had a personal interest in today's Telegraph article about a family with 4 children all having the same birthday - 12 January.

Anybody who has read our book or seen our Probability Puzzles page will be familiar with the problem of 'coincidences' being routinely exaggerated (by which I mean probabilities of apparently very unlikely events are not as low as people assume). There is the classic birthdays problem that fits into this category (in a class of 23 children the probability that at least two will share the same birthday is actually better than 50%); but of more concern is that national newspapers routinely print ludicrously exaggerated figures for 'incredible events'*. 

So when I saw the story in today's Telegraph I did what I always do in such cases - work out how wrong the stated odds are. Fortunately, in this case the Telegraph gets it spot on: for a family with 4 children, two of whom are twins, the probability that all 4 have the same birthday is approximately 1 in 133,225. Why? Because it is simply the probability that the twins (who we can assume must be born on the same day) have the same birthday as the first child, multiplied by the probability that the youngest child has the same birthday as the first child. That is 1/365 times 1/365, which is 1/133,225. It is the same, of course, as the chance of a family of three children (none of whom are twins or triplets) all having the same birthday. The Telegraph also did not make the common mistake of stating or suggesting that the 1 in 133,225 figure was the probability of this happening in the whole of the UK. In fact, since there are about 800,000 families in the UK with 4 children, and since about one in every 100 births are twins, we can assume there are about 8,000 families in the UK with 4 children including a pair of twins. The chance of at least one such family having all children with the same birthday is about 1 in 17.
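As a quick check, here is a minimal Python sketch of these calculations. It uses only the figures quoted above (365 equally likely birthdays, independence between births, and roughly 8,000 UK families with 4 children including a pair of twins):

# Probability that all 4 children (including one pair of twins) share the same birthday:
# the twins must match the first child's birthday, and so must the youngest child.
p_family = (1 / 365) ** 2
print(f"One family: 1 in {1 / p_family:,.0f}")             # 1 in 133,225

# Roughly 8,000 UK families have 4 children including a pair of twins.
n_families = 8_000
p_at_least_one = 1 - (1 - p_family) ** n_families
print(f"At least one such family in the UK: 1 in {1 / p_at_least_one:.0f}")   # about 1 in 17

# The classic birthday problem: in a class of 23, P(at least two share a birthday) > 50%.
p_all_distinct = 1.0
for k in range(23):
    p_all_distinct *= (365 - k) / 365
print(f"Class of 23: {1 - p_all_distinct:.1%}")             # about 50.7%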



*Our book gives many examples and also explains why the newspapers routinely make the same types of errors in their calculations. For example (Chapter 4), the Sun published a story in which a mother had just given birth to her 8th child - all of whom were boys; it claimed the chance of this happening was 'less than 1 in a billion'. In fact, in any family of 8 children there is a 1 in 256 probability that all 8 will be boys. So, assuming that approximately 1000 women in the UK give birth to their 8th child every year, it follows that there is about a 98% chance that in any given year some UK mother will give birth to her 8th child with all 8 children being boys.
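Again, a quick Python check of these figures, using the 1-in-256 probability and the assumption (as above) of roughly 1000 such births per year:

# Any family of 8 children: probability all 8 are boys (assuming boys and girls equally likely).
p_all_boys = 0.5 ** 8
print(f"One family of 8: 1 in {1 / p_all_boys:.0f}")       # 1 in 256

# Roughly 1000 UK mothers give birth to an 8th child each year.
n_mothers = 1000
p_some_family = 1 - (1 - p_all_boys) ** n_mothers
print(f"Some such family in a given year: {p_some_family:.0%}")   # about 98%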

Wednesday, 15 January 2014

Sally Clark revisited: another key statistical oversight

The Sally Clark case was notorious for the prosecution’s misuse of statistics in respect of Sudden Infant Death Syndrome (SIDS). In particular, the claim made by Roy Meadow at the original trial – that there was “only a 1 in 73 million chance of both children being SIDS victims” – has been thoroughly, and rightly, discredited.

However, as made clear by the probability experts who analysed the case, the key statistical error was to consider the (prior) probability of SIDS without comparing it to the (prior) probability of murder of a child by a parent. The experts correctly focused on the critical need for this comparison. However, there is an oversight in the way they built their arguments. Specifically, the prior probability of the ‘double SIDS’ hypothesis (which we can think of as the ‘defence’ hypothesis) has been compared with the prior probability of the ‘double murder’ hypothesis (which we can think of as the ‘prosecution’ hypothesis). But, since it would have been sufficient for the prosecution to establish just one murder, the correct hypothesis to compare with ‘double SIDS’ is not ‘double murder’ but rather ‘at least one murder’. The difference can be very important. For example, based on the same assumptions used by one of the probability experts who examined the case, the prior odds in favour of the defence hypothesis over the prosecution hypothesis are not 30 to 1 but more like 5 to 2. After medical and other evidence is taken into account this difference can be critical. The case demonstrates that, in order to use probabilities effectively in legal arguments, it is crucial to identify the appropriate hypotheses.
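To see why the choice of hypotheses matters so much, here is a minimal Python sketch. It uses only the 30-to-1 figure quoted above (so the ratio of the single-death SIDS prior to the single-death murder prior is taken as √30) together with simple independence assumptions; the actual assumptions in the experts' analyses are more detailed.

import math

# r = ratio of the prior probability of a SIDS death to the prior probability of a murder,
# for a single child. A 30-to-1 prior odds of 'double SIDS' over 'double murder'
# implies r**2 = 30 (assuming the two deaths are independent).
r = math.sqrt(30)

# Comparison the experts made: 'double SIDS' versus 'double murder'.
odds_vs_double_murder = r ** 2                     # 30, i.e. 30 to 1

# Comparison argued for here: 'double SIDS' versus 'at least one murder'.
# With two deaths, 'at least one murder' covers two murders, or one murder and one SIDS,
# so its prior is proportional to 1 + 2*r (in units of the squared murder prior).
odds_vs_at_least_one = r ** 2 / (1 + 2 * r)

print(f"vs double murder:       {odds_vs_double_murder:.0f} to 1")   # 30 to 1
print(f"vs at least one murder: {odds_vs_at_least_one:.1f} to 1")    # about 2.5 to 1, i.e. 5 to 2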

I have submitted a paper about this. The draft is here.

Saturday, 7 September 2013

Barry George case: new insights on the evidence


Our new paper* "When ‘neutral’ evidence still has probative value: implications from the Barry George Case" (published in the journal Science and Justice) casts doubt on the reasoning in the 2007 Appeal Court judgement that led to the quashing of Barry George's conviction for the fatal shooting of TV presenter Jill Dando.

The paper examines the transcript of the Appeal in the context of new probabilistic research about the probative value of evidence. George's successful appeal was based primarily on the argument that the prosecution's evidence about a particle of firearm discharge residue (FDR) discovered in George's coat pocket was presented in a way that may have misled the jury. Specifically, the jury in the original trial had heard that the FDR evidence was very unlikely to have been found if Barry George had not fired the gun that killed Jill Dando. Most people would interpret such an assertion as strong evidence in favour of the prosecution case. However, the same forensic expert subsequently concluded that the FDR evidence was just as unlikely to have been discovered if Barry George had fired the gun. In such a scenario the evidence is considered to be ‘neutral’ - favouring neither the prosecution nor the defence. Hence, the appeal court considered the verdict unsafe and the conviction was quashed. Following the appeal ruling, the FDR evidence was excluded at George's retrial and he was acquitted. However, our paper shows that the FDR evidence may not have been neutral after all.

Formally, the probative value of evidence is captured by a simple probability formula called the likelihood ratio (LR). The LR is the probability of finding the evidence if the prosecution hypothesis is true divided by the probability of finding the evidence if the defence hypothesis is true. Intuitively, if the LR is greater than one then the evidence supports the prosecution hypothesis; if the LR is less than one it supports the defence hypothesis; and if the LR is equal to one (as with the FDR evidence here) then the evidence favours neither and so is 'neutral'. Accordingly, the LR is a commonly recommended method for forensic scientists to use in order to explain the probative value of evidence. However, the new research in the paper shows that the prosecution and defence hypotheses have to be formulated in a certain way in order for the LR to 'work' as expected. Otherwise it is possible, for example, to have evidence whose LR is equal to one but which still has significant probative value. Our review of the appeal transcript shows that the relevant prosecution and defence hypotheses were not properly formulated and that, if one follows the arguments recorded in the Appeal judgement verbatim, then contrary to the Appeal conclusion the FDR evidence may not have been neutral but may still have supported the prosecution**.
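As a toy illustration of that phenomenon (the numbers below are entirely made up and are not those in the paper or the case), suppose the defence hypothesis 'did not fire the gun' is really a mixture of two quite different scenarios. A single averaged LR of 1 can then conceal the fact that the evidence is far from neutral with respect to the scenarios that actually matter. A minimal Python sketch:

# Illustrative (made-up) probabilities of finding a single FDR particle in the coat pocket:
p_fired      = 0.010    # the defendant fired the gun
p_contam     = 0.019    # did not fire, but innocent contamination was plausible
p_no_contact = 0.001    # did not fire, and no plausible contamination route

# Suppose the two 'did not fire' scenarios are judged equally likely a priori.
p_not_fired = 0.5 * p_contam + 0.5 * p_no_contact            # averages to 0.010

print("LR, fired vs did-not-fire:  ", round(p_fired / p_not_fired, 2))   # 1.0  -> looks 'neutral'
print("LR, fired vs no-contact:    ", round(p_fired / p_no_contact, 2))  # 10.0 -> supports prosecution
print("LR, fired vs contamination: ", round(p_fired / p_contam, 2))      # 0.53 -> supports defence

Which comparison is the legally relevant one depends on how the hypotheses are formulated and on what the rest of the evidence says about the competing scenarios - which is precisely the issue the paper examines.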

*Full details: Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu (2013). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002, published online 19 August 2013. For those who do not have full access to the journal, a pre-publication draft of the article can be found here.

** Although the FDR evidence may have been probative after all, we are not in a position to comment on the overall case against Barry George, which others have argued was not particularly strong. Also, it could be argued that even though the FDR evidence was not 'neutral' as assumed in the Appeal, its probative value may not have been as strongly favourable to the prosecution as implied in the original trial; this alone may have been sufficient to cast doubt on the safety of the conviction.

Wednesday, 7 August 2013

The problem with predicting football results - you cannot rely on the data


Bloomberg Sports have published their predictions for the forthcoming Premiership season in the form of the predicted end of season table. Here are some key snippets from their press release:
The table indicates that this season will be a three horse race between Chelsea, Manchester City and Manchester United .... The Bloomberg Sports forecast expects Arsenal to claim the final Champions League place ahead of North London rivals Tottenham Hotspur.... At the bottom of the table, all three newly promoted teams are expected to face the drop...
There is just one problem with this set of 'predictions'. The final table - with very minor adjustments - essentially replicates last season's final positions. The top seven remain the same (the only positional changes being that Chelsea and Man Utd swap positions 1 and 3, and Liverpool and Everton swap positions 6 and 7). And the bottom three are the three promoted teams, so they also 'retain' their positions.

Bloomberg say they are using "mathematically-derived predictions" based on "vast amounts of objective data". But herein lies the problem. As we argue in our book, relying on data alone is the classical statistical approach to this kind of prediction. And classical statistics is great at 'predicting the past'. The problem is that we actually want to predict the future, not the past!

My PhD student Anthony Constantinou and I have been applying Bayesian networks and related methods to the problem of football prediction for a number of years. The great thing about Bayesian networks is that they enable you to combine the standard statistical data (most obviously historical and recent match results) with subjective factors. And it is the incorporation of these subjective (expert) factors that is the key to improved prediction that 'classical' statisticians just do not seem to get.
 
This combination of data and expert judgement has enabled us to achieve more accurate predictions than any other published system, and has even enabled us to 'beat the bookies' consistently (based on a simple betting strategy) despite the bookies' built-in profit margin. Unlike Bloomberg (and others) we have made our methods, models and results very public (a list of published papers in scholarly journals is below). In fact, for the last two years Anthony has posted the predictions for all matches the day before they take place on his website pi-football. The prediction for each match is summarised as a very simple set of probabilities, namely the probability of a home win, a draw and an away win. Good betting opportunities occur when one of these probabilities is significantly higher than the equivalent probability implied by the bookies' odds.
Example: Suppose Liverpool are playing at home to Stoke. Because of the historical data the bookies would regard Liverpool as strong favourites. They would typically rate the chances of Stoke winning as very low - say 10% (which in 'odds' terms equates to '9 to 1 against'). They add their 'mark-up' and publish odds of, say, 8 to 1 against a Stoke win (which in probability terms is 1/9, or about 11%). But suppose there are specific factors that lead our model to predict that the probability of a Stoke win is 20%. Then the model is saying that the bookmakers' odds - even given their mark-up - have significantly underestimated the probability of a Stoke win. Although our model still only gives Stoke a 20% chance of winning, it is worth placing a bet. Imagine 10 match scenarios like this. If our predictions are correct then you will win on 2 of the 10 occasions. Assuming you bet £1 each time, you will end up spending £10 and getting £18 back - a very healthy 80% profit margin.
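Here is a minimal Python sketch of that arithmetic, using the illustrative figures above (a 20% model probability against published odds of 8 to 1):

# Bookmaker offers 8 to 1 against a Stoke win: a winning £1 bet returns £9 (stake plus £8 winnings).
decimal_odds = 9.0
bookie_prob  = 1 / decimal_odds            # ~11%: the probability implied by the published odds
model_prob   = 0.20                        # our model's probability of a Stoke win

# Expected profit per £1 staked; positive whenever model_prob exceeds bookie_prob.
expected_profit = model_prob * decimal_odds - 1
print(f"Implied bookmaker probability: {bookie_prob:.0%}")      # 11%
print(f"Expected profit per £1 staked: {expected_profit:.0%}")  # 80%

# The 10-bet illustration: stake £1 on each of 10 such matches and win 2 of them.
total_staked, total_returned = 10 * 1, 2 * decimal_odds
print(f"Stake £{total_staked}, get back £{total_returned:.0f}")  # £10 staked, £18 back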
Thanks to Alex on the Spurs-list for the tip-off on the Bloomberg report.

References:
  • Constantinou, A., N. E. Fenton and M. Neil (2013) "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems. http://dx.doi.org/10.1016/j.knosys.2013.05.008
  • Constantinou, A. C. and N. E. Fenton (2013). "Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries." Journal of Quantitative Analysis in Sports 9(1): 37-50. http://dx.doi.org/10.1515/jqas-2012-0036
  • Constantinou, A., N. E. Fenton and M. Neil (2012). "pi-football: A Bayesian network model for forecasting Association Football match outcomes." Knowledge Based Systems, 36, 322-339. http://dx.doi.org/10.1016/j.knosys.2012.07.008
  • Constantinou, A. , Fenton, N.E., "Solving the problem of inadequate scoring rules for assessing probabilistic football forecasting models", Journal of Quantitative Analysis in Sports, Vol. 8 (1), Article 1, 2012. http://dx.doi.org/10.1515/1559-0410.1418

Friday, 5 July 2013

Flaky DNA (the prosecutor's fallacy yet again and much more to be worried about)

In August 2012 David Butler (who had been jailed for the 2005 murder of Anne Marie Foy) was freed when it was discovered that the DNA evidence - which had essentially been the only evidence against him - was flaky in more senses than one. Tiny traces of DNA, whose profile matched that of Butler, were discovered under Foy's fingernails. A sample of Butler's DNA had been previously stored in a database (the police had mistakenly assumed it belonged to the person who burgled his mother's house). It was a search of this database that revealed his DNA matched that under Foy's fingernails.

The reports here and here give a good overview of the case, focusing on the critical observation that some people - such as Butler - have especially dry skin, making them extremely likely to shed tiny amounts of DNA wherever they go. This means that Butler - a cab driver - could easily have transferred his cells simply by handling money that was later passed on to either the victim or the real attacker. A more recent US case - described here - also provides an example of how easily DNA can be innocently 'transferred' to a crime scene and mistakenly assumed to belong to the person who committed the crime.

The reporting of these cases highlights just one important scenario under which the probative value of DNA evidence can be massively exaggerated, namely the fact that there are multiple opportunities for DNA to be 'transferred'. This means that DNA found at a crime scene or on a victim could have come from multiple innocent sources.

But there are many other, less well understood, scenarios under which the probative value of DNA evidence can be massively exaggerated, and the Butler case actually highlights several of them:

  1.  Incorrectly reporting the probabilistic impact: In reporting the impact of the DNA evidence it appears (based on the Telegraph report) that the prosecuting QC has yet again committed the prosecutor's fallacy. The statement that there is “a one billion-to-one chance that the DNA belongs to anyone else” is wrong (just as it was here, here and here). In fact, if the DNA profile was indeed such that it is found in one in a billion people, then it is likely to be shared with about six other (unknown and unrelated) people in the world. In the absence of any other evidence against the defendant there is therefore actually a 6/7 chance that it belongs to 'anyone else' (see the sketch after this list).
  2. The impact of a database search: Finding the matching DNA as a result of a database search, rather than as a result of testing a suspect identified on the basis of other evidence, completely changes the impact of the evidence. This is especially devastating when there is so-called 'low-template DNA' - where the random match probabilities are nothing like as low as 1 in a billion. Let's suppose the DNA at the crime scene is such that it is found in one in every 10,000 people. Then even in a fairly small database - say of 5,000 individuals' DNA samples - there is a good chance (about 40%) that we will find a match to the crime scene DNA. Suppose we find a 'match'. Have we 'got our man'? Almost certainly not. In the UK alone we would expect over 6,000 people to have the matching DNA (again, see the sketch after this list). In some cases low-template DNA profiles have a match probability of 1 in 100. In such situations a database match tells us nothing at all. If we charged the first matching person we found we would almost certainly have the wrong person.
  3. The potential for errors in DNA analysis and testing. It is not just the potential for 'innocent transfer' that we have to consider when we think about 'human error'. Brian McKeown, chief scientist representative from LGC Forensics, says:
         "...the science is flawless and must not be ignored. If you do it right you get the right result."
     Yet LGC have themselves committed high-profile critical DNA testing errors, such as those reported here and here. When their scientists report the probabilistic impact of DNA matches they never incorporate the very real probability of errors that can be introduced at numerous stages in the process. As we explained here (and we will be reporting much more extensively on this in upcoming papers), when sensible allowance is made for human error the DNA 'statistics' become very different.
  4. The critical importance of the absence of DNA evidence. If a person - especially one who easily sheds DNA - really did rape and strangle the victim, then the fact that only tiny traces of DNA matching theirs are discovered on the victim is actually two pieces of evidence. One is made explicit - that the DNA matches - and it supports the prosecution case. But the other - that no substantive amount of the defendant's DNA was found - is typically ignored; and it may provide very strong support for the defence case. This 'evidence' of 'relative absence of DNA evidence' has been a key (previously ignored) factor in cases I have recently been involved in, so hopefully soon I will be able to reveal more about its impact.
  5. The entire theoretical basis for DNA 'match probabilities' and sampling is itself extremely flaky. This is something I am currently looking at with colleagues and will be writing about soon.
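To make points 1 and 2 above concrete, here is a minimal Python sketch of the arithmetic. The world and UK population figures are rounded assumptions; the match probabilities are the ones quoted above.

world_pop, uk_pop = 7_000_000_000, 63_000_000    # rounded 2013 figures

# Point 1: a '1 in a billion' profile is still expected to be shared by several other people.
rmp = 1e-9                                       # random match probability quoted by the QC
expected_others = world_pop * rmp
print(f"Expected number of other people sharing the profile: about {expected_others:.0f}")
# With roughly half a dozen others and no other evidence against the defendant, the chance
# the DNA 'belongs to anyone else' is about 6/7 - nowhere near 1 in a billion.

# Point 2: a low-template profile found via a database search.
rmp_low = 1 / 10_000                             # low-template random match probability
db_size = 5_000
p_db_match = 1 - (1 - rmp_low) ** db_size        # chance the database throws up at least one match
print(f"P(at least one match in a {db_size:,}-person database): {p_db_match:.0%}")   # ~40%
print(f"Expected number of matching people in the UK: {uk_pop * rmp_low:,.0f}")      # ~6,300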
Unlike some others, I am not suggesting the imminent demise of DNA in the courtroom. However, I am convinced that a far more critical approach to both the presentation and evaluation of DNA evidence is urgently required to avoid future miscarriages of justice. And I am convinced that many - as yet undiscovered - errors in DNA analysis mean that innocent people are in jail and guilty people are at large.


Thursday, 11 April 2013

Bayesian networks plagiarism

If, as they say, imitation is the sincerest form of flattery, then we are privileged to have discovered (thanks to a tip-off by Philip Leicester) that our work on Bayesian network idioms - first published in Neil M, Fenton NE, Nielsen L, "Building large-scale Bayesian Networks", The Knowledge Engineering Review, 15(3), 257-284, 2000 (and covered extensively in Chapter 7 of our book) - has been re-published, almost verbatim, in the following publication:
Milan Tuba and Dusan Bulatovic, "Design of an Intruder Detection System Based on Bayesian Networks", WSEAS Transactions on Computers, 5(9), pp 799-809, May 2009. ISSN: 1109-2750
The whole of Section 3 ("Some design aspects of large Bayesian networks") - which constitutes 6 out of the paper's 10 pages - is lifted from our 2000 paper. Our work was partly inspired by the work of Laskey and Mahoney. The authors reference that work but, of course, not ours, which confirms that the plagiarism was quite deliberate.

Milan Tuba and Dusan Bulatovic are at the Megatrend University of Belgrade (which we understand is a small private university) and we had not come across them before now. The journal WSEAS Transactions on Computers seems to be an example of one of the dubious journals exposed in this week's New York Times article. Curiously enough, after a colleague distributed that article yesterday I was going to write back to him saying that I disagreed with its rather elitist tone, which suggests that the peer review process of the 'reputable scientific journals' is somehow unimpeachable. In reality there is no consensus on which journals are 'reputable', and even the refereeing of those widely considered to be the best is increasingly erratic and at times bordering on corrupt (which is inevitable when it relies exclusively on volunteer academics). But at least I would hope that any 'reputable' journal would still be alert to the kind of plagiarism we see here.

This is not the first time our work has been blatantly plagiarised. Interestingly, on a previous occasion it was in a book published by Wiley Finance (widely considered one of the most reputable publishers). The book was 'written' by a man who had been our PhD student for a short time at City University before he vanished without notice or explanation. The book contained large chunks of our work (none of which the 'author' had contributed to, since it predated his time as a PhD student with us) without any attribution. Despite being informed of this, and shown proof that a) the author's qualifications as stated in the book were bogus and b) the endorsements on the back cover were fraudulent, Wiley did nothing about it.