<b>Probability and Risk</b><br />Improving public understanding of probability and risk with special emphasis on its application to the law. Why Bayes theorem and Bayesian networks are needed<br />Norman Fenton<br /><br /><b>An objective prior probability for guilt?</b> (11 September 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-DvXynZaOfNk/WbahAS2l16I/AAAAAAAAAjY/vFStRVs9VZ4onVJ-h9KKPRxplE7Hggw3gCEwYBhgL/s1600/prior_guilty.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="656" data-original-width="1140" height="280" src="https://2.bp.blogspot.com/-DvXynZaOfNk/WbahAS2l16I/AAAAAAAAAjY/vFStRVs9VZ4onVJ-h9KKPRxplE7Hggw3gCEwYBhgL/s400/prior_guilty.jpg" width="480" /></a></div><br /><br />One of the greatest impediments to the use of probabilistic reasoning in legal arguments is the difficulty in agreeing on an appropriate prior probability that the defendant is guilty. The 'innocent until proven guilty' assumption technically means a prior probability of 0 - a figure that (by Bayesian reasoning) can never be overturned no matter how much evidence follows. Some have suggested the logical equivalent of 1/<i>N</i> where <i>N</i> is the number of people in the world. But this probability is clearly too low as <i>N</i> includes too many who could not physically have committed the crime. 
On the other hand the often suggested prior 0.5 is too high as it stacks the odds too much against the defendant.<br /><br />Therefore, even strong supporters of a Bayesian approach seem to think they can and must ignore the need to consider a prior probability of guilt (indeed it is this thinking that explains <a href="http://bayesknowledge.blogspot.co.uk/2016/11/confusion-over-likelihood-ratio.html">the prominence of the 'likelihood ratio' approach discussed so often on this blog</a>). <br /><br /><a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/prior_probability.pdf">New work</a> - presented at the 2017 International Conference on Artificial Intelligence and the Law (ICAIL 2017) - shows that, in a large class of cases, it <i><b>is</b></i> possible to arrive at a realistic prior that is also as consistent as possible with the legal notion of ‘innocent until proven guilty’. The approach is based first on identifying the 'smallest' time and location from the actual crime scene within which the defendant was definitely present and then estimating the number of people - other than the suspect - who were also within this time/area. If there were <i>n</i> people in total, then before any other evidence is considered each person, including the suspect, has an equal prior probability 1/<i>n</i> of having carried out the crime.<br /><br />The method applies to cases where we assume a crime has definitely taken place and that it was committed by one person against one other person (e.g. murder, assault, robbery). The work considers both the practical and legal implications of the approach and demonstrates how the prior probability is naturally incorporated into a generic Bayesian network model that allows us to integrate other evidence about the case.<br /><br />Full details:<br /><blockquote class="tr_bq">Fenton, N. E., Lagnado, D. A., Dahlman, C., & Neil, M. (2017). 
"The Opportunity Prior: A Simple and Practical Solution to the Prior Probability Problem for Legal Cases". In <i>International Conference on Artificial Intelligence and the Law</i> (ICAIL 2017). Published by ACM. <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/prior_probability.pdf">Pre-publication draft</a>. </blockquote>See also <br /><ul><li><a href="http://bayesknowledge.blogspot.co.uk/2016/11/confusion-over-likelihood-ratio.html">Confusion over the Likelihood ratio</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2017/08/the-likelihood-ratio-and-its-use-in.html">The likelihood ratio and its use in the 'grooming gangs' news story</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/02/problems-with-likelihood-ratio-method.html">Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2017/01/the-problem-with-likelihood-ratio-for.html">Problem with likelihood ratio for DNA mixture profiles</a> </li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence </a></li></ul><br /><br /><b>Recommendations for Dealing with Quantitative Evidence in Criminal Law</b> (7 September 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-KBsor2OtrWk/WbFjinreOGI/AAAAAAAAAi4/Uf2tTs5Jou0eccQOudb0kUjUcSQSzSaHQCLcBGAs/s1600/newton_guidelines.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="679" data-original-width="637" height="400" 
src="https://2.bp.blogspot.com/-KBsor2OtrWk/WbFjinreOGI/AAAAAAAAAi4/Uf2tTs5Jou0eccQOudb0kUjUcSQSzSaHQCLcBGAs/s400/newton_guidelines.jpg" width="375" /></a></div><br />From July to December 2016 the <a href="https://www.newton.ac.uk/event/fos">Isaac Newton Institute Programme on Probability and Statistics in Forensic Science</a> in Cambridge hosted many of the world's leading figures from law, statistics and forensics, with a mixture of academics (including mathematicians and legal scholars), forensic practitioners, and practising lawyers (including judges and eminent QCs). Videos of many of the seminars and presentations from the Programme can be seen <a href="https://www.newton.ac.uk/event/fos/seminars">here</a>.<br /><br /><br />A <a href="http://www.newton.ac.uk/files/preprints/ni16061.pdf">key output of the Programme has now been published</a>. It is a very simple set of twelve guiding principles and recommendations for dealing with quantitative evidence in criminal law, intended for statisticians, forensic scientists and legal professionals. 
The layout consists of one principle per page as shown below.<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-lDOZX0oaFPA/WbFngW1VwSI/AAAAAAAAAjE/yBQTBwgwhJI-cdnbZYgYMQq3nKkXCYrigCLcBGAs/s1600/principle1.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="809" data-original-width="938" height="343" src="https://1.bp.blogspot.com/-lDOZX0oaFPA/WbFngW1VwSI/AAAAAAAAAjE/yBQTBwgwhJI-cdnbZYgYMQq3nKkXCYrigCLcBGAs/s400/principle1.jpg" width="400" /></a></div><br />Links:<br /><br /><ul><li><a href="http://www.newton.ac.uk/files/preprints/ni16061.pdf">Twelve Guiding Principles and Recommendations for Dealing with Quantitative Evidence in Criminal Law</a></li><li><a href="https://www.newton.ac.uk/event/fos">Isaac Newton Institute (INI) Programme on Probability and Statistics in Forensic Science in Cambridge</a> </li><li><a href="https://www.newton.ac.uk/event/fos/seminars">Videos of seminars from the Programme </a></li><li><a href="https://www.newton.ac.uk/event/fosw01/timetable">Watch the presentations from the workshop "The nature of questions arising in court that can be addressed via probability and statistical methods" from 30 August to 2 September.</a></li><li><a href="https://www.newton.ac.uk/event/fosw02">"Bayesian Networks and Argumentation in Evidence Analysis" 26-29 September</a></li><li> <a href="http://bayesknowledge.blogspot.co.uk/2016/09/bayes-and-law-whats-been-happening-in.html">Bayes and the Law: what's happening in Cambridge</a></li><li><a href="http://bayes-knowledge.com/">BAYES-KNOWLEDGE project</a></li></ul><br /><br /><b>The likelihood ratio and its use in the 'grooming gangs' news story</b> (14 August 2017, updated 15 August 2017)<br /><div class="separator" style="clear: both; text-align: 
center;"><a href="https://3.bp.blogspot.com/-fp8mDNOPGg4/WZICaUS4NPI/AAAAAAAAAio/hOWRKDhJibsQ0uNjwnIrf4H-ggpsx6-JACLcBGAs/s1600/easy_meat.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="418" data-original-width="296" height="320" src="https://3.bp.blogspot.com/-fp8mDNOPGg4/WZICaUS4NPI/AAAAAAAAAio/hOWRKDhJibsQ0uNjwnIrf4H-ggpsx6-JACLcBGAs/s320/easy_meat.jpg" width="226" /></a></div>This blog has reported many times previously (see links below) about problems with using the <i><b>likelihood ratio</b></i>. Recall that the likelihood ratio is commonly used as a measure of the probative value of some evidence <i>E</i> for a hypothesis <i>H</i>; it is defined as the probability of <i>E</i> given <i>H</i> divided by the probability of <i>E</i> given <i>not H</i>. <br /><br />There is especially great confusion in its use where we have data for the probability of <i>H</i> given <i>E</i> rather than for the probability of <i>E</i> given <i>H</i>. Look at the somewhat confusing argument here in relation to the offence of 'child grooming' which is taken directly from the book <i><b>McLoughlin, P. “Easy Meat: Inside Britain’s Grooming Gang Scandal.” (2016):</b></i><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-3OalD5iVG-g/WZIAxsAasGI/AAAAAAAAAic/HIJTNZ0iaVUH9CozUMlKemquT7omsNIrgCLcBGAs/s1600/grooming.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="505" data-original-width="722" height="446" src="https://4.bp.blogspot.com/-3OalD5iVG-g/WZIAxsAasGI/AAAAAAAAAic/HIJTNZ0iaVUH9CozUMlKemquT7omsNIrgCLcBGAs/s640/grooming.jpg" width="640" /></a></div><br /><br />Given the sensitive nature of the grooming gangs story in the UK and the increasing number of convictions, it is important to get the maths right. The McLoughlin book is the most thoroughly researched work on the subject. 
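<br /><br />As a rough illustration of the definition recalled above, the sketch below computes a likelihood ratio directly from the two likelihoods and, using the odds form of Bayes' theorem, also recovers it when only a prior and a posterior are known. The function names and all of the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any of the sources discussed here.

```python
def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR = P(E | H) / P(E | not H)."""
    if p_e_given_not_h == 0:
        raise ValueError("P(E | not H) must be non-zero")
    return p_e_given_h / p_e_given_not_h


def odds(p: float) -> float:
    """Convert a probability (strictly less than 1) into odds."""
    return p / (1.0 - p)


def posterior_probability(prior: float, lr: float) -> float:
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    post_odds = lr * odds(prior)
    return post_odds / (1.0 + post_odds)


def lr_from_prior_and_posterior(prior: float, posterior: float) -> float:
    """Recover the LR when only P(H) and P(H | E) are available."""
    return odds(posterior) / odds(prior)


# Hypothetical values, for illustration only.
lr = likelihood_ratio(0.8, 0.1)             # E is about 8 times more likely under H
posterior = posterior_probability(0.2, lr)  # prior odds 1:4 become posterior odds 2:1
recovered = lr_from_prior_and_posterior(0.2, posterior)
```

The last helper is the useful one when the data give the probability of <i>H</i> given <i>E</i> rather than the probability of <i>E</i> given <i>H</i>: dividing the posterior odds by the prior odds returns the same likelihood ratio.<br /><br />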
What the author of the book is attempting to determine is the likelihood ratio of the evidence <i>E</i> with respect to the hypothesis <i>H</i> where: <br /><br /><blockquote class="tr_bq"><i>H</i>: “Offence is committed by a Muslim” (so <i>not H</i> means “Offence is committed by a non-Muslim”) <br /><br /><i>E</i>: “Offence is child grooming” </blockquote><br />In this case, the population data cited by McLoughlin provides our priors <i>P</i>(<i>H</i>)=0.05 and, hence, <i>P</i>(<i>not H</i>)=0.95. But we also have the data on child grooming convictions that gives us <i>P</i>(<i>H </i>|<i> E</i>)=0.9 and, hence, <i>P</i>(<i>not H </i>| <i>E</i>)=0.1. <br /><br />What we do NOT have here is direct data on either <i>P</i>(<i>E</i>|<i>H</i>) or <i>P</i>(<i>E</i>|<i>not H</i>). However, we can still use Bayes theorem to calculate the likelihood ratio since:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-mhE6XYRXuG0/WZH_pH3WkfI/AAAAAAAAAiQ/ghzqIszy0R4ULVXeQ7dUTmhP_fqCZVENwCLcBGAs/s1600/lr.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="121" data-original-width="940" height="82" src="https://4.bp.blogspot.com/-mhE6XYRXuG0/WZH_pH3WkfI/AAAAAAAAAiQ/ghzqIszy0R4ULVXeQ7dUTmhP_fqCZVENwCLcBGAs/s640/lr.jpg" width="640" /></a></div>So, in the example we get: <br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-mQofCme0ha8/WZH_4QkYSlI/AAAAAAAAAiU/JKwCV12yNKocF6tCd2bY_k3RSUmyZHKWACLcBGAs/s1600/lr2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="101" data-original-width="410" height="48" src="https://3.bp.blogspot.com/-mQofCme0ha8/WZH_4QkYSlI/AAAAAAAAAiU/JKwCV12yNKocF6tCd2bY_k3RSUmyZHKWACLcBGAs/s200/lr2.jpg" width="200" /></a></div><br />Hence, while the 
method described in the book is flawed, the conclusion arrived at is (almost) correct.<br /><br />See also <br /><ul><li><a href="http://bayesknowledge.blogspot.co.uk/2017/01/the-problem-with-likelihood-ratio-for.html">Problem with likelihood ratio for DNA mixture profiles</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/11/confusion-over-likelihood-ratio.html">Confusion over the Likelihood ratio</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/02/problems-with-likelihood-ratio-method.html">Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence </a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2014/01/sally-clark-revisited-another-key.html">Sally Clark revisited: another key statistical oversight?</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/01/prosecutor-fallacy-in-stephen-lawrence.html">Prosecutor fallacy in Stephen Lawrence case? 
</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutor-fallacy-again-in-media.html">Prosecutor fallacy in media reporting of Burgess DNA case</a> </li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/07/flaky-dna-prosecutors-fallacy-yet-again.html">Flaky DNA: Prosecutors fallacy yet again</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutors-fallacy-just-will-not-go.html">Prosecutors fallacy just will not go away</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence </a></li></ul><br /><br /><b>Automatically generating Bayesian networks in analysis of linked crimes</b> (11 August 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-tBaNPpatbKk/WY3M5CPYQ7I/AAAAAAAAAiE/B3rJFMRVr2MLKQJJWLS-WX37IlYlqeiNgCK4BGAYYCw/s1600/crime_linkage.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="287" src="https://4.bp.blogspot.com/-tBaNPpatbKk/WY3M5CPYQ7I/AAAAAAAAAiE/B3rJFMRVr2MLKQJJWLS-WX37IlYlqeiNgCK4BGAYYCw/s400/crime_linkage.jpg" width="400" /></a></div><br /><br />Constructing an effective and complete Bayesian network (BN) for individual cases that involve multiple related pieces of evidence and hypotheses requires a major investment of effort. Hence, <a href="http://dx.doi.org/10.1111/cogs.12004">generic BNs have been developed</a> for common situations that only require adapting the underlying probabilities. These so-called 'idioms' make it practical to build and use BNs in casework without spending unacceptable amounts of time constructing the network. 
However, in some situations both the probability tables and the structure of the network depend on case-specific details. <br /><br />One such situation is where there are multiple linked crimes. In (<a href="https://doi.org/10.1016/j.scijus.2014.11.005">deZoete2015</a>) a BN structure was produced for evaluating evidence in cases where a person is suspected of being the offender in multiple possibly linked crimes. In (<a href="https://doi.org/10.1016/j.scijus.2017.01.003">deZoete2017</a>) this work was extended to cover situations with multiple offenders for possibly linked crimes. Although the papers present a methodology for constructing such BNs, the workload involved, together with the possibility of making mistakes in the conditional probability tables, still presents unnecessary difficulties for potential users. <br /><br />As part of the <a href="http://bayes-knowledge.com/">BAYES KNOWLEDGE</a> project, we have developed online GUIs that allow the user to select the parameters that reflect their crime linkage situation (for both single-offender and two-offender crime linkage cases). The associated BN is then automatically generated according to the structures described in (deZoete2015) and (deZoete2017). It is presented visually in the GUI and can be downloaded as a .net file, which can be opened in <a href="http://www.agenarisk.com/">AgenaRisk</a> or another BN software package. These applications serve both as a tool for those interested in or working with crime linkage problems and as a proof of principle of the added value of such GUIs in making BNs accessible by removing the effort of constructing every network from scratch. <br /><br />The GUIs are available from the <a href="http://bayes-knowledge.com/">'DEMO' tab </a>on the <a href="http://bayes-knowledge.com/">BAYES KNOWLEDGE</a> website and are built in R, a statistical programming language. 
This automated workflow can reduce the workload for, in this case, forensic statisticians and increase the mutual understanding between researchers and legal professionals.<br /><br />Jacob deZoete will be presenting this work at the <a href="http://www.cvent.com/events/icfis-2017-international-conference-on-forensic-inference-and-statistics/event-summary-6d357a9583224144866d64f44de367a2.aspx">10th International Conference on Forensic Inference and Statistics (ICFIS 2017)</a> in Minneapolis, September 2017.<br /><br /><br />Links<br /><br /><ul><li><a href="http://bayes-knowledge.com/index.php/2015-06-23-01-41-28/generating-models/crime-linkage-one-offender">The working demo for single offender multiple crimes</a></li><li><a href="http://bayes-knowledge.com/index.php/2015-06-23-01-41-28/generating-models/crime-linkage-two-offenders">The working demo for two offenders and linked crimes</a></li><li>de Zoete, J., Sjerps, M., Lagnado, D., Fenton, N. E. (2015). "Modelling crime linkage with Bayesian Networks". Science & Justice, 55(3), 209-217. <a href="https://doi.org/10.1016/j.scijus.2014.11.005">https://doi.org/10.1016/j.scijus.2014.11.005</a> Pre-publication draft <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/Modelling_crime_linkage_with_Bayesian_Networks_Final.pdf">here</a>. </li><li>de Zoete, J., Sjerps, M. (2017). "Evaluating evidence in linked crimes with multiple offenders". Science & Justice, 57(3), 228-238. 
<a href="https://doi.org/10.1016/j.scijus.2017.01.003">https://doi.org/10.1016/j.scijus.2017.01.003</a></li></ul><br /><br /><b>Queen Mary researchers evaluate impact of new regulations on the Buy-To-Let property market using novel AI methods</b> (29 June 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-kXJlmQOx0SI/WVVjMVe5tFI/AAAAAAAAAhk/F1BCT8dgLYgFgSJPF_NPHvJypwX6ysaTACLcBGAs/s1600/property2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="436" data-original-width="726" height="240" src="https://2.bp.blogspot.com/-kXJlmQOx0SI/WVVjMVe5tFI/AAAAAAAAAhk/F1BCT8dgLYgFgSJPF_NPHvJypwX6ysaTACLcBGAs/s400/property2.jpg" width="400" /></a></div>In 2015 the British government announced major tax reforms for individual landlords, to be introduced gradually after April 2017 and in full effect by tax year 2020/21. The new reforms and regulations have received much media attention as there has been widespread belief that they were sufficiently skewed against landlords that they could signal the end of the Buy-To-Let (BTL) investment era in the UK.<br /><br />Research by <a href="http://www.constantinou.info/">Anthony Constantinou</a> and <a href="http://www.eecs.qmul.ac.uk/%7Enorman/">Norman Fenton</a> of Queen Mary University of London has <a href="https://doi.org/10.1371/journal.pone.0179297">now been published</a>, providing the first comprehensive evaluation of the impact of the reforms on the London BTL property market. 
The results use a novel model (based on revolutionary new work in an AI method called <a href="http://bayesianrisk.com/">Bayesian networks</a>) that captures multiple uncertainties and allows investors to assess the impact of various factors of interest on their BTL investment, such as changes in interest rates, capital and rental growth. Additionally, the model allows for portfolio risk management through intervention between time steps, such as the effects of different scenarios of re-mortgaging.<br /><br />The results show that, over a 10-year period, the overall return-on-investment (ROI) will be reduced under the new tax measures, but that the ROI remains good assuming a common BTL London profile. However, there are major differences depending on the investor strategy. For example, for risk-averse investors who choose not to expand their portfolio, the reforms are expected to have only a marginal negative impact, with the overall ROI reducing from 301% under the old regulations to 290% under the new (-3.7%), and this loss comes exclusively from a decrease in net profits from rental income (-32.2%). However, the impact on risk-seeking investors who aim to expand their property portfolio through leveraging is much more significant, since the new tax reforms are projected to decrease ROI from 941% to 590% (-37.3%), over the same 10-year period.<br /><br />The impact on net profits also poses substantial risks for loss-making returns excluding capital gains, especially in the case of rising interest rates. While this makes it less desirable or even non-viable for some to continue being a landlord, based on the current status of all factors taken into consideration for simulation, investment prospects are still likely to remain good within a reasonable range of interest rate and capital growth rate variations. 
Further, the results also indicate that the recent trend of property prices in London increasing faster than rents will not continue for much longer; either capital growth rates will have to decrease, rental growth rates will have to increase, or we shall observe a combination of the two events. <br /><br />The full paper (with open access link): <br /><br />Constantinou, A. C., & Fenton, N. (2017). The future of the London Buy-To-Let property market: Simulation with temporal Bayesian Networks. PLoS ONE, 12(6): e0179297, <a href="https://doi.org/10.1371/journal.pone.0179297">https://doi.org/10.1371/journal.pone.0179297 </a><br /><br /><i><span style="font-size: small;">The research was supported in part by the European Research Council (ERC) through the research project, ERC-2013-AdG339182-BAYES_KNOWLEDGE, while Agena Ltd provided software support. </span></i><br /><br /><b>Explaining and predicting football team performance over an entire season</b> (6 March 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-aLh2nr3Na4Y/WHaEoQJfhII/AAAAAAAAAgs/XQ7aWSpWHVYXMEBT6aW2N7SYYnH398_QwCLcB/s1600/delle_ali.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="343" src="https://4.bp.blogspot.com/-aLh2nr3Na4Y/WHaEoQJfhII/AAAAAAAAAgs/XQ7aWSpWHVYXMEBT6aW2N7SYYnH398_QwCLcB/s400/delle_ali.jpg" width="400" /></a></div>When I was <a href="http://bayesknowledge.blogspot.co.uk/2016/01/the-statistics-of-climate-change.html">presenting the BBC documentary Climate Change by Numbers</a> and had to explain the idea of a statistical 'attribution study', I used the analogy of determining which factors most affected the performance of Premiership football teams year on year. 
Because it had to be done in a hurry, my colleague Dr Anthony Constantinou and I did a very crude analysis which focused on a very small number of factors and showed, unsurprisingly, that turnover (i.e. mainly spend on transfers and wages) had the most impact of these. <br /><br />We weren't happy with the quality of the study and decided to undertake a much more comprehensive analysis as part of the <a href="http://bayes-knowledge.com/">BAYES-KNOWLEDGE project</a>. This project is all about improved decision-making and risk assessment using a probabilistic technique called <a href="http://bayesianrisk.com/">Bayesian Networks</a>. In particular, the main objective of the project is to produce useful and accurate predictions and assessments in situations where there is not a lot of data available. In such situations the current fad of 'big data' methods using machine learning techniques does not work; instead we use 'smart-data' - a method that combines the limited data available with expert causal knowledge and real-world ‘facts’. The idea of predicting Premiership teams' long-term performance and identifying the key factors explaining changes was a perfect opportunity to both develop and validate the BAYES-KNOWLEDGE method, especially as we had previously done extensive work in predicting individual Premiership match results (see links at bottom).<br /><br />The results of the study have <a href="http://constantinou.info/downloads/papers/smartDataFootball.pdf">now been published</a> in one of the premier international AI journals <i>Knowledge Based Systems</i>. <br /><br />The Bayesian Network model in the paper enables us to predict, before a season starts, the total league points a team is expected to accumulate throughout the season (each team plays 38 games in a season with three points per win and one per draw). The model results compare very favourably against a number of other relevant and different types of models, including some which use far more data. 
As hoped, the results also provide a novel and comprehensive attribution study of the factors most affecting performance (measured in terms of impact on actual points gained/lost per season). For example, although, unsurprisingly, the largest improvements in performance result from massive increases in spending on new players (an 8.49-point gain), an even greater decrease (up to 16.52 points) results from involvement in European competitions (especially the Europa League) for teams that previously had little experience in such competitions. Also, something that was very surprising, and that possibly confounds bookies and gives punters good potential for exploitation, is that promoted teams generate (on average) a staggering increase in performance of 8.34 points, relative to the relegated team they are replacing. The results in the study also partly address/explain the widely accepted 'favourite-longshot bias' observed in bookies' odds.<br /><br />The full reference citation is:<br /><blockquote class="tr_bq">Constantinou, A. C. and Fenton, N. (2017). Towards Smart-Data: Improving predictive accuracy in long-term football team performance. Knowledge-Based Systems, In Press, 2017,<a class="S_C_ddDoi" href="http://dx.doi.org/10.1016/j.knosys.2017.03.005" id="ddDoi" target="doilink"> http://dx.doi.org/10.1016/j.knosys.2017.03.005</a></blockquote>The pre-print version of the paper (pdf) can be found at <a class="moz-txt-link-freetext" href="http://constantinou.info/downloads/papers/smartDataFootball.pdf">http://constantinou.info/downloads/papers/smartDataFootball.pdf</a><br /><br />We acknowledge the financial support of the European Research Council (ERC), which funded the research project <a href="http://bayes-knowledge.com/">ERC-2013-AdG339182-BAYES_KNOWLEDGE</a>, and <a href="http://www.agenarisk.com/">Agena Ltd</a> for software support. 
<br /><br />See also:<br /><ul><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/08/the-problem-with-predicting-football.html">The problem with predicting football results</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/03/a-bayesian-network-to-determine-optimal.html">A Bayesian network to determine optimal strategy for Spurs' success</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/proving-referee-bias-with-bayesian.html">Proving referee bias with Bayesian networks </a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/the-statistics-of-climate-change.html">The statistics of climate change</a></li><li><a href="http://www.pi-football.com/">pi-football: Anthony Constantinou's website that provides weekly free English Premier League match predictions </a></li><li><a href="http://bayesianrisk.com/">Bayesian networks</a></li></ul><br /><br /><b>Helping US Intelligence Analysts using Bayesian networks</b> (9 February 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-pEg0a_tMS-I/WJuizU4EtWI/AAAAAAAAAhE/Pqisc9ih3844ZfwrDwfOOlY6IL_TQnmugCLcB/s1600/bard.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://1.bp.blogspot.com/-pEg0a_tMS-I/WJuizU4EtWI/AAAAAAAAAhE/Pqisc9ih3844ZfwrDwfOOlY6IL_TQnmugCLcB/s320/bard.jpg" width="320" /></a></div>Causal Bayesian networks are at the heart of <a href="https://www.monash.edu/news/articles/monash-university-wins-aud-$18m-funding-to-advance-intelligence-analysis">a major new collaborative research project led by Monash University in Australia</a> - funded by the United States' Intelligence Advanced Research Projects Activity (IARPA). The objective is to help intelligence analysts assess the value of their information. 
IARPA was set up following the failure of the US intelligence agencies to properly assess the correct levels of threat posed by Al Qaeda in 2001 and Iraq in 2003.<br /><br />The chief investigator at Monash, Kevin Korb, said in an interview in <a href="http://online.isentialink.com/theaustralian.com.au/2017/02/06/ce10a639-0110-4755-835a-2572b0bc2c78.html">the Australian</a>: <br /><blockquote class="tr_bq"><div class="selectionShareable">"..quantitative rather than qualitative methods were crucial in judging the value of intelligence.... more quantitative approaches could have helped contain the ebola epidemic by making authorities appreciate the scale of the problem months earlier. They could also build a better assessment of the likelihood of events like gunfire between vessels in the South China Sea, a substantial devaluation of the Venezuelan currency or a new presidential aspirant in Egypt."</div></blockquote>Norman Fenton and Martin Neil (both of <a href="http://www.agenarisk.com/">Agena</a> and <a href="http://www.eecs.qmul.ac.uk/%7Enorman/">Queen Mary University of London</a>) will be working on the project along with colleagues such as David Lagnado and Ulrike Hahn at UCL. <a href="http://www.agenarisk.com/products/desktop.shtml">AgenaRisk</a> will be used throughout the project as the Bayesian network platform. 
<br /><br />Further information:<br /><ul><li><a href="http://online.isentialink.com/theaustralian.com.au/2017/02/06/ce10a639-0110-4755-835a-2572b0bc2c78.html">(Australian) Melbourne unis awarded rare contracts by US intelligence’s research arm</a></li><li><a href="https://www.monash.edu/news/articles/monash-university-wins-aud-$18m-funding-to-advance-intelligence-analysis">Monash University wins up to A$18m funding to advance intelligence analysis</a> </li><li><a href="http://www.monash.edu/it/our-research/showcase-projects/bard-bayesian-argumentation-via-delphi">BARD: Bayesian ARgumentation via Delphi</a> </li></ul><br /><br /><b>Queen Mary in new £2 million project using Bayesian networks to create intelligent medical decision support systems with real-time monitoring for chronic conditions</b> (8 February 2017, updated 9 February 2017)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-Lek54xidkPo/WBt0yW8XlaI/AAAAAAAAAfA/VgrZzfZMAAQja1wgGUgqrBJ0ywUEHhGGwCLcB/s1600/PAMBAYESIAN_logo.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="306" src="https://2.bp.blogspot.com/-Lek54xidkPo/WBt0yW8XlaI/AAAAAAAAAfA/VgrZzfZMAAQja1wgGUgqrBJ0ywUEHhGGwCLcB/s400/PAMBAYESIAN_logo.jpg" width="400" /></a></div><br />UPDATE 9 Feb 2017: Various Research Fellowship and PhD vacancies funded by this project are now advertised. See <a href="http://www.eecs.qmul.ac.uk/%7Enorman/projects/PAMBAYESIAN/jobs_PAMBAYESIAN.html">here</a>.<br /><br />Queen Mary has been awarded a grant of £1,538,497 (full economic cost £1,923,122) from the EPSRC towards a major new collaborative project to develop a new generation of intelligent medical decision support systems. 
The project, called PAMBAYESIAN (Patient Managed Decision-Support using Bayesian Networks) focuses on home-based and wearable real-time monitoring systems for chronic conditions including rheumatoid arthritis, diabetes in pregnancy and atrial fibrillation. It has the potential to improve the well-being of millions of people. <br /><br />The project team includes researchers from both the School of Electronic Engineering and Computer Science (EECS) and clinical academics from the Barts and the London School of Medicine and Dentistry (SMD). The collaboration is underpinned by extensive research in EECS and SMD, with access to digital health firms that have extensive experience developing patient engagement tools for clinical development (BeMoreDigital, Mediwise, Rescon, SMART Medical, uMotif, IBM UK and Hasiba Medical). <br /><br />The project is led by <b>Prof Norman Fenton</b> with co-investigators: Dr William Marsh, Prof Paul Curzon, Prof Martin Neil, Dr Akram Alomainy (all EECS) and Dr Dylan Morrissey, Dr David Collier, Professor Graham Hitman, Professor Anita Patel, Dr Frances Humby, Dr Mohammed Huda, Dr Victoria Tzortziou Brown (all SMD). The project will also include four QMUL-funded PhD students.<br /><br />The three-year project will begin June 2017.<br /><br /><b>Background </b><br /><br />Patients with chronic diseases must take day-to-day decisions about their care and rely on advice from medical staff to do this. However, regular appointments with doctors or nurses are expensive, inconvenient and not necessarily scheduled when needed. Increasingly, we are seeing the use of low cost and highly portable sensors that can measure a wide range of physiological values. Such 'wearable' sensors could improve the way chronic conditions are managed. Patients could have more control over their own care if they wished; doctors and nurses could monitor their patients without the expense and inconvenience of visits, except when they are needed. 
Remote monitoring of patients is already in use for some conditions but there are barriers to its wider use: it relies too much on clinical staff to interpret the sensor readings; patients, confused by the information presented, may become more dependent on health professionals; remote sensor use may then lead to an increase in medical assistance, rather than a reduction. <br /><br />The project seeks to overcome these barriers by addressing two key weaknesses of the current systems: <br /><ol><li>Their lack of intelligence. Intelligent systems that can help medical staff in making decisions already exist and can be used for diagnosis, prognosis and advice on treatments. One especially important form of these systems uses belief or Bayesian networks, which show how the relevant factors are related and allow beliefs, such as the presence of a medical condition, to be updated from the available evidence. However, these intelligent systems do not yet work easily with data coming from sensors. </li><li>Any mismatch between the design of the technical system and the way people - patients and professionals - interact. </li></ol>We will work on these two weaknesses together: patients and medical staff will be involved from the start, enabling us to understand what information is needed by each player and how to use the intelligent reasoning to provide it. <br /><br />The medical work will be centred on three case studies, looking at the management of rheumatoid arthritis, diabetes in pregnancy and atrial fibrillation (irregular heartbeat). These have been chosen both because they are important chronic diseases and because they are investigated by significant research groups in our Medical School, who are partners in the project. This makes them ideal test beds for the technical developments needed to realise our vision and allow patients more autonomy in practice. 
<br /><br />To advance the technology, we will design ways to create belief networks for the different intelligent reasoning tasks, derived from an overall model of medical knowledge relevant to the diseases being managed. Then we will investigate how to run the necessary algorithms on the small computers attached to the sensors that gather the data as well as on the systems used by the healthcare team. Finally, we will use the case studies to learn how the technical systems can integrate smoothly into the interactions between patients and health professionals, ensuring that information presented to patients is understandable, useful and reduces demands on the care system while at the same time providing the clinical team with the information they need to ensure that patients are safe. <br /><br /><b>Further information: <a href="http://www.eecs.qmul.ac.uk/%7Enorman/projects/PAMBAYESIAN/">www.eecs.qmul.ac.uk/~norman/projects/PAMBAYESIAN/</a></b><br /><br />This project also complements another Bayesian networks based project - the Leverhulme-funded project "<i>CAUSAL-DYNAMICS (Improved Understanding of Causal Models in Dynamic Decision Making)</i>" - starting January 2017. 
See <a href="http://www.eecs.qmul.ac.uk/%7Enorman/projects/leverhulme/causal_dynamics.html">CAUSAL-DYNAMICS</a>Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com1tag:blogger.com,1999:blog-6468094748577058716.post-36640979861793053642017-01-01T13:49:00.002-08:002017-08-14T13:35:42.724-07:00The problem with the likelihood ratio for DNA mixture profiles<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-7SSeXIOK6WM/WGlwxAdn5zI/AAAAAAAAAgc/vg-etp30XqAdi2h_lcFuKhUEZDpq61GaACLcB/s1600/dna_mixture.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="222" src="https://1.bp.blogspot.com/-7SSeXIOK6WM/WGlwxAdn5zI/AAAAAAAAAgc/vg-etp30XqAdi2h_lcFuKhUEZDpq61GaACLcB/s400/dna_mixture.jpg" width="400" /></a></div><br />We have written many times before (see the links below) about use of the Likelihood Ratio (LR) in legal and forensic analysis. <br /><br />To recap: the LR is a very good and simple method for determining the extent to which some evidence (such as DNA found at the crime scene matching the defendant) supports one hypothesis (such as "defendant is the source of the DNA") over an alternative hypothesis (such as "defendant is not the source of the DNA"). The previous articles discussed the various problems and misinterpretations surrounding the use of the LR. Many of these arise when the hypotheses are not mutually exclusive and exhaustive. This problem is especially pertinent in the case of 'DNA mixture' evidence, i.e. when some DNA sample relevant to a case comes from more than one person. With modern DNA testing techniques it is common to find DNA samples with multiple (but unknown number of) contributors. 
In such cases there is no obvious 'pair' of hypotheses that are mutually exclusive and exhaustive, since we have individual hypotheses such as:<br /><ul><li>H1: suspect + one unknown</li><li>H2: suspect + one known other </li><li>H3: two unknowns</li><li>H4: suspect + two unknowns </li><li>H5: suspect + one known other + one unknown</li><li>H6: suspect + two known others</li><li>H7: three unknowns </li><li>H8: one known other + two unknowns</li><li>H9: two known others + one unknown</li><li>H10: three known others</li><li>H11: suspect + three unknowns </li><li>etc.</li></ul>It is typical in such situations to focus on the 'most likely' number of contributors (say <i>n</i>) and then compare the hypothesis "suspect + (<i>n</i>-1) unknowns" with the hypothesis "<i>n</i> unknowns". For example, if there are likely to be 3 contributors then typically the following hypotheses are compared:<br /><ul><li>H1: suspect + two unknowns </li><li>H2: three unknowns </li></ul>Now, to compute the LR we have to compute the likelihood of the particular DNA trace evidence E under each of the hypotheses. Generally both of these likelihoods, P(E | H1) and P(E | H2), are extremely small numbers. For example, we might get something like<br /><ul><li>P(E | H1) = 0.00000000000000000001 (10 to the minus 20) </li><li>P(E | H2) = 0.00000000000000000000000001 (10 to the minus 26) </li></ul>For a statistician, the size of these numbers does not matter – we are only interested in the ratio (that is precisely what the LR is). In the above example the LR is very large (one million), meaning that the evidence is a million times more likely to have been observed if H1 is true than if H2 is true. This seems to be overwhelming evidence that the suspect was a contributor. Case closed? 
<br /><br />Apart from the <i><b>communication problem</b></i> in court of getting across what this all means (defence lawyers can and do exploit the very low probability of E given H1) and how it is computed, there is an <i><b>underlying statistical problem</b></i> with small likelihoods for non-exhaustive hypotheses. I will highlight the problem with two scenarios involving a simple urn example. Superficially, the scenarios seem identical. The first scenario causes no problem but the second one does. The concern is that it is not at all obvious that the DNA mixture problem always corresponds more closely to the first scenario than the second. <br /><br />In both scenarios we assume the following: <br /><br />There is an urn with 1000 balls – some of which are white. Suppose W is the (unknown) number of white balls. We have 2 hypotheses: <br /><ul><li>H1: W=100</li><li>H2: W=90 </li></ul>We can draw a ball as many times as we like, note its colour and replace it (i.e. sample with replacement). We wish to use the evidence of 10,000 such samples. <br /><br />Scenario 1: We draw 1001 white balls. In this case, using standard statistical assumptions (a binomial model), we calculate P(E | H1) = 0.013, P(E | H2) = 0.000032. Both values are small but the LR is large, about 415, strongly favouring H1 over H2. <br /><br />Scenario 2: We draw 1100 white balls. In this case P(E | H1) = 0.000057, P(E | H2) < 0.00000001. Again both values are very small but the LR is very large, strongly favouring H1 over H2. <br /><br />(note: in both cases we could have chosen a much larger sample and got truly tiny likelihoods but these values are sufficient to make the point). <br /><br />So in what sense are these two scenarios fundamentally different and why is there a problem? 
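<br /><br />The two scenarios can be checked numerically. The following is a minimal sketch, assuming a plain binomial model for sampling with replacement; the function names (<code>log_binom_pmf</code>, <code>likelihood</code>, <code>likelihood_ratio</code>, <code>best_w</code>) are illustrative and not taken from any tool mentioned in this post. It works in log space because the raw probabilities underflow, and it also scans every possible value of W to see which one fits the draws best.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k successes in n draws."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def likelihood(white_drawn: int, white_in_urn: int,
               draws: int = 10_000, urn_size: int = 1000) -> float:
    """P(E | W = white_in_urn) under the assumed binomial model."""
    return math.exp(log_binom_pmf(white_drawn, draws, white_in_urn / urn_size))


def likelihood_ratio(white_drawn: int, w1: int = 100, w2: int = 90,
                     draws: int = 10_000, urn_size: int = 1000) -> float:
    """LR of hypothesis W = w1 over hypothesis W = w2, computed in log space."""
    log_lr = (log_binom_pmf(white_drawn, draws, w1 / urn_size)
              - log_binom_pmf(white_drawn, draws, w2 / urn_size))
    return math.exp(log_lr)


def best_w(white_drawn: int, draws: int = 10_000, urn_size: int = 1000) -> int:
    """Value of W (0..urn_size) with the highest likelihood for the draws."""
    return max(range(urn_size + 1),
               key=lambda w: log_binom_pmf(white_drawn, draws, w / urn_size))


if __name__ == "__main__":
    for drawn in (1001, 1100):  # scenario 1 and scenario 2
        print(f"white balls drawn: {drawn}")
        print(f"  P(E | H1) = {likelihood(drawn, 100):.3g}")
        print(f"  P(E | H2) = {likelihood(drawn, 90):.3g}")
        print(f"  LR (H1 over H2) = {likelihood_ratio(drawn):.3g}")
        print(f"  best-fitting W  = {best_w(drawn)}")
```

Comparing the best-fitting W with the two values named in H1 and H2 is a quick way to see whether either hypothesis is plausible at all, which an LR on its own cannot show.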
<br /><br />In scenario 1 not only does the conclusion favouring H1 make sense, but the actual number of white balls drawn is very close to the <i><b>expected </b></i>number we would get if H1 were true (in fact, W=100 is the 'maximum likelihood estimate' for the number of white balls). So not only does the evidence point to H1 over H2, but also to H1 over any other hypothesis (and there are 1001 different hypotheses W=0, W=1, W=2 etc.). <br /><br />In scenario 2 the evidence is actually even more supportive of H1 over H2 than in scenario 1. <i><b>But it is essentially meaningless because it is virtually certain that BOTH hypotheses are false</b></i>. <br /><br />So, returning to the DNA mixture example, it is certainly not sufficient to compare just two hypotheses. The LR of one million in favour of H1 over H2 may be hiding the fact that neither of these hypotheses is true. It is far better to identify as exhaustive a set of hypotheses as is realistically possible and then determine the individual likelihood value of each hypothesis. 
We can then identify the hypothesis with the highest likelihood value and consider its LR compared to each of the other hypotheses.<br /><ul><li><a href="http://bayesknowledge.blogspot.co.uk/2016/11/confusion-over-likelihood-ratio.html">Confusion over the Likelihood ratio</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/02/problems-with-likelihood-ratio-method.html">Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence </a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2014/01/sally-clark-revisited-another-key.html">Sally Clark revisited: another key statistical oversight?</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/01/prosecutor-fallacy-in-stephen-lawrence.html">Prosecutor fallacy in Stephen Lawrence case? 
</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutor-fallacy-again-in-media.html">Prosecutor fallacy in media reporting of Burgess DNA case</a> </li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/07/flaky-dna-prosecutors-fallacy-yet-again.html">Flaky DNA: Prosecutors fallacy yet again</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutors-fallacy-just-will-not-go.html">Prosecutors fallacy just will not go away</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence </a></li></ul>Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-58879221721483656082016-11-08T14:06:00.003-08:002016-11-08T14:19:13.141-08:00Confusion over the Likelihood Ratio<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-M1z1hhI1thM/WCJLn89lIgI/AAAAAAAAAfQ/1YiEaVBKnVwVPvvtD-QxH2H-wD_Vh0idACLcB/s1600/LR_newton.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://1.bp.blogspot.com/-M1z1hhI1thM/WCJLn89lIgI/AAAAAAAAAfQ/1YiEaVBKnVwVPvvtD-QxH2H-wD_Vh0idACLcB/s400/LR_newton.jpg" width="400" /></a></div><br />The 'Likelihood Ratio' (LR) has been dominating discussions at the <a href="https://www.newton.ac.uk/event/fosw03">third workshop</a> in our Isaac Newton Institute Cambridge Programme <a href="https://www.newton.ac.uk/event/fos">Probability and Statistics in Forensic Science.</a><br />There have been many fine talks on the subject - and these talks will be available <a href="https://www.newton.ac.uk/event/fosw03/timetable">here</a> for those not fortunate enough to be attending.<br /><br />We have written before (see links at bottom) about some concerns with the use of the LR. 
For example, we feel there is often a desire to produce a single LR even when there are multiple different unknown hypotheses and dependent pieces of evidence (in such cases we feel the problem needs to be modelled as a Bayesian network) - see [1]. Based on the extensive discussions this week, I think it is worth recapping on another one of these concerns (namely when hypotheses are non-exhaustive).<br /><br />To recap: The LR is a formula/method that is recommended for use by forensic scientists when presenting evidence - such as the fact that DNA collected at a crime scene is found to have a profile that matches the DNA profile of a defendant in a case. In general, the LR can be a very good and simple method for communicating the impact of evidence (in this case on the hypothesis that the defendant is the source of the DNA found at the crime scene).<br /><br />To compute the LR, the forensic expert is forced to consider the probability of finding the evidence under <i><b>both</b></i> the prosecution and defence hypotheses. So, if the prosecution hypothesis Hp is "Defendant is the source of the DNA found" and the defence hypothesis Hd is "Defendant <i><b>is not </b></i>the source of the DNA found" then we compute both the probability of the evidence given Hp - written P(E | Hp) - and the probability of the evidence given Hd - written P(E | Hd). The LR is simply the ratio of these two likelihoods, i.e. P(E | Hp) divided by P(E | Hd).<br /><br />The very act of considering both likelihood values is a good thing to do because it helps to avoid common errors of communication that can mislead lawyers and juries (notably the <a href="http://www.agenarisk.com/resources/probability_puzzles/prosecutor.shtml">prosecutor's fallacy</a>). But, most importantly, the LR is a measure of the probative value of the evidence. However, this notion of probative value is where misunderstandings and confusion sometimes arise. 
In the case where the defence hypothesis is the negation of the prosecution hypothesis (i.e. Hd is the same as "not Hp" as in our example above) things are clear and very powerful because, by Bayes theorem:<br /><ul><li>when the LR is greater than one the evidence supports the prosecution hypothesis (increasingly for larger values) - in fact the posterior odds of the prosecution hypothesis increase by a factor of LR over the prior odds.</li><li>when the LR is less than one it supports the defence hypothesis (increasingly as the LR gets closer to zero) - the posterior odds of the defence hypothesis increase by a factor of 1/LR over the prior odds.</li><li>when the LR is equal to one then the evidence supports neither hypothesis and so is 'neutral' - the posterior odds of both hypotheses are unchanged from their prior odds. In such cases, since the evidence has no probative value, lawyers and forensic experts believe it should not be admissible. </li></ul>However, things are by no means as clear and powerful when the hypotheses are not exhaustive (i.e. one is not the negation of the other) and in most forensic applications this is the case. For example, in the case of DNA evidence, while the prosecution hypothesis Hp is still "defendant is source of the DNA found" in practice the defence hypothesis Hd is often something like "a person unrelated to the defendant is the source of the DNA found".<br /><br />In such circumstances the LR can <i><b>only </b></i>help us to distinguish between which of the two hypotheses is more likely, so, e.g. when the LR is greater than one the evidence supports the prosecution hypothesis over the defence hypothesis (with larger values leading to increased support). Unlike the case for exhaustive hypotheses <b>the LR tells us nothing about the posterior odds of the prosecution hypothesis</b>. In fact, it is quite possible that the LR can be very large - i.e. 
strongly supporting the prosecution hypothesis over the defence hypothesis - <i><b>even though the posterior probability of the prosecution hypothesis goes down</b></i>. This rather worrying point is not understood by all forensic scientists (or indeed by all statisticians). Consider the following example (it's a made-up coin tossing example, but has the advantage that the numbers are indisputable):<br /><blockquote class="tr_bq">Fred claims to be able to toss a fair coin in such a way that about 90% of the time it comes up Heads. So the main hypothesis is </blockquote><blockquote class="tr_bq"> H1: Fred has genuine skill </blockquote><blockquote class="tr_bq">To test the hypothesis, we observe him toss a coin 10 times. It comes out Heads each time. So our evidence E is 10 out of 10 Heads. Our alternative hypothesis is:</blockquote><blockquote class="tr_bq"> H2: Fred is just lucky. <br /><br />Under binomial assumptions, P(E | H1) is about 0.35 while P(E | H2) is about 0.001. So the LR is about 350, strongly in favour of H1.<br /><br />However, the problem here is that H1 and H2 are not exhaustive. There could be another hypothesis H3: "Fred is cheating by using a double-headed coin". Now, P(E | H3) = 1. <br /><br />If we assume that H1, H2 and H3 are the only possible hypotheses* (i.e. they are exhaustive) and that the priors are equally likely, i.e. each is equal to 1/3 then the posteriors after observing the evidence E are: <br /><br />H1: 0.25907 H2: 0.00074 H3: 0.74019 <br /><br />So, after observing the evidence E, the posterior for H1 has actually <i><b>decreased </b></i>despite the very large LR in its favour over H2. 
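<br /><br />The coin example can be reproduced in a few lines. This is a minimal sketch, assuming equal priors of 1/3 and the three hypotheses listed above; the helper name <code>posteriors</code> is illustrative. It uses the unrounded likelihoods 0.9^10 and 0.5^10, so its output can differ slightly from figures computed with the rounded values 0.35 and 0.001.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over a set of hypotheses assumed to be exhaustive."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


N_TOSSES = 10  # evidence E: 10 Heads out of 10 tosses

likelihoods = {
    "H1: genuine skill (90% Heads)": 0.9 ** N_TOSSES,
    "H2: just lucky (fair coin)": 0.5 ** N_TOSSES,
    "H3: double-headed coin": 1.0,
}
priors = [1 / 3] * len(likelihoods)  # assumption: all three equally likely

values = list(likelihoods.values())
lr_h1_vs_h2 = values[0] / values[1]
post = posteriors(priors, values)

if __name__ == "__main__":
    print(f"LR of H1 over H2: {lr_h1_vs_h2:.1f}")
    for name, prior, p in zip(likelihoods, priors, post):
        print(f"{name}: prior {prior:.5f} -> posterior {p:.5f}")
```

The point to look for is that the posterior of H1 ends up below its prior even though the LR of H1 over H2 is in the hundreds.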
</blockquote>In the above example, a good forensic scientist - if considering only H1 and H2 - would conclude by saying something like<br /><blockquote class="tr_bq"><i>"The evidence shows that hypothesis H1 is 350 times more likely than H2, but tells us nothing about whether we should have greater belief in H1 being true; indeed, it is possible that the evidence may much more strongly support some other hypothesis not considered and even make our belief in H1 decrease". </i></blockquote>However, in practice (and I can confirm this from having read numerous DNA reports) no such careful statement is made. In fact, the most common assertion used in such circumstances is:<br /><blockquote class="tr_bq"> <i>"The evidence provides strong support for hypothesis H1" </i></blockquote>Such an assertion is not only mathematically wrong but highly misleading. Consider, as discussed above, a DNA case where:<br /><br /> Hp is "defendant is source of the DNA found"<br /> Hd is "a person unrelated to the defendant is the source of the DNA found". <br /><br />This particular Hd hypothesis is a common convenient choice for the simple reason that P(E | Hd) is relatively easy to compute (it is the 'random match probability'). For single-source, high quality DNA this probability can be extremely small - of the order of one over several billions; since P(E | Hp) is equal to 1 in this case the LR is several billions. But, this does NOT provide overwhelming support for Hp as is often assumed unless we have been able to rule out all relatives of the defendant as suspects. Indeed, for less than perfect DNA samples it is quite possible for the LR to be in the order of millions but for a close relative to be a more likely source than the defendant.<br /><br />While confusion and misunderstandings can and do occur as a result of using hypotheses that are not exhaustive, there are many real examples where the choice of such non-exhaustive hypotheses is actually negligent. 
The following appalling example is based on a real case (location details changed as an appeal is ongoing):<br /><blockquote class="tr_bq">The suspect is accused of committing a crime in a particular rural location A near his home village in Dorset. The evidence E is soil found on the suspect's car. The prosecution hypothesis Hp is "the soil comes from A". The suspect lives (and drives) near this location but claims he did not drive to that specific spot. To 'test' the prosecution hypothesis a soil expert compares Hp with the hypothesis Hd: "the soil comes from a different rural location". However, the 'different rural location' B happens to be 500 miles away in Perth, Scotland (simply because it is close to where the soil analyst works and he assumes soil from there is 'typical' of rural soil). To carry out the test the expert considers soil profiles of E and samples from the two sites A and B. <br /><br />Inevitably the LR strongly favours Hp (i.e. site A) over Hd (i.e. site B); the soil profile on the car - even if it was never at location A - is going to be much closer to the A profile than the B profile. But we can conclude absolutely nothing about the posterior probability of A. The LR is completely useless - it tells us nothing other than the fact that the car was more likely to have been driven in the rural location in Dorset than in a rural location in Perth. Since the suspect had never driven the car outside Dorset this is hardly a surprise. Yet, in the case this soil evidence was considered important since it was wrongly assumed to mean that it "provided support for the prosecution hypothesis".</blockquote>This example also illustrates, however, why in practice it can be impossible to consider exhaustive hypotheses. For such soil cases, it would require us to consider samples from every possible 'other' location. 
What an expert like Pat Wiltshire (who is also a participant on the FOS programme) does is to choose alternative sites close to the alleged crime scene and compare the profile of each of those and the crime scene profile with the profile from the suspect. While this does not tell us if the suspect was at the crime scene it can tell us how much more likely the suspect was to have been there rather than sites nearby. <br /><br />*as pointed out by Joe Gastwirth there could be other hypotheses like "Fred uses the double-headed coin but switches to a regular coin after every 9 tosses"<br /><br /><b>References</b><br /><ol><li>Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, 2016 (June), pp 51-77 http://dx.doi.org/10.1146/annurev-statistics-041715-033428 .Pre-publication version <a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law_revised_FINAL.pdf">here</a> and <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law__SUPPLEMENTARY_FINAL.pdf">here</a> is the Supplementary Material See also <a href="http://bayesknowledge.blogspot.co.uk/2016/05/using-bayesian-networks-to-assess-new.html">blog posting.</a></li><li>Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu, (2013). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002. 
A pre-publication version of the article can be found <a href="https://www.researchgate.net/publication/263739610_When_%27neutral%27_evidence_still_has_probative_value_%28with_implications_from_the_Barry_George_Case%29">here</a>.</li></ol><br />See also previous blog postings:<br /><br /><br /><ul><li><a href="http://bayesknowledge.blogspot.co.uk/2016/02/problems-with-likelihood-ratio-method.html">Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence </a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2014/01/sally-clark-revisited-another-key.html">Sally Clark revisited: another key statistical oversight?</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/01/prosecutor-fallacy-in-stephen-lawrence.html">Prosecutor fallacy in Stephen Lawrence case? 
</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutor-fallacy-again-in-media.html">Prosecutor fallacy in media reporting of Burgess DNA case</a> </li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/07/flaky-dna-prosecutors-fallacy-yet-again.html">Flaky DNA: Prosecutors fallacy yet again</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutors-fallacy-just-will-not-go.html">Prosecutors fallacy just will not go away</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence </a></li></ul><br />Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-73981443816625791522016-10-07T06:15:00.001-07:002016-10-07T06:15:20.120-07:00Bayesian Networks and Argumentation in Evidence Analysis<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-yLIlXkLNEso/V_eb-fHe5hI/AAAAAAAAAeg/yPzZrD16Ygc1vtrIdcwFltmyDq_RLXUcwCLcB/s1600/16-9-23%2BFOSW02%2Bworkshop.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" height="248" src="https://1.bp.blogspot.com/-yLIlXkLNEso/V_eb-fHe5hI/AAAAAAAAAeg/yPzZrD16Ygc1vtrIdcwFltmyDq_RLXUcwCLcB/s640/16-9-23%2BFOSW02%2Bworkshop.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Some of the workshop participants</td></tr></tbody></table>On 26-29 September 2016 a workshop on "<a href="https://www.newton.ac.uk/event/fosw02">Bayesian Networks and Argumentation in Evidence Analysis</a>" took place at the Isaac Newton Institute Cambridge. 
This workshop, which was part of the <a href="https://www.newton.ac.uk/event/fos">FOS Programme</a>, was also the first public workshop of the ERC-funded project Bayes-Knowledge (<a href="http://bayes-knowledge.com/">ERC-2013-AdG339182-BAYES_KNOWLEDGE</a>).<br /><br />The workshop was a tremendous success, attracting many of the world's leading scholars in the use of Bayesian networks in law and forensics. Most of the presentations were filmed and can now be viewed <a href="https://www.newton.ac.uk/event/fosw02/timetable">here</a>.<br /><br />There was also a pre-workshop meeting on 23-24 September where participants focused on an important Dutch case that recently went to appeal. The participants were divided into two groups - one group developed a BN model of the case and the other developed an argumentation/scenarios-based model of the case. We plan to further develop these and write up the results.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-o7a6h-p9zCY/V_ecTREDPNI/AAAAAAAAAek/6atWNBOKVD4tnfxc_Gc4cnUspnRf02XGgCLcB/s1600/16-9-23%2BFOS%2Bprogramme%2B131_2.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="428" src="https://1.bp.blogspot.com/-o7a6h-p9zCY/V_ecTREDPNI/AAAAAAAAAek/6atWNBOKVD4tnfxc_Gc4cnUspnRf02XGgCLcB/s640/16-9-23%2BFOS%2Bprogramme%2B131_2.JPG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Some of the participants at the pre-workshop meeting analysing a specific Dutch case</td><td class="tr-caption" style="text-align: center;"><br /></td><td class="tr-caption" style="text-align: center;"><br /></td></tr></tbody></table><br />Norman 
Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-18678007312651154262016-10-07T02:50:00.000-07:002016-10-07T02:50:14.940-07:00The Bayesian Networks mutual exclusivity problem<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-o8fImJwrntg/V_dpgJkupwI/AAAAAAAAAeQ/4NqyLTfjeeQ769PXLk_KeI_q2d_T5iJnwCLcB/s1600/mutual.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="242" src="https://1.bp.blogspot.com/-o8fImJwrntg/V_dpgJkupwI/AAAAAAAAAeQ/4NqyLTfjeeQ769PXLk_KeI_q2d_T5iJnwCLcB/s320/mutual.jpg" width="320" /></a></div>Several years ago when we started serious modelling of legal arguments using Bayesian networks we hit a problem that we felt would be easily solved. We had a set of mutually exclusive events such as "X murdered Y, Z murdered Y, Y was not murdered" that we needed to model as separate variables because they had separate causal pathways and evidence.<br /><br />It turned out that existing BN modelling techniques cannot capture the correct intuitive reasoning when a set of mutually exclusive events need to be modelled as separate nodes instead of states of a single node. The standard proposed ’solution’, which introduces a simple constraint node that enforces mutual exclusivity, fails to preserve the prior probabilities of the events and is therefore flawed.<br /><br />In 2012 myself (and the co-authors listed below) produced an initial novel and simple solution to this problem that works in a reasonable set of circumstances, but it proved to be difficult to get people to understand why the problem was an important one that needed to be solved. 
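<br /><br />Why a simple constraint node fails to preserve the priors can be seen with a brute-force calculation. The sketch below is only one simplified reading of the constraint-node idea, not the method in the paper and not any BN tool's API: it assumes three independent Boolean event nodes with made-up priors of 0.5, 0.3 and 0.2, and conditions on the constraint "exactly one event is true". The function name <code>marginals_given_exactly_one</code> is hypothetical.

```python
from itertools import product


def marginals_given_exactly_one(priors):
    """Marginal probability of each event after conditioning on the
    constraint that exactly one of the independent events is true."""
    names = list(priors)
    weight = {name: 0.0 for name in names}
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        if sum(states) != 1:
            continue  # the constraint node rules this combination out
        prob = 1.0
        for name, is_true in zip(names, states):
            prob *= priors[name] if is_true else 1.0 - priors[name]
        total += prob
        for name, is_true in zip(names, states):
            if is_true:
                weight[name] += prob
    return {name: weight[name] / total for name in names}


# Assumed priors for illustration only; they sum to 1 as mutually
# exclusive and exhaustive events should.
priors = {"X murdered Y": 0.5, "Z murdered Y": 0.3, "Y was not murdered": 0.2}
constrained = marginals_given_exactly_one(priors)

if __name__ == "__main__":
    for name in priors:
        print(f"{name}: prior {priors[name]:.3f} -> constrained {constrained[name]:.3f}")
```

Under these assumptions each constrained marginal is proportional to the prior odds p/(1-p) rather than to the prior p itself, so the values seen after enforcing the constraint are no longer the priors that were specified.<br /><br />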
After many changes and iterations this work has finally been published and, as a 'gold access paper', it is free for anybody to download in full (see link below).<br /><br />During the current Programme "<a href="https://www.newton.ac.uk/event/fos">Probability and Statistics in Forensic Science</a>" that I am helping to run at the Isaac Newton Institute for Mathematical Sciences, Cambridge, 18 July - 21 Dec 2016, it has become clear that the mutual exclusivity problem is critical in any legal case where there are diverse prosecution and defence narratives. Although our solution does not work in all cases (and indeed we are working on more comprehensive approaches) we feel it is an important start.<br /><br /><blockquote class="tr_bq"><b>Norman Fenton, Martin Neil, David Lagnado, William Marsh, Barbaros Yet, Anthony Constantinou,</b> "How to model mutually exclusive events based on independent causal pathways in Bayesian network models", <i>Knowledge-Based Systems</i>, Available online 17 September 2016<br /><a href="http://dx.doi.org/10.1016/j.knosys.2016.09.012">http://dx.doi.org/10.1016/j.knosys.2016.09.012</a></blockquote>Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-63586849042336423012016-09-17T16:47:00.001-07:002016-09-17T17:02:06.723-07:00Bayesian networks: increasingly important in cross-disciplinary work<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-ODhmK7Ce8g4/V93TbQUoa4I/AAAAAAAAAdw/NnBXyMVbuqojkUdyySTPZNfhnYpQPgHwACLcB/s1600/causal_dynamics_logo.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="221" src="https://4.bp.blogspot.com/-ODhmK7Ce8g4/V93TbQUoa4I/AAAAAAAAAdw/NnBXyMVbuqojkUdyySTPZNfhnYpQPgHwACLcB/s400/causal_dynamics_logo.jpg" width="400" /></a></div>The growing importance of Bayesian networks was demonstrated this week by the award of a prestigious Leverhulme Trust Research Project 
Grant of £385,510 to Queen Mary University of London that ultimately will lead to improved design and use of self-monitoring systems such as blood sugar monitors, home energy smart meters, and self-improvement mobile phone apps.<br /><br />The project, CAUSAL-DYNAMICS ("Improved Understanding of Causal Models in Dynamic Decision-making") is a collaborative project, led by <a href="http://www.eecs.qmul.ac.uk/%7Enorman/">Professor Norman Fenton</a> of the School of Electronic Engineering and Computer Science, with co-investigators <a href="http://www.magdaosman.co.uk/">Dr Magda Osman</a> (School of Biological and Chemical Sciences), <a href="http://www.eecs.qmul.ac.uk/%7Emartin/">Prof Martin Neil</a> (School of Electronic Engineering and Computer Science) and <a href="http://www.ucl.ac.uk/lagnado-lab/david_lagnado.html">Prof David Lagnado</a> (Department of Experimental Psychology, University College London).<br /><br />The project exploits Fenton and Neil's expertise in <a href="http://bayesianrisk.com/">causal modelling using Bayesian networks</a> and Osman and Lagnado's expertise in cognitive decision making. Previously, psychologists have extensively studied dynamic decision-making without formally modelling causality while statisticians, computer scientists, and AI researchers have extensively studied causality without considering its central role in human dynamic decision making. This new project starts with the hypothesis that we can formally model dynamic decision-making from a causal perspective. This enables us to identify both where sub-optimal decisions are made and to recommend what the optimal decision is. The hypothesis will be tested in real world examples of how people make decisions when interacting with dynamic self-monitoring systems such as blood sugar monitors and energy smart meters and will lead to improved understanding and design of such systems.<br /><br />The project is for 3 years starting Jan 2017. 
For further details, see: <a href="http://www.eecs.qmul.ac.uk/%7Enorman/projects/leverhulme/causal_dynamics.html">CAUSAL-DYNAMICS</a>.<br /><br /><b>WATCH THIS SPACE FOR THE ANNOUNCEMENT VERY SOON OF TWO OTHER MAJOR NEW CROSS-DISCIPLINARY BAYESIAN NETWORK PROJECTS!! </b><br /><br /><i><b>About the Leverhulme Trust</b> </i><br />The Leverhulme Trust was established by the Will of William Hesketh Lever, the founder of Lever Brothers. Since 1925 the Trust has provided grants and scholarships for research and education; today it is one of the largest all-subject providers of research funding in the UK, distributing approximately £80 million a year. For more information: www.leverhulme.ac.uk / @LeverhulmeTrustNorman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-69888146682967953782016-09-16T05:43:00.000-07:002016-09-16T05:44:58.521-07:00Bayes and the Law: what's been happening in Cambridge and how you can see it<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-KI6tF2_YdQA/V9vhjnguRpI/AAAAAAAAAdM/nqabpBz6jRM6s9ENAzsakoTtkWHi-IFjgCLcB/s1600/newton_organisers.JPG" style="margin-left: auto; margin-right: auto;"><img border="0" height="242" src="https://1.bp.blogspot.com/-KI6tF2_YdQA/V9vhjnguRpI/AAAAAAAAAdM/nqabpBz6jRM6s9ENAzsakoTtkWHi-IFjgCLcB/s640/newton_organisers.JPG" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Programme Organisers (left to right): Richard Gill, David Lagnado, Leila Schneps, David Balding, Norman Fenton</td></tr></tbody></table>Since 21 July 2016 I have been running the <a href="https://www.newton.ac.uk/event/fos">Isaac Newton Institute (INI) Programme on Probability and Statistics in Forensic Science</a> in Cambridge.<br /><br />For those of you who 
were not fortunate enough to be at the first formal workshop "The nature of questions arising in court that can be addressed via probability and statistical methods" (30 August to 2 September) you can watch full videos of most of the 35 presentations <a href="https://www.newton.ac.uk/event/fosw01/timetable">here</a> on the INI website. The presentation slides are also available via the same link.<br /><br />The workshop attracted many of the world's leading figures from the law, statistics and forensics with a mixture of academics (including mathematicians and legal scholars), forensic practitioners, and practising lawyers (including judges and eminent QCs). It was rated a great success.<br /><br />The second formal workshop <a href="https://www.newton.ac.uk/event/fosw02">"Bayesian Networks and Argumentation in Evidence Analysis" will take place on 26-29 September</a>. It is also part of the <a href="http://bayes-knowledge.com/">BAYES-KNOWLEDGE project programme of work</a>. For those who wish to attend, but cannot, the workshop will be streamed live.<br /><i><br /></i><i>Norman Fenton, 16 September 2016</i><br /><br />Links<br /><ul><li><a href="https://www.newton.ac.uk/event/fosw01/timetable">Watch the presentations from the workshop "The nature of questions arising in court that can be addressed via probability and statistical methods" from 30 August to 2 September.</a></li><li><a href="https://www.newton.ac.uk/event/fosw02">"Bayesian Networks and Argumentation in Evidence Analysis" 26-29 September</a></li><li><a href="https://www.newton.ac.uk/event/fos">Isaac Newton Institute (INI) Programme on Probability and Statistics in Forensic Science in Cambridge</a> </li><li><a href="http://bayes-knowledge.com/">BAYES-KNOWLEDGE project</a></li></ul><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-awS3zI1SKF8/V9vn4S4X6SI/AAAAAAAAAdc/sElzHSJlR5cDqh49Kzy20b1wA-d-91j0ACLcB/s1600/BK_ERC_logo.jpg" style="margin-left: 
1em; margin-right: 1em;"><img border="0" height="206" src="https://4.bp.blogspot.com/-awS3zI1SKF8/V9vn4S4X6SI/AAAAAAAAAdc/sElzHSJlR5cDqh49Kzy20b1wA-d-91j0ACLcB/s400/BK_ERC_logo.jpg" width="400" /></a></div><br /><br />Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-41657526423518308642016-07-01T07:17:00.001-07:002016-07-04T03:27:53.133-07:00The likelihood ratio and why its use in forensic analysis is often flawed<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><img border="0" height="306" src="https://3.bp.blogspot.com/-6oW6weUCBBI/V3Zuv4JaCwI/AAAAAAAAAc4/ZtucBCTtLqwvyBoryxqZehJYK8ocPd3xgCLcB/s320/forrest.jpg" style="margin-left: auto; margin-right: auto;" width="320" /></td></tr><tr><td class="tr-caption" style="text-align: center;">FORREST 2016 (for details see <a href="http://www.theforensicinstitute.com/training/forrest-conference/forrest-2016">here</a>)</td></tr></tbody></table><br />I am giving the opening address at the Forensic Institute 2016 Conference (<a href="http://www.theforensicinstitute.com/training/forrest-conference/forrest-2016">FORREST 2016</a>) in Glasgow on 5 July 2016. The talk is about the benefits and pitfalls of using the likelihood ratio to help understand the impact of forensic evidence. The powerpoint slide show for my talk is <a href="http://www.eecs.qmul.ac.uk/%7Enorman/Talks/AgenaRisk%20OverviewABNMS_2013_FINAL.ppsx">here</a>. <br /><br />While a lot of the material is based on our recent <a href="http://www.annualreviews.org/doi/10.1146/annurev-statistics-041715-033428">Bayes and the Law paper</a>, there is a new simple example of the danger of using the likelihood ratio (LR) when the defence hypothesis is <i><b>not </b></i>the negation of the prosecution hypothesis. 
Recall that the LR for some evidence E is the probability of E given the prosecution hypothesis divided by the probability of E given the defence hypothesis. The reason the LR is popular is because it is a measure of the probative value of the evidence E in the sense that:<br /><ul><li>LR>1 means E supports the prosecution hypothesis</li><li>LR<1 means E supports the defence hypothesis</li><li>LR=1 means E has no probative value</li></ul>This follows from Bayes Theorem but only when the defence hypothesis is the negation of the prosecution hypothesis. The problem is that there are Forensic Science Guidelines* that explicitly state that this requirement is not necessary. But if the requirement is not met then it is possible to have LR<1 even though E actually supports the prosecution hypothesis. Here is the example:<br /><br /><br /><blockquote class="tr_bq"><div class="MsoNormal"><i>A raffle has 100 tickets numbered 1 to 100 </i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>Joe buys 2 tickets and gets numbers 3 and 99</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>The ticket is drawn but is blown away in the wind. </i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>Joe says the ticket drawn was 99 and demands the prize, but the organisers say 99 was not the winning ticket. In this case the prosecution hypothesis H is “Joe won the raffle”.</i></div><div class="MsoNormal"><i>Suppose we have the following evidence E presented by a totally reliable eye witness:</i></div><div class="MsoNormal"><i> </i></div><div class="MsoNormal"><i>E: “winning ticket was an odd nineties number (i.e. 91, 93, 95, 97, or 99)”</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>Does the evidence E support H? 
Let's do the calculations: </i></div></blockquote><blockquote><ul><li><i>Probability of E given H = ½</i></li><li><i>Probability of E given not H = 4/98 </i></li></ul><div class="MsoNormal"><i>So the LR is (1/2)/(4/98) = 12.25</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>That means the evidence CLEARLY supports H. In fact, the probability of H increases from a prior of 1/50 to a posterior of 1/5 (prior odds of 1/49 multiplied by the LR of 12.25 give posterior odds of 1/4), so there is no doubt it is supportive.</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>But suppose the organisers assert that their (defence) hypothesis is:</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>H’: “Winning ticket was a number between 95 and 97”</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>Then in this case we have:</i></div><ul><li><i>Probability of E given H = ½</i></li><li><i>Probability of E given H’ = 2/3</i></li></ul><div class="MsoNormal"><i>So the LR is (1/2)/(2/3) = 0.75</i></div><div class="MsoNormal"><i><br /></i></div><div class="MsoNormal"><i>That means that in this case the evidence supports H’ over H. The problem is that, while the LR does indeed 'prove' that the evidence is more supportive of H' than of H, that is actually irrelevant unless there is other evidence that proves that H' is the only possible alternative to H (i.e. that H' is equivalent to 'not H'). In fact, the 'defence' hypothesis has been cherry-picked. The evidence E supports H irrespective of which cherry-picked alternative is considered. </i></div></blockquote><div class="MsoNormal">Norman Fenton, 1 July 2016 <br /> </div><br />*Jackson G, Aitken C, Roberts P. 2013. Practitioner guide no. 4. Case assessment and interpretation of expert evidence: guidance for judges, lawyers, forensic scientists and expert witnesses. London: R. Stat. Soc. 
<a href="http://www.maths.ed.ac.uk/%E2%88%BCcgga/Guide-4-WEB.pdf">http://www.maths.ed.ac.uk/∼cgga/Guide-4-WEB.pdf</a>. <b>Page 29: "The LR is the ratio of two probabilities, conditioned on mutually <span class="highlight selected">exclusive</span> (but not necessarily exhaustive) propositions."</b><br /><br />See also:<br /><ul><li><a href="http://bayesknowledge.blogspot.co.uk/2016/02/problems-with-likelihood-ratio-method.html">Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</a></li><li><a href="http://bayesknowledge.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence</a></li></ul><br />Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-1810714866025619822016-06-17T06:51:00.002-07:002016-06-17T06:55:32.646-07:00Bayes and the Law: Cambridge event and new review paperWhen we set up the Bayes and the Law network in 2012 we made the following assertion:<br /><blockquote class="tr_bq">Proper use of statistics and probabilistic reasoning has the potential to improve dramatically the efficiency, transparency and fairness of the criminal justice system and the accuracy of its verdicts, by enabling the relevance of evidence – especially forensic evidence - to be meaningfully evaluated and communicated. However, its actual use in practice is minimal, and indeed the most natural way to handle probabilistic evidence (Bayes) has generally been shunned. 
</blockquote>The <a href="https://www.newton.ac.uk/event/fosw01">first workshop (30th August to 2nd September 2016)</a> of our 6-month programme "<a href="https://www.newton.ac.uk/event/fos">Probability and Statistics in Forensic Science</a>" at the Isaac Newton Institute for Mathematical Sciences, Cambridge, directly addresses the above assertion and seeks to understand the scope, limitations, and barriers to using statistics and probability in court. The Workshop brings together many of the world's leading academics and practitioners (including lawyers) in this area. Information on the programme and how to participate can be found <a href="https://www.newton.ac.uk/event/fosw01">here</a>.<br /><br />A <a href="http://www.annualreviews.org/doi/10.1146/annurev-statistics-041715-033428">new review paper</a>* "Bayes and the Law" has just been published in Annual Review of Statistics and Its Application. <br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-AaK1lQ_GZ-Q/V2P7eeFJDFI/AAAAAAAAAco/ss-Dfaau9msSQzfYG010EMqb_r7kxvQRgCLcB/s1600/bayes_law.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="337" src="https://2.bp.blogspot.com/-AaK1lQ_GZ-Q/V2P7eeFJDFI/AAAAAAAAAco/ss-Dfaau9msSQzfYG010EMqb_r7kxvQRgCLcB/s400/bayes_law.jpg" width="400" /></a></div><br />This paper reviews the potential and actual use of Bayes in the law and explains the main reasons for its lack of impact on legal practice. These include misconceptions by the legal community about Bayes’ theorem, over-reliance on the use of the likelihood ratio and the lack of adoption of modern computational methods. 
The paper argues that Bayesian Networks (BNs), which automatically produce the necessary Bayesian calculations, provide an opportunity to address most concerns about using Bayes in the law.<br /><br />*Full citation:<br /><blockquote class="tr_bq">Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, pp51-77, June 2016 <a href="http://dx.doi.org/10.1146/annurev-statistics-041715-033428"> http://dx.doi.org/10.1146/annurev-statistics-041715-033428</a>. Pre-publication version is <a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law_revised_FINAL.pdf">here</a> and the Supplementary Material is <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law__SUPPLEMENTARY_FINAL.pdf">here</a>.</blockquote>Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-14686336675007713592016-06-01T07:50:00.001-07:002016-06-01T07:50:48.513-07:00Bayesian networks for Cost, Benefit and Risk Analysis of Agricultural Development Projects<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-WV2BMjdAmHs/V07u1m1xBWI/AAAAAAAAAcI/nYTY1220d1Y3cZeMPdpc_ISU5hRZdPpPQCLcB/s1600/project_model.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="223" src="https://1.bp.blogspot.com/-WV2BMjdAmHs/V07u1m1xBWI/AAAAAAAAAcI/nYTY1220d1Y3cZeMPdpc_ISU5hRZdPpPQCLcB/s400/project_model.jpg" width="400" /></a></div><br />Successful implementation of major projects requires careful management of uncertainty and risk. Yet, uncertainty is rarely effectively calculated when analysing project costs and benefits. In the case of major agricultural and other development projects in Africa this challenge is especially important. 
<br /><br />A <a href="http://authors.elsevier.com/a/1T1mZ3PiGT01wU">paper just published</a>* in the journal <i><b>Expert Systems with Applications</b></i> presents a Bayesian network (BN) modelling framework to calculate the costs, benefits, and return on investment of a project over a specified time period, allowing for changing circumstances and trade-offs. Marianne Gadeberg and Eike Luedeling have written an overview of the work <a href="https://wle.cgiar.org/thrive/2016/06/01/can-we-build-better-project-assessing-complexities-development-projects">here</a>.<br /><br />The framework uses hybrid and dynamic BNs containing both discrete and continuous variables over multiple time stages. The BN framework calculates costs and benefits based on multiple causal factors including the effects of individual risk factors, budget deficits, and time value discounting, taking account of the parameter uncertainty of all continuous variables. The framework can serve as the basis for various project management assessments and is illustrated using a case study of an agricultural development project. The work was a collaboration between the World Agroforestry Centre (ICRAF), Nairobi, Kenya, the Risk Information Management Group at Queen Mary (as part of the BAYES-KNOWLEDGE project) and Agena Ltd.<br /><br />*The full reference is:<br /><blockquote class="tr_bq">Yet, B., Constantinou, A., Fenton, N., Neil, M., Luedeling, E., & Shepherd, K. (2016). "A Bayesian Network Framework for Project Cost, Benefit and Risk Analysis with an Agricultural Development Case Study". <i>Expert Systems with Applications</i>, Volume 60, 30 October 2016, Pages 141–155. <a href="http://www.sciencedirect.com/science/article/pii/S0957417416302238">DOI: 10.1016/j.eswa.2016.05.005</a>. </blockquote>Until July 2016 the <a href="http://authors.elsevier.com/a/1T1mZ3PiGT01wU">full published pdf</a> is available for free. 
A permanent pre-publication pdf is available <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/Project_ROI_Preprint.pdf">here</a>. <br /><br /><b>See also</b>: <a href="https://wle.cgiar.org/thrive/2016/06/01/can-we-build-better-project-assessing-complexities-development-projects">Can we build a better project: assessing complexities in development projects</a> <br /><br /><b>Acknowledgements</b>: Part of this work was performed under the auspices of EU project ERC-2013-AdG339182-BAYES_KNOWLEDGE and part under ICRAF Contract No SD4/2012/214 issued to Agena. We acknowledge support from the Water, Land and Ecosystems (WLE) program of the Consultative Group on International Agricultural Research (CGIAR). Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-77091072920388537322016-05-26T06:06:00.000-07:002016-05-26T06:10:27.767-07:00Using Bayesian networks to assess new forensic evidence in an appeal case<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-KDImVwn9KSQ/V0brFey4_fI/AAAAAAAAAb4/aONT8VnN_qkznoEbFM5z6yxG7uj7kvjWwCKgB/s1600/sound_BN.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-KDImVwn9KSQ/V0brFey4_fI/AAAAAAAAAb4/aONT8VnN_qkznoEbFM5z6yxG7uj7kvjWwCKgB/s1600/sound_BN.jpg" /></a></div>If new forensic evidence becomes available after a conviction how do lawyers determine whether it raises sufficient questions about the verdict in order to launch an appeal? It turns out that there is no systematic framework to help lawyers do this. 
But <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/BNs_Appeal_published.pdf">a paper published today by Nadine Smit and colleagues in Crime Science</a> presents such a framework driven by a recent case, in which a defendant was convicted primarily on the basis of sound evidence, but where subsequent analysis of the evidence revealed additional sounds that were not considered during the trial.<br /><br />From the case documentation, we know the following:<br /><ul><li>A baby was injured during an incident on the top floor of a house</li><li>Blood from the baby was found on the wall in one of the rooms upstairs</li><li>On an audio recording of the emergency telephone call made by the suspect, a scraping sound (allegedly indicating scraping blood off a wall) can be heard</li><li>The suspect was charged with attempted murder </li></ul>The audio evidence played a significant role in the trial. But, during the appeal preparation process, the call was re-analysed by an audio expert on behalf of the defence, and four other sounds were identified on the same recording that, according to the expert, showed similarities to the original sound. In particular, one of these sounds was of interest because of background noise that could be heard simultaneously. The background noise was presumed to be the television, which was located in a different room to where the prosecution argued the scraping of the blood took place. During this second sound, the TV (located downstairs) could be heard simultaneously on the emergency recording. A statement by the police reads that the suspect was frequently rubbing his face in their presence. The defence proposed that the incriminating sound in the recording was not blood scraping after all, but simply the defendant rubbing his face. <br /><br />The framework described in Smit's paper is intended to overcome the gap between what is generally known from scientific analyses and what is hypothesized in a legal setting. 
It is based on Bayesian networks (BNs), which are a structured and understandable way to evaluate the evidence in the specific case context and present it in a clear manner in court. However, BN methods are often criticised for not being sufficiently transparent for legal professionals. To address this concern, the paper shows the extent to which the reasoning and decisions of the particular case can be made explicit and transparent. The BN approach enables us to clearly define the relevant propositions and evidence, and uses sensitivity analysis to assess the impact of the evidence under different prior assumptions. The results show that such a framework is suitable for identifying information that is currently missing yet clearly crucial for a valid and complete reasoning process. Furthermore, a method is provided whereby BNs can serve as a guide to not only reason with incomplete evidence in forensic cases, but also identify very specific research questions that should be addressed to extend the evidence base to solve similar issues in the future. <br /><br />Full citation:<br /><blockquote class="tr_bq"><i>Smit, N. M., Lagnado, D. A., Morgan, R. M., & Fenton, N. E. (2016). "An investigation of the application of Bayesian networks to case assessment in an appeal case". Crime Science, 2016, 5: 9, <a href="http://dx.doi.org/10.1186/s40163-016-0057-6">DOI 10.1186/s40163-016-0057-6</a> (open access). <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/BNs_Appeal_published.pdf">Published version pdf.</a></i></blockquote><span style="font-size: x-small;">The research was funded by the Engineering and Physical Sciences Research Council of the UK through the Security Science Doctoral Research Training Centre (UCL SECReT) based at University College London (EP/G037264/1), and the European Research Council (ERC-2013-AdG339182-BAYES_KNOWLEDGE). 
</span><br /><br /><span style="font-size: x-small;">The BN model (which is fully specified in the paper) was built and run using the <a href="http://www.agenarisk.com/">free version of AgenaRisk</a>.</span>Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-21416611087544939422016-04-26T14:54:00.002-07:002016-04-26T15:27:00.261-07:00Hillsborough Inquest - my input<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://www.bbc.co.uk/news/uk-england-merseyside-28095597"><img border="0" height="400" src="https://1.bp.blogspot.com/-xrm0DGjoEfY/Vx_iaC_tBYI/AAAAAAAAAbQ/cUqqk-zjmf4RVDwZoixy6gpq_kpl3QlDQCLcB/s400/hillsborough.jpg" width="296" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><a href="http://www.bbc.co.uk/news/uk-england-merseyside-28095597"><br /></a></td></tr></tbody></table>With <a href="http://www.bbc.co.uk/news/uk-england-36138337">today's verdict (fans unlawfully killed)</a> coming after more than two years, I can now speak about my own involvement in the Inquest.<br /><br />Because of the years that have passed, few people are aware that there was a 'near-miss' disaster at Hillsborough eight years before the actual disaster. The circumstances were essentially identical - an FA Cup Semi Final with far too many supporters let into the Leppings Lane stand, leading to a massive crush. Because of the quick thinking of a steward who was able to open a gate onto the pitch, nobody died on that occasion (although there were many injuries). I know this because I was present at that earlier near disaster and I was, in fact, Secretary of the Sheffield Spurs Supporters Club. 
At the time I wrote to the FA and South Yorkshire police as I felt mistakes had been made, and indeed the incident was sufficiently serious that Hillsborough (which had been used every year as one of the two semi-final venues) was avoided until 1988 (the year before the disaster). Immediately after the disaster in 1989 I wrote to the FA and Lord Taylor (who led the original enquiry) to inform them of the events of 1981. Although I was interviewed at that time by the Police investigators, my evidence was never used.<br /><br />In 2014 - out of the blue - I was asked to attend the new Hillsborough Inquest as it had been decided that the 1981 incident was an important piece of the story. Here are a couple of links to media reports about my appearance:<br /><ul><li><a href="http://www.bbc.co.uk/news/uk-england-merseyside-28095597">BBC Report</a></li><li><a href="http://www.liverpoolfc.com/news/latest-news/165553-hillsborough-inquests-june-30">Liverpool FC report </a></li></ul>Norman Fenton, 26 April 2016<br /><br /><br /><br />Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-62892593326561666422016-03-25T04:15:00.002-07:002016-03-25T04:15:15.814-07:00Statistics of coincidences: Ben Geen case revisited (ABC)<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-3EYls1TP694/VvUc88RelDI/AAAAAAAAAao/ceqtyu7XBUYzoftKbqQDYp2oMtvZscI-Q/s1600/geen_abc.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="342" src="https://4.bp.blogspot.com/-3EYls1TP694/VvUc88RelDI/AAAAAAAAAao/ceqtyu7XBUYzoftKbqQDYp2oMtvZscI-Q/s400/geen_abc.jpg" width="400" /></a></div><br />In November 2014 <a href="http://probabilityandlaw.blogspot.co.uk/2014/11/the-ben-geen-case-another-miscarriage.html">I reported on the case of nurse Ben Geen who was convicted in 2006 for murdering 2 patients and seriously harming 15 others</a>. 
I had been asked to <a href="http://www.theguardian.com/uk-news/2015/feb/15/statisticians-respiratory-arrests-trial-ben-geen">produce an expert report</a> on the 'statistical coincidences' in the case for the Criminal Cases Review Commission.<br /><br />Now a <a href="http://www.abc.net.au/radionational/programs/healthreport/an-unusual-pattern/7274116">30-minute documentary on the case</a> presented by Joel Werner is to be aired on Australia's national radio station ABC on 28 March. In the programme (which you can listen to in full from the <a href="http://www.abc.net.au/radionational/programs/healthreport/an-unusual-pattern/7274116">links at the top of the ABC page</a>) I present a lay summary of the statistical argument (from minutes 16:30 to 21:34).<br /><br />Norman FentonNorman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-6464497702328587552016-03-19T08:33:00.002-07:002016-03-21T05:28:15.585-07:00Turning poorly structured data into intelligent Bayesian Network models for medical decision support<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-yXSOaix4tJM/Vu1buxT17wI/AAAAAAAAAaY/PSYk_wzqUDowHRDrhHsksrZE1TOpch9Lg/s1600/questionnaires.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="257" src="https://3.bp.blogspot.com/-yXSOaix4tJM/Vu1buxT17wI/AAAAAAAAAaY/PSYk_wzqUDowHRDrhHsksrZE1TOpch9Lg/s400/questionnaires.jpg" width="400" /></a></div><br /><br />Medical data is very often badly structured, incomplete and inconsistent. This limits our ability to generate useful models for prediction and decision support if we rely purely on machine learning techniques. That means we need to exploit expert knowledge at various model development stages. 
This problem - which is common in many application domains - is tackled in a paper** published in the latest issue of <i><b>Artificial Intelligence in Medicine</b></i>.<br /><br />The paper describes a rigorous and repeatable method for building effective Bayesian Network (BN) models from complex data - much of which comes in unstructured and incomplete responses by patients from questionnaires and interviews. Such data inevitably contains repetitive, redundant and contradictory responses; without expert knowledge, learning a BN model from the data alone is especially problematic where we are interested in simulating causal interventions for risk management. The novelty of this work is that it provides a rigorous, consolidated and generalised framework that addresses the whole life-cycle of BN model development. The method is validated using data from forensic psychiatry. The resulting BN models demonstrate competitive to superior predictive performance against the data-driven state-of-the-art models. More importantly, the resulting BN models go beyond improving predictive accuracy and into usefulness for risk management through intervention, and enhanced decision support in terms of answering complex clinical questions that are based on unobserved evidence.<br /><br />The method is applicable to any application domain involving large-scale decision analysis based on such complex and unstructured information. It challenges decision scientists to reason about building models based on what information is really required for inference, rather than based on what data is available. Hence, it forces decision scientists to use available data in a much smarter way.<br /><br />**The full reference for the paper is: <br /><blockquote class="tr_bq">Constantinou, A. C., Fenton, N., Marsh, W., & Radlinski, L. (2016). 
"From complex questionnaire and interviewing data to intelligent Bayesian Network models for medical decision support". <i>Artificial Intelligence in Medicine,</i> Vol 67 pages 75-93. DOI <a href="http://dx.doi.org/10.1016/j.artmed.2016.01.002">http://dx.doi.org/10.1016/j.artmed.2016.01.002</a></blockquote><br />For those who do not have access to the journal a pre-publication draft can be downloaded: <a href="http://constantinou.info/downloads/papers/complexBN.pdf">http://constantinou.info/downloads/papers/complexBN.pdf</a> Norman Fentonhttp://www.blogger.com/profile/00665217873819266827noreply@blogger.com0tag:blogger.com,1999:blog-6468094748577058716.post-38374609373967957442016-03-10T06:40:00.001-08:002016-03-10T07:41:55.775-08:00A Bayesian network to determine optimal strategy for Spurs' success<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-5-qPDxYTS0A/VuCh6zGiEzI/AAAAAAAAAaI/7ZUsXXaclD0/s1600/poc.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="336" src="https://3.bp.blogspot.com/-5-qPDxYTS0A/VuCh6zGiEzI/AAAAAAAAAaI/7ZUsXXaclD0/s400/poc.jpg" width="400" /></a></div><br />As a committed Spurs fan I have spent the last few months salivating at the club's sudden and unexpected rise and the prospect of them winning their first league title since 1961. By mid-February they were clear favourites to win the Premier League title. However, in my view, the challenge was compromised by the team becoming overstretched by playing too many matches in a short space of time. In particular, I felt that their involvement in the Europa League was an unnecessary distraction and burden. When I expressed these views on a Spurs online forum (backed up with some data showing consistent under-performance during periods when they were involved in the Europa League) I got heavily criticised by other fans who said it was important to try to win every competition. 
<br /><br />Having simultaneously been involved in research discussions about the use of decisions in Bayesian networks, I decided to build a small model in AgenaRisk to resolve the dilemma once and for all. I have written up the results of the analysis <a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/spurs_decision.pdf">here</a>. The model can be downloaded from <a href="http://www.eecs.qmul.ac.uk/%7Enorman/Models/spurs_decision_problems.cmp">here</a>.<br /><br />In summary, there were four strategic options available to Spurs' manager Mauricio Pochettino at the time I started to do the analysis:<br /><ol><li>Focus on Premier League </li><li>Focus on Premier League and FA Cup </li><li>Focus on Premier League and Europa League </li><li>Focus on all three competitions </li></ol>My BN model shows that the optimal decision (based on my subjective utility values of the different outcomes) was to go for option 1, with option 2 a close second. Unfortunately (I believe) Pochettino opted for option 3 which, as the model shows, suggests his personal utility value for winning the Europa League was actually higher than that for winning the Premier League.<br /><br /><b>Downloads</b>:<br /><br /><ul><li><a href="https://www.eecs.qmul.ac.uk/%7Enorman/papers/spurs_decision.pdf">Norman Fenton "To focus on winning one competition or try to win more: Using a Bayesian network to help decide the optimum strategy for a football club", March 2016</a></li><li><a href="http://www.eecs.qmul.ac.uk/%7Enorman/Models/spurs_decision_problems.cmp">Bayesian network model for the paper</a> </li></ul><b>See also</b>: <a href="http://probabilityandlaw.blogspot.co.uk/2013/08/the-problem-with-predicting-football.html">The problem with predicting football results - you cannot rely on the data</a><br /><br /><b>Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses</b><br />Norman Fenton, 4 Feb 2016<br /><br />I have written several times before about the likelihood ratio (LR) method that is recommended for use by forensic scientists when presenting evidence (such as the fact that DNA collected at a crime scene is found to have a profile that matches the DNA profile of a defendant in a case). In general the LR is a very good and simple method for communicating the impact of evidence (in this case on the hypothesis that the defendant was at the crime scene), but its correct use is based on strict assumptions that have been routinely ignored by forensic experts and statisticians, leading to the very kind of confusion and misunderstanding (when presented to lawyers and juries) that it was supposed to help avoid. The papers [<a href="https://www.researchgate.net/publication/263739610_When_%27neutral%27_evidence_still_has_probative_value_%28with_implications_from_the_Barry_George_Case%29">1</a>] and [<a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law_revised%20FINAL.pdf">2</a>] provide an in-depth analysis of the problems. In this short article I will highlight just one of these problems which invalidate the LR. Subsequent articles will focus on the other problems and issues. <br /><br />To recap: The LR is the probability of finding the evidence E if the prosecution hypothesis Hp is true (formally we write this as 'probability of E given Hp') divided by the probability of finding the evidence E if the defence hypothesis Hd is true (formally we write this as 'probability of E given Hd').<br /><br />So, to compute the LR, the forensic expert is forced to consider the probability of finding the evidence under <i><b>both</b></i> the prosecution and defence hypotheses. 
This is a very good thing to do because it helps to avoid common errors of communication that can mislead lawyers and juries (notably the <a href="http://www.agenarisk.com/resources/probability_puzzles/prosecutor.shtml">prosecutor's fallacy</a>). Even more importantly, the LR is a measure of the probative value of the evidence because:<br /><ul><li>when the LR is greater than one the evidence supports the prosecution hypothesis (increasingly for larger values); </li><li>when the LR is less than one it supports the defence hypothesis (increasingly as the LR gets closer to zero); </li><li>when the LR is equal to one then the evidence supports neither hypothesis and so is 'neutral'. In such cases, since the evidence has no probative value, lawyers and forensic experts believe it should not be admissible. </li></ul>However, as explained in [<a href="https://www.researchgate.net/publication/263739610_When_%27neutral%27_evidence_still_has_probative_value_%28with_implications_from_the_Barry_George_Case%29">1</a>] and [<a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law_revised%20FINAL.pdf">2</a>] (because of Bayes' Theorem), for the LR to 'work' with respect to being a measure of probative value, the two hypotheses considered must be 'mutually exclusive and exhaustive'. This means that the defence hypothesis Hd must simply be the negation of the prosecution hypothesis Hp. So, for example, if Hp is "Defendant was at the crime scene" then Hd must be "Defendant was <i><b>not </b></i>at the crime scene". Now, while there is more or less unanimity within the statistics and forensics field that the hypotheses must be mutually exclusive in order for the LR to be used, there is no such unanimity about the hypotheses being exhaustive. 
Indeed, the Royal Statistical Society Practitioner Guide to Case Assessment and Interpretation of Expert Evidence Guidelines [<a href="http://www.rss.org.uk/Images/PDF/influencing-change/rss-case-assessment-interpretation-expert-evidence.pdf">3</a>] (page 32) specifies that the LR requires two mutually exclusive but not necessarily exhaustive hypotheses (which, interestingly, contradicts what is stated in the earlier Guidelines by the same group [<a href="http://www.rss.org.uk/Images/PDF/influencing-change/rss-fundamentals-probability-statistical-evidence.pdf">4</a>], page 96). To see why incorrect conclusions may be drawn when the hypotheses are not exhaustive we consider a very simple example: <br /><br />Fred is the defendant for a crime. The main evidence against Fred is that his DNA profile is found to match a DNA sample found at the scene of the crime (for simplicity we ignore the possibility of errors in the DNA match). The DNA profile is of a type that is found in only 1 in 10,000 people. However, Fred has an identical twin brother Joe. Using the following:<br /><ul><li>Prosecution hypothesis Hp: "Fred is the source of the DNA"</li><li>Defence hypothesis Hd: "Joe is the source of the DNA"</li></ul>and<br /><ul><li>Evidence E: "the DNA found matches Fred's profile"</li></ul>The defence reasons - correctly using the likelihood ratio approach - that the evidence E has no probative value with respect to the above two hypotheses, because the twins have the same DNA profile, i.e.<br />P(E given Hp) = P(E given Hd) = 1.<br />Hence, the defence demands the evidence is withdrawn because it is 'neutral'.<br /><br />The problem here is that, even if we assume the hypotheses are mutually exclusive (i.e. we exclude the possibility that both the twins committed the crime) they are certainly NOT exhaustive. The correct defence hypothesis in this case should be "Fred is NOT the source of the DNA". 
This is made up of two cases:<br /><ul><li>Hd: "Joe is the source of the DNA"</li><li>Ho: "Another person (not Fred or Joe) is the source of the DNA"</li></ul>If we assume - before any evidence is known - that Hp, Hd and Ho are equally likely then the impact of observing the evidence is certainly NOT neutral - it is probative in favour of the prosecution hypothesis as can be shown from running the calculations in a Bayesian network tool:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-5FH8dDbIG4w/VrOPMXzL7SI/AAAAAAAAAYk/o-_Bfomyfvs/s1600/twins1.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="256" src="https://4.bp.blogspot.com/-5FH8dDbIG4w/VrOPMXzL7SI/AAAAAAAAAYk/o-_Bfomyfvs/s400/twins1.jpg" width="400" /></a></div><br />The probability of Hp increases from 33% to just under 50%.<br /><br />But the supposedly 'neutral' evidence can have an even more dramatic impact in practice. Suppose, for example, that Joe has an alibi that is considered pretty reliable. Then this might reduce our prior belief that he is the source of the DNA to 2%. 
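<br /><br />The update behind these figures can also be checked by hand with Bayes' theorem. The following is a minimal Python sketch rather than the AgenaRisk model itself: the function name <code>posterior</code> is hypothetical, the likelihood for Ho assumes the 1 in 10,000 profile frequency from the example, and the 0.49 / 0.02 / 0.49 split used for the alibi scenario is an assumption made for illustration (only the 2% figure for Joe comes from the text).<br /><br />

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over mutually exclusive, exhaustive hypotheses.

    priors:      dict mapping hypothesis name -> P(H)
    likelihoods: dict mapping hypothesis name -> P(E given H)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(E)
    return {h: joint[h] / total for h in joint}


# P(E given H): either twin matches with certainty; anyone else matches
# with the 1-in-10,000 profile frequency stated in the example.
likelihoods = {"Hp": 1.0, "Hd": 1.0, "Ho": 1 / 10_000}

# Scenario 1: the three hypotheses are equally likely a priori.
equal_priors = {"Hp": 1 / 3, "Hd": 1 / 3, "Ho": 1 / 3}
equal_case = posterior(equal_priors, likelihoods)

# Scenario 2 (ASSUMED priors): Joe's alibi leaves 2% on Hd, and the
# remaining 98% is split evenly between Hp and Ho.
alibi_priors = {"Hp": 0.49, "Hd": 0.02, "Ho": 0.49}
alibi_case = posterior(alibi_priors, likelihoods)

for name, result in (("equal priors", equal_case), ("alibi priors", alibi_case)):
    print(name, {h: round(p, 4) for h, p in result.items()})
```

With equal priors the sketch gives a posterior for Hp of just under 50%; under the assumed alibi priors it rises to roughly 96%. In both scenarios the LR computed from Hp and Hd alone is still exactly 1.<br /><br />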
In this case the before and after probabilities are:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-L4ePe5qhr34/VrOPWGmnzLI/AAAAAAAAAYo/OSLaKbg_eYE/s1600/twins2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="248" src="https://4.bp.blogspot.com/-L4ePe5qhr34/VrOPWGmnzLI/AAAAAAAAAYo/OSLaKbg_eYE/s400/twins2.jpg" width="400" /></a></div><br />The belief in the prosecution hypothesis in this case has shifted to above 95% - possibly sufficient for a jury to be convinced it is the truth.<br /><br />If the DNA evidence in the above example was a non-match then the LR approach using the original hypotheses is even more obviously flawed because in this case: <br /> P(E given Hp) = P(E given Hd) = 0<br />But the evidence is certainly anything but 'neutral' because, after observing the evidence, the prosecution hypothesis Hp must be false (as must Hd). <br /><br />While the example above is obviously simplistic and contrived, more realistic examples are provided in [<a href="https://www.researchgate.net/publication/263739610_When_%27neutral%27_evidence_still_has_probative_value_%28with_implications_from_the_Barry_George_Case%29">1</a>] which also highlights this very problem in the case of Barry George (convicted and subsequently acquitted of the murder of TV celebrity Jill Dando after an appeal ruled that the gunpowder residue evidence presented in the original trial was inadmissible in a re-trial on the basis that it had an LR equal to one and so had 'no probative value'.)<br /><br />See also:<br /><ul><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/09/barry-george-case-new-insights-on.html">Barry George case: new insights on the evidence </a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2014/01/sally-clark-revisited-another-key.html">Sally Clark revisited: another key statistical oversight?</a></li><li><a 
href="http://probabilityandlaw.blogspot.co.uk/2012/01/prosecutor-fallacy-in-stephen-lawrence.html">Prosecutor fallacy in Stephen Lawrence case? </a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutor-fallacy-again-in-media.html">Prosecutor fallacy in media reporting of Burgess DNA case</a> </li><li><a href="http://probabilityandlaw.blogspot.co.uk/2013/07/flaky-dna-prosecutors-fallacy-yet-again.html">Flaky DNA: Prosecutors fallacy yet again</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2012/06/prosecutors-fallacy-just-will-not-go.html">Prosecutors fallacy just will not go away</a></li><li><a href="http://probabilityandlaw.blogspot.co.uk/2016/01/misleading-dna-evidence-and-current.html">Misleading DNA evidence </a></li></ul><br /><b>References</b><br /><ol><li>Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu, (2013). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002. A pre-publication draft of the article can be found <a href="https://www.researchgate.net/publication/263739610_When_%27neutral%27_evidence_still_has_probative_value_%28with_implications_from_the_Barry_George_Case%29">here</a>.</li><li>Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, 2016 to appear. Pre-publication version <a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/bayes_and_the_law_revised%20FINAL.pdf">here</a>. </li><li>Jackson, G., Aitken, C., & Roberts, P. (2015). PRACTITIONER GUIDE NO 4: Case Assessment and Interpretation of Expert Evidence. Royal Statistical Society. 
Available <a href="http://www.rss.org.uk/Images/PDF/influencing-change/rss-case-assessment-interpretation-expert-evidence.pdf">here.</a></li><li>Aitken, C, Roberts, P, Jackson, G, (2010) PRACTITIONER GUIDE NO 1: Fundamentals of Probability and Statistical Evidence in Criminal Proceedings: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses. Royal Statistical Society. Available <a href="http://www.rss.org.uk/Images/PDF/influencing-change/rss-fundamentals-probability-statistical-evidence.pdf">here</a>. </li></ol><br /><b>Misleading DNA evidence and the current damaged winning lottery ticket story</b><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-HH6RcYGLJxM/Vqpu2aE5WbI/AAAAAAAAAYU/i24g23nLBGQ/s1600/dna_lottery.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="http://3.bp.blogspot.com/-HH6RcYGLJxM/Vqpu2aE5WbI/AAAAAAAAAYU/i24g23nLBGQ/s400/dna_lottery.jpg" width="400" /></a></div><div style="text-align: left;"><i><b>Norman Fenton, 28 January 2016</b></i></div><br />This post is primarily about how DNA match evidence is often presented in a way that is highly misleading (it is an important issue in an ongoing case I'm involved with). But in order to illustrate the point it turns out that we can use a simple analogy based loosely on <a href="http://www.telegraph.co.uk/news/newstopics/howaboutthat/12124279/Is-this-the-33-million-lottery-ticket.html">the current lottery story</a> that is getting a lot of media attention in the UK. This concerns an unverified £33 million winning ticket from a recent draw. About 200 people are claiming to have bought the (single) winning ticket but, until today*, none had actually provided proof of possessing such a ticket. 
The claim of one - Miss Susan Hinte - is the one that has grabbed media attention because she has produced a ticket in which key identifying information cannot be read because, she claims, the ticket was put through a washing machine. <br /><br />But first let's look at the DNA issue, which is concerned with the following generic problem:<br /><ul><li>The prosecution claims that defendant Joe was at the crime scene. This hypothesis is denoted as <b>Hp</b>.</li><li>A tiny trace of DNA from the crime scene has been analysed and found to match the profile of Joe. This evidence (of the match) is denoted <b>E</b>. </li></ul>Typically the defence will argue that Joe was not at the crime scene and that any DNA matching Joe - especially as it was a tiny trace - got there through secondary transfer or other means. So the defence hypothesis <b>Hd</b> is simply the negation of Hp.<br /><br />The DNA experts have correctly recognised that, in determining the probative value of the evidence E, they have to use the ‘likelihood ratio’ approach [1]. This means they have to consider <i><b>both </b></i>of the following probabilities:<br /><ol><li>The probability that E is the result of the prosecution hypothesis Hp being true - formally we write this as P(E given Hp)</li><li>The probability that E is the result of the defence hypothesis Hd being true - formally we write this as P(E given Hd) </li></ol>If probability 1 is greater than probability 2 then the evidence E supports Hp over Hd and vice versa. The likelihood ratio is simply probability 1 divided by probability 2 and provides a simple and compelling measure of probative value of evidence. If the ratio is greater than one the evidence E supports Hp, with higher values indicating stronger support. If the ratio is less than one the evidence E supports Hd, with smaller values indicating stronger support. 
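<br /><br />The arithmetic described above is short enough to sketch in code. The Python below is illustrative only: the function names and the input probabilities (0.9 and 0.01) are hypothetical and not taken from any case, and the conversion to a posterior assumes that Hd is the negation of Hp.<br /><br />

```python
def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = P(E given Hp) / P(E given Hd)."""
    return p_e_given_hp / p_e_given_hd


def posterior_hp(prior_hp, lr):
    """Update P(Hp) using the odds form of Bayes' theorem.

    Only valid when Hd is the negation of Hp, so that
    prior odds = P(Hp) / (1 - P(Hp)).
    """
    prior_odds = prior_hp / (1 - prior_hp)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical inputs, chosen purely for illustration.
lr = likelihood_ratio(0.9, 0.01)
print("LR:", round(lr, 2))

# The same LR applied to two different prior beliefs in Hp.
for prior in (0.5, 0.001):
    print("prior", prior, "-> posterior", round(posterior_hp(prior, lr), 4))
```

In this sketch an LR of 90 takes a prior of 0.5 to a posterior of about 0.99, but takes a prior of 0.001 only to about 0.08: the LR says how far the evidence shifts belief, not where that belief ends up.<br /><br />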
However, for reasons explained in [1], this whole notion of probative value is not meaningful if the defence hypothesis Hd is not the negation of the prosecution hypothesis Hp. One of the common errors made by DNA experts is to replace Hd with a <i><b>different </b></i>hypothesis, namely Hd': "DNA from Joe got there by secondary transfer". In this case Hd' excludes other possibilities of observing E even though Joe was not at the crime scene (such as errors or contamination during the DNA testing, or the DNA belonging to a different person with the same profile etc) and is not even mutually exclusive with Hp since Joe may have been at the crime scene even though the trace sample was there through secondary transfer. But, while this common error is serious, it is not the real concern I wish to raise here. In fact, let's suppose that no such error is made and that the expert considers the correct Hd.<br /><br />The real concern is how a jury member reacts when the DNA expert now makes the following assertions:<br /><ol><li><i><b>“The findings are what I would have expected if Hp were true.”</b></i> i.e. P(E given Hp) is very high</li><li><i><b>“The probability of the findings are considerably more likely to have been the result of Hp rather than Hd”</b></i> i.e. P(E given Hp) is much higher than P(E given Hd)</li></ol>Notwithstanding the unnecessary redundancy of statement 1, these assertions sound very important and suggest very strong support for the prosecution hypothesis, especially as most people would already have assumed (wrongly) that the DNA 'match' means the trace certainly belongs to Joe.<br /><br />But to demonstrate how misleading they are I will return now to the lottery example. For simplicity I will assume the old 6-ball lottery with 49 numbers. Suppose the winning numbers were:<br />1, 7, 21, 28, 40, 46<br /><br />Mrs Smith has a damaged ticket that she claims has the winning numbers. 
The evidence E is that the first number (which is the only number clearly visible) is 1.<br /><br />Our hypotheses are:<br /><ul><li>Hp: “Mrs Smith's ticket is the winning ticket”</li><li>Hd: “Mrs Smith's ticket is not the winning ticket”</li></ul>In this case we know the following:<br /><ul><li>P(E given Hp) = 1 (it is certain that the first number on the ticket would be 1 if it was the winning ticket)</li><li>P(E given Hd) is 0.122 (this is the proportion of non-winning tickets that have 1 as the first number, approximately 6/49) </li></ul>So we could certainly make <i><b>exactly the same assertions</b></i> in this case as the DNA experts above: <br /><ol><li>“The findings are what I would have expected if Hp were true.” (since the probability of E given Hp is 1)</li><li>“The probability of the findings are considerably more likely to have been the result of Hp rather than Hd” (since 1 is considerably greater than 0.122).</li></ol>However, despite these (correct) assertions <i><b>it is almost certain that Hd rather than Hp is true</b></i> - Mrs Smith's ticket is not the winning ticket. In fact, the probability of Hp being true is less than one in 1.7 million (because there are over 1.7 million non-winning combinations in which the first number is 1: the other five numbers can be chosen from the remaining 48 in 1,712,304 ways, of which only one is the winning combination).<br /><br />So what is the moral of this story? The likelihood ratio of the evidence might often suggest the evidence is highly probative in favour of one of the hypotheses, but if the prior probability of the alternative hypothesis was much higher to start with then the evidence will not ‘overturn’ the prior belief in favour of the alternative.<br /><br />Lay people ignore this in connection with DNA evidence. Because the random match probability associated with a DNA match is typically less than one in a billion, the very fact that the evidence E is a "DNA match" already puts into their mind the notion that this 'must tie the defendant to the crime scene'. 
But the random match probability is almost irrelevant in this case - it only accounts for a tiny proportion of P(E given Hd). Lay people can also easily be tricked into believing that the (redundant) assertion 1 “The findings are what I would have expected if Hp were true” provides additional weight to assertion 2.<br /><br />Unfortunately, this type of evidence is increasingly prejudicing juries and, I believe, leading to serious miscarriages of justice.<br /><br />*The <a href="http://www.dailymail.co.uk/news/article-3421068/Winner-comes-forward-REAL-winning-ticket-33million-Lotto-jackpot-s-NOT-woman-claimed-s-wash.html">real winner has now been found</a>, and since their ticket was not damaged it cannot have been Miss Hinte.<br /><br />[1] Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu, (2014). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, 54(4), 274-287 http://dx.doi.org/10.1016/j.scijus.2013.07.002. (pre-publication draft <a href="http://www.eecs.qmul.ac.uk/%7Enorman/papers/probative_value.pdf">here</a>)<br /><br /><b>Norman Fenton at Maths in Action Day (Warwick University)</b><br />Norman Fenton, 8 December 2015<br /><br />Today Norman Fenton was one of the five presenters at the <a href="http://www.thetrainingpartnership.org.uk/study-day/mathematics-in-action-10/">Mathematics in Action Day at Warwick University</a> - the others included writer and broadcaster <a href="http://simonsingh.net/">Simon Singh</a> and BBC presenter <a href="http://stevemould.com/">Steve Mould</a> (who is also part of the amazing trio <a href="http://festivalofthespokennerd.com/">Festival of the Spoken Nerd</a> which features Queen Mary's <a href="http://standupmaths.com/">Matt Parker</a>). 
The Maths in Action day is specifically targeted at A-Level Maths students and their teachers.<br /><br />Norman says:<br /><blockquote class="tr_bq">This was probably the biggest live event I have spoken at - an audience of 550 in the massive Butterworth Hall (which has recently hosted Paul Weller and the Style Council, Jools Holland) - so it was quite intimidating. My talk was on "Fallacies of Probability and Risk" (the powerpoint slides are <a href="http://www.eecs.qmul.ac.uk/%7Enorman/Talks/Fenton_Warwick/warwick_fenton.ppsx">here</a>). I hope to get some photos of the event uploaded shortly.</blockquote><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-zJMJaWXBnAM/VmdRnciWFzI/AAAAAAAAAYE/tFPQqeg47Rw/s1600/CONFPARK-Butterworth-Hall-2.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" height="250" src="http://4.bp.blogspot.com/-zJMJaWXBnAM/VmdRnciWFzI/AAAAAAAAAYE/tFPQqeg47Rw/s400/CONFPARK-Butterworth-Hall-2.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Butterworth Hall (hopefully some real photos from the event to come)</td></tr></tbody></table><br />