We have written many times before (see the links below) about use of the Likelihood Ratio (LR) in legal and forensic analysis.
To recap: the LR is a simple and effective method for determining the extent to which some evidence (such as DNA found at the crime scene matching the defendant) supports one hypothesis (such as "defendant is the source of the DNA") over an alternative hypothesis (such as "defendant is not the source of the DNA"). The previous articles discussed the various problems and misinterpretations surrounding the use of the LR. Many of these arise when the hypotheses are not mutually exclusive and exhaustive. This problem is especially pertinent in the case of 'DNA mixture' evidence, i.e. when a DNA sample relevant to a case comes from more than one person. With modern DNA testing techniques it is common to find DNA samples with multiple contributors, the exact number of whom is unknown. In such cases there is no obvious 'pair' of hypotheses that are mutually exclusive and exhaustive, since we have individual hypotheses such as:
- H1: suspect + one unknown
- H2: suspect + one known other
- H3: two unknowns
- H4: suspect + two unknowns
- H5: suspect + one known other + one unknown
- H6: suspect + two known others
- H7: three unknowns
- H8: one known other + two unknowns
- H9: two known others + one unknown
- H10: three known others
- H11: suspect + three unknowns
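Suppose that just one pair of these is chosen for comparison, say:
- H1: suspect + two unknowns
- H2: three unknowns
and that the likelihoods of the DNA evidence E under each are:
- P(E | H1) = 0.00000000000000000001 (10 to the minus 20)
- P(E | H2) = 0.00000000000000000000000001 (10 to the minus 26)
The LR is then P(E | H1) / P(E | H2) = one million (10 to the 6) in favour of H1 over H2.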
Apart from the problem of communicating to a court what all this means and how it is computed (defence lawyers can and do exploit the very low probability of E given H1), there is an underlying statistical problem with small likelihoods for non-exhaustive hypotheses. I will highlight the problem with two scenarios involving a simple urn example. Superficially, the scenarios seem identical. The first scenario causes no problem but the second one does. The concern is that it is not at all obvious that the DNA mixture problem always corresponds more closely to the first scenario than to the second.
In both scenarios we assume the following:
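There is an urn with 1000 balls – some of which are white. Suppose W is the (unknown) number of white balls. Balls are drawn with replacement (the counts below exceed the number of balls in the urn), and the evidence E is the number of white balls drawn. We have 2 hypotheses: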
- H1: W=100
- H2: W=90
Scenario 1: We draw 1001 white balls. In this case using standard statistical assumptions we calculate P(E | H1) = 0.013, P(E|H2) = 0.0000036. Both values are small but the LR is large, 3611, strongly favouring H1 over H2.
Scenario 2: We draw 1100 white balls. In this case P(E | H1) = 0.000057, P(E|H2) < 0.00000001. Again both values are very small but the LR is very large, strongly favouring H1 over H2.
(Note: in both cases we could have chosen a much larger sample and got truly tiny likelihoods, but these values are sufficient to make the point.)
So in what sense are these two scenarios fundamentally different and why is there a problem?
In scenario 1 not only does the conclusion favouring H1 make sense, but the actual number of white balls drawn is very close to the expected number we would get if H1 were true (in fact, W=100 is the 'maximum likelihood estimate' for the number of white balls). So not only does the evidence point to H1 over H2, but also to H1 over any other hypothesis (and there are 1001 different hypotheses: W=0, W=1, W=2, and so on up to W=1000).
In scenario 2 the evidence is actually even more supportive of H1 over H2 than in scenario 1. But this is essentially meaningless because it is virtually certain that BOTH hypotheses are false.
So, returning to the DNA mixture example, it is certainly not sufficient to compare just two hypotheses. The LR of one million in favour of H1 over H2 may be hiding the fact that neither of these hypotheses is true. It is far better to identify as exhaustive a set of hypotheses as is realistically possible and then determine the individual likelihood value of each hypothesis. We can then identify the hypothesis with the highest likelihood value and consider its LR compared to each of the other hypotheses.
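The recommendation above can be sketched for the urn example in a few lines of Python. This is only an illustration under stated assumptions: it treats the sample as 10,000 draws with replacement (the sample size is an assumption, not given in the text) and uses a plain binomial likelihood, so the figures it produces depend on those choices and need not match the values quoted earlier. The names `log_likelihood` and `summarise` are hypothetical helpers introduced here. Likelihoods are kept on the log scale so that very small values do not underflow.

```python
import math

URN_SIZE = 1000   # total balls in the urn, as in the example
N_DRAWS = 10_000  # ASSUMPTION: number of draws with replacement (not stated in the text)


def log_likelihood(k, w, n=N_DRAWS, total=URN_SIZE):
    """Log binomial probability of k white balls in n draws with replacement
    when the urn holds w white balls out of `total`."""
    p = w / total
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def summarise(k, h1=100, h2=90):
    """Compare H1 with H2, then with the best of all hypotheses W = 0..URN_SIZE."""
    ll = {w: log_likelihood(k, w) for w in range(URN_SIZE + 1)}
    best_w = max(ll, key=ll.get)  # hypothesis with the highest likelihood
    return {
        "log_lr_h1_vs_h2": ll[h1] - ll[h2],        # the usual two-hypothesis comparison
        "best_w": best_w,
        "log_lr_best_vs_h1": ll[best_w] - ll[h1],  # how far H1 falls short of the best
    }


if __name__ == "__main__":
    for label, k in [("scenario 1", 1001), ("scenario 2", 1100)]:
        print(label, summarise(k))
```

Under these assumptions, a positive `log_lr_best_vs_h1` means that some hypothesis other than H1 fits the sample better than H1 does, which is exactly the information that a comparison of H1 against H2 alone cannot reveal.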
- Confusion over the Likelihood ratio
- Problems with the Likelihood Ratio method for determining probative value of evidence: the need for exhaustive hypotheses
- Misleading DNA evidence
- Barry George case: new insights on the evidence
- Sally Clark revisited: another key statistical oversight?
- Prosecutor fallacy in Stephen Lawrence case?
- Prosecutor fallacy in media reporting of Burgess DNA case
- Flaky DNA: Prosecutors fallacy yet again
- Prosecutors fallacy just will not go away