Wednesday 11 December 2019

Problems with DNA mixed profile evidence: the case of Florencio Jose Dominguez

I have written many times before about the potential problems when using the likelihood ratio (LR) as a measure of probative value of evidence. The problems are especially acute when the evidence consists of a tiny sample of DNA for which there are at least two people contributing - often referred to as a low template mixed DNA profile. Over the last year I have been working with lawyer Matthew Speradelozzi on a case in San Diego that challenged the use of new statistical analyses for such a mixed profile. The case was settled Friday when Florencio Jose Dominguez (who was sentenced to 50 years to life for a 2008 murder) was released after pleading guilty to a reduced charge.

The major controversy involves what is called probabilistic genotyping software (in this case STRmix from ESR), which claims to be able to analyse low template mixtures and determine the most likely contributing profiles by taking account of information such as the relative peak heights at loci on the electropherogram (epg), the graph that DNA analysts use to decide which components (alleles) are present in a sample. The DNA analysts first determine the number of contributors to the mixture and then provide an LR that compares the probability of the evidence assuming the suspect is one of the contributors against the probability of the evidence assuming that none of the contributors is related to the suspect.

While probabilistic genotyping software can be effective when the ‘sizes’ of the different contributions are very different, it is much less effective when they are similar (as with Dominguez, who was claimed to be one of at least two unknown contributors of a similar ‘size’). Moreover, in contrast to single profile DNA cases, where the only residual uncertainty is whether a person other than the suspect has the same matching DNA profile, it is possible for all the genotypes of the suspect’s DNA profile to appear at each locus of a DNA mixture, even though none of the contributors has that DNA profile. In fact, in the absence of other evidence, it is possible to have a very high LR for the hypothesis ‘suspect is included in the mixture’ even though the posterior probability that the suspect is included is very low. Yet in such cases a forensic expert will generally still report a high LR as ‘strong support for the suspect being a contributor’, which is potentially highly misleading.

We have submitted a paper describing this and many other issues relating to the reliability of probabilistic genotyping software and will report on it here in due course.
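The gap between a very high LR and a very low posterior probability can be illustrated with the odds form of Bayes' theorem (posterior odds = LR × prior odds). The sketch below is only an illustration: the function name and the numbers are invented for this example and are not taken from the Dominguez case or from STRmix, and it makes the simplifying assumption that the LR compares a hypothesis directly with its negation.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Return P(H | E) given the prior P(H) and LR = P(E | H) / P(E | not H)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds  # Bayes' theorem in odds form
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values, chosen only to show the effect:
# an LR of one million, and a prior of 1 in 100 million
# (no evidence other than the DNA mixture).
hypothetical_lr = 1_000_000
hypothetical_prior = 1 / 100_000_000

posterior = posterior_probability(hypothetical_prior, hypothetical_lr)
print(f"Posterior probability: {posterior:.4f}")
```

With these made-up figures the posterior probability is still only about 1%, despite the LR of one million. The LR says how much the evidence shifts the odds, not where they end up.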

ESR have issued their own statement.

Friday 6 December 2019

Simpson's paradox again

There is a post about our paper on Simpson's paradox (we wrote this in 2015 but only just uploaded it to arXiv). The full paper is here.

The paradox is covered extensively in both “The Book of Why” by Pearl and Mackenzie (see my review) and David Spiegelhalter’s “The Art of Statistics: How to Learn from Data” (see my review). Spiegelhalter's book contains a particularly good example using Cambridge University admissions data:

Overall the acceptance rate was higher for men than for women, but in each subject the rate was higher for women than for men. This is explained by the observation that women were more likely to apply for those subjects where the overall acceptance rates were lower. In other words the relevant causal model is this one:

See also: Doctoring Data
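The reversal in the admissions example can be reproduced with a few lines of arithmetic. The figures below are invented purely for illustration (they are not the Cambridge data from Spiegelhalter's book), and the subject names and function names are hypothetical.

```python
# Invented figures: (applicants, accepted) for each subject and sex.
# "Subject A" has a high acceptance rate, "Subject B" a low one,
# and most of the women apply to Subject B.
applications = {
    "Subject A": {"men": (80, 48), "women": (20, 14)},
    "Subject B": {"men": (20, 2), "women": (80, 16)},
}


def subject_rate(subject: str, sex: str) -> float:
    """Acceptance rate for one sex within one subject."""
    applied, accepted = applications[subject][sex]
    return accepted / applied


def overall_rate(sex: str) -> float:
    """Acceptance rate for one sex pooled across all subjects."""
    applied = sum(applications[s][sex][0] for s in applications)
    accepted = sum(applications[s][sex][1] for s in applications)
    return accepted / applied


for subject in applications:
    print(subject,
          f"men {subject_rate(subject, 'men'):.0%},",
          f"women {subject_rate(subject, 'women'):.0%}")
print(f"Overall: men {overall_rate('men'):.0%}, women {overall_rate('women'):.0%}")
```

In this made-up data women have the higher rate in each subject (70% vs 60% and 20% vs 10%), yet men have the higher rate overall (50% vs 30%), because most of the women applied to the subject with the lower acceptance rate.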