On 17 Jan 2018 multiple news sources (e.g. see here, here, and here) ran a story about a new research paper that claims to expose both inaccuracies and racial bias in COMPAS, one of the most common algorithms used in parole and sentencing decisions to predict recidivism (i.e. whether or not a defendant will re-offend).
The research paper was written by the world-famous computer scientist Hany Farid (along with a student, Julia Dressel).
But the real story here is that the paper’s accusation of racial bias (specifically that the algorithm is biased against black people) is based on a fundamental misunderstanding of causation and statistics. The algorithm is no more ‘biased’ against black people than it is biased against white single parents, old people, people living in Beattyville, Kentucky, or women called ‘Amber’. In fact, as we show in this brief article, if you choose any factor that correlates with poverty you will inevitably replicate the statistical ‘bias’ claimed in the paper. And if you accept the validity of the claims in the paper then you must also accept, for example, that a charity which uses poverty as a factor to identify and help homeless people is being racist because it is biased against white people (and also, interestingly, Indian Americans).
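To make the statistical point concrete, here is a minimal sketch in Python. It is not taken from our article or from the Dressel and Farid paper: the two groups, the single binary "poor" factor, and every probability in it are invented purely for illustration. The hypothetical predictor flags a person as likely to re-offend if and only if they are poor, and never sees which group they belong to. The only difference between the groups is the assumed proportion of poor people in each.

```python
# Hypothetical illustration only: every number here is invented,
# not taken from COMPAS data or from any study.

# Assumed probability of re-offending, depending only on poverty.
P_REOFFEND = {"poor": 0.6, "not_poor": 0.2}


def error_rates(p_poor):
    """Return (false positive rate, false negative rate) for a group
    in which a fraction p_poor of people are poor.

    The predictor flags exactly the poor people as likely re-offenders;
    it has no access to group membership.
    """
    flagged_no = p_poor * (1 - P_REOFFEND["poor"])              # flagged, did not re-offend
    unflagged_no = (1 - p_poor) * (1 - P_REOFFEND["not_poor"])  # not flagged, did not re-offend
    flagged_yes = p_poor * P_REOFFEND["poor"]                   # flagged, re-offended
    unflagged_yes = (1 - p_poor) * P_REOFFEND["not_poor"]       # not flagged, re-offended

    fpr = flagged_no / (flagged_no + unflagged_no)
    fnr = unflagged_yes / (flagged_yes + unflagged_yes)
    return fpr, fnr


# Assumed share of poor people in two hypothetical groups.
groups = {"group_a": 0.6, "group_b": 0.3}
rates = {name: error_rates(p) for name, p in groups.items()}

for name, (fpr, fnr) in rates.items():
    print(f"{name}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

Under these made-up numbers the false positive rate for group_a comes out at roughly 0.43 against roughly 0.18 for group_b, and the false negative rates differ in the opposite direction, even though the predictor treats every individual identically given their poverty status. Swapping the group labels for any other attribute that correlates with the poverty factor produces the same kind of gap.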
The fact that the article was published and that none of the media running the story realise that they are pushing fake news is what is most important here. Depressingly, many similar research studies involving the same kind of misinterpretation of statistics result in popular media articles that push a false narrative of one kind or another.
22 June 2018 Update: It turns out that Microsoft is now "developing a tool to help engineers catch bias in algorithms". This article also cites the case of the COMPAS software:
"..., which uses machine learning to predict whether a defendant will commit future crimes, was found to judge black defendants more harshly than white defendants."
Interestingly, this latest news article about Microsoft does NOT refer to the 2018 Dressel and Farid article but, rather, to an earlier 2016 article by Larson et al: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm From a quick inspection it does seem to be a more comprehensive study than the flawed Dressel and Farid article. But my quick impression is that the same fundamental misunderstandings of statistics/causality are there. Given the great degree of interest in AI/bias, and given also that we were unaware of the 2016 study, we plan to update our unpublished paper.
Our article (5 pages): Fenton, N.E., & Neil, M. (2018). "Criminally Incompetent Academic Misinterpretation of Criminal Data - and how the Media Pushed the Fake News" http://dx.doi.org/10.13140/RG.2.2.32052.55680 Also available here.
The research paper: Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).
Thanks to Scott McLachlan for the tip off on this story.
See some previous articles on poor use of statistics: