Friday, 25 March 2016

Statistics of coincidences: Ben Geen case revisited (ABC)

In November 2014 I reported on the case of nurse Ben Geen, who was convicted in 2006 of murdering 2 patients and seriously harming 15 others. I had been asked to produce an expert report on the 'statistical coincidences' in the case for the Criminal Cases Review Commission.

Now a 30-minute documentary on the case presented by Joel Werner is to be aired on Australia's national radio station ABC on 28 March. In the programme (which you can listen to in full from the links at the top of the ABC page) I present a lay summary of the statistical argument (from minutes 16:30 to 21:34).

Norman Fenton

Saturday, 19 March 2016

Turning poorly structured data into intelligent Bayesian Network models for medical decision support

Medical data is very often badly structured, incomplete and inconsistent. This limits our ability to generate useful models for prediction and decision support if we rely purely on machine learning techniques, so we need to exploit expert knowledge at various stages of model development. This problem, which is common in many application domains, is tackled in a paper** published in the latest issue of Artificial Intelligence in Medicine.

The paper describes a rigorous and repeatable method for building effective Bayesian Network (BN) models from complex data, much of which comes from unstructured and incomplete responses given by patients in questionnaires and interviews. Such data inevitably contains repetitive, redundant and contradictory responses. Without expert knowledge, learning a BN model from the data alone is especially problematic when we are interested in simulating causal interventions for risk management. The novelty of this work is that it provides a rigorous, consolidated and generalised framework that addresses the whole life-cycle of BN model development. The method is validated using data from forensic psychiatry. The resulting BN models show predictive performance that ranges from competitive to superior when compared with state-of-the-art data-driven models. More importantly, the resulting BN models go beyond improving predictive accuracy: they are useful for risk management through intervention, and they enhance decision support by answering complex clinical questions that are based on unobserved evidence.

The method is applicable to any application domain involving large-scale decision analysis based on such complex and unstructured information. It challenges decision scientists to reason about building models based on what information is really required for inference, rather than based on what data is available. Hence, it forces decision scientists to use available data in a much smarter way.

**The full reference for the paper is:
Constantinou, A. C., Fenton, N., Marsh, W., & Radlinski, L. (2016). "From complex questionnaire and interviewing data to intelligent Bayesian Network models for medical decision support". Artificial Intelligence in Medicine, Vol. 67, pages 75-93. DOI

For those who do not have access to the journal a pre-publication draft can be downloaded: 

Thursday, 10 March 2016

A Bayesian network to determine optimal strategy for Spurs' success

As a committed Spurs fan I have spent the last few months salivating at the club's sudden and unexpected rise and the prospect of them winning their first league title since 1961. By mid-February they were clear favourites to win the Premier League title. However, in my view, the challenge was compromised by the team becoming overstretched by playing too many matches in a short space of time. In particular, I felt that their involvement in the Europa League was an unnecessary distraction and burden. When I expressed these views on a Spurs online forum (backed up with some data showing consistent under-performance during periods when they were involved in the Europa League) I got heavily criticised by other fans who said it was important to try to win every competition.

Having simultaneously been involved in research discussions about the use of decisions in Bayesian networks, I decided to build a small model in AgenaRisk to resolve the dilemma once and for all. I have written up the results of the analysis here. The model can be downloaded from here.

In summary, there were 4 strategic options available to Spurs' manager Mauricio Pochettino at the time I started to do the analysis:
  1. Focus on Premier League 
  2. Focus on Premier League and FA Cup 
  3. Focus on Premier League and Europa League 
  4. Focus on all three competitions  
My BN model shows that the optimal decision (based on my subjective utility values for the different outcomes) was to go for option 1, with option 2 a close second. Unfortunately (I believe) Pochettino opted for option 3. According to the model, that choice suggests his personal utility value for winning the Europa League was actually higher than his value for winning the Premier League.
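To make the reasoning concrete, here is a minimal Python sketch of the expected-utility comparison behind a decision like this. It is not the AgenaRisk model. Every number in it (the win probabilities per strategy and the utility of each trophy) is a hypothetical placeholder invented for illustration, and it assumes, for simplicity, that the utility of a season is just the sum of the utilities of the trophies won.

```python
# Illustrative sketch only: NOT the AgenaRisk model from the post.
# All probabilities and utilities are hypothetical placeholders.

# Assumed utility of winning each competition (arbitrary units).
UTILITY = {"premier_league": 100.0, "fa_cup": 20.0, "europa_league": 30.0}

# Assumed probability of winning each competition under each strategy.
P_WIN = {
    "1_premier_league_only": {
        "premier_league": 0.40, "fa_cup": 0.02, "europa_league": 0.02},
    "2_premier_league_and_fa_cup": {
        "premier_league": 0.35, "fa_cup": 0.10, "europa_league": 0.02},
    "3_premier_league_and_europa": {
        "premier_league": 0.30, "fa_cup": 0.02, "europa_league": 0.10},
    "4_all_three": {
        "premier_league": 0.25, "fa_cup": 0.08, "europa_league": 0.08},
}


def expected_utility(strategy, p_win=P_WIN, utility=UTILITY):
    """Expected utility of a strategy, assuming additive trophy utilities."""
    return sum(p_win[strategy][c] * utility[c] for c in utility)


def best_strategy(p_win=P_WIN, utility=UTILITY):
    """Return the strategy with the highest expected utility."""
    return max(p_win, key=lambda s: expected_utility(s, p_win, utility))


def break_even_utility(a, b, competition, p_win=P_WIN, utility=UTILITY):
    """Utility of `competition` at which strategies a and b tie.

    Holds the other utilities fixed and solves EU(a) == EU(b).
    """
    dp = p_win[a][competition] - p_win[b][competition]
    if dp == 0:
        raise ValueError("strategies do not differ on this competition")
    other = sum(
        (p_win[a][c] - p_win[b][c]) * utility[c]
        for c in utility
        if c != competition
    )
    return -other / dp


if __name__ == "__main__":
    for name in P_WIN:
        print(name, round(expected_utility(name), 2))
    print("best:", best_strategy())
    print(
        "Europa League utility needed for option 3 to match option 1:",
        round(
            break_even_utility(
                "3_premier_league_and_europa",
                "1_premier_league_only",
                "europa_league",
            ),
            2,
        ),
    )
```

With these placeholder numbers, option 3 only catches up with option 1 once the Europa League is valued at about 125, which is more than the 100 assigned to the Premier League. That is the same kind of inference drawn above about the manager's implied utilities, although the actual figures in the BN model will differ.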


See also: The problem with predicting football results - you cannot rely on the data