Thursday, 11 April 2013

Bayesian networks plagiarism

If, as they say, imitation is the sincerest form of flattery then we are privileged to have discovered (thanks to a tip-off by Philip Leicester) that our work on Bayesian network idioms - first published in Neil M, Fenton NE, Nielsen L, "Building large-scale Bayesian Networks", The Knowledge Engineering Review, 15(3), 257-284, 2000 (and covered extensively in Chapter 7 of our book) - has been re-published, almost verbatim, in the following publication:
Milan Tuba and Dusan Bulatovic, "Design of an Intruder Detection System Based on Bayesian Networks", WSEAS Transactions on Computers, 5(9), pp 799-809, May 2009. ISSN: 1109-2750
The whole of Section 3 ("Some design aspects of large Bayesian networks") - which constitutes 6 out of the 10 pages - is lifted from our 2000 paper.  Our work was partly inspired by the work of Laskey and Mahoney. The authors reference that work but, of course, not ours, hence confirming the very deliberate plagiarism.

Milan Tuba and Dusan Bulatovic are at the Megatrend University of Belgrade (which we understand is a small private University) and we had not come across them before now. The journal WSEAS Transactions on Computers seems to be an example of one of the dubious journals exposed in this week's New York Times article. Curiously enough, after a colleague distributed that article yesterday I was going to write back to him saying that I disagreed with the rather elitist tone of the article, which suggests that the peer review process of the 'reputable scientific journals' was somehow unimpeachable - in reality there is no consensus on what journals are 'reputable' and even the refereeing of those widely considered to be the best is increasingly erratic and at times bordering on corrupt (which is inevitable when it relies exclusively on volunteer academics).  But at least I would hope that any 'reputable' journal would still be alert to the kind of plagiarism we now see here.

This is not the first time our work has been very blatantly plagiarised. Interestingly, on a previous occasion it was in a book that was published by Wiley Finance (who I am sure are widely considered one of the most reputable publishers). The book was 'written' by a guy who had been our PhD student for a short time at City University before he vanished without notice or explanation. The book contained large chunks of our work (none of which the 'author' had contributed to, as it predated his time as a PhD student with us) without any attribution. Despite informing Wiley of this, and proving to them that a) the author's qualifications as stated in the book were bogus; and b) the endorsements on the back cover were fraudulent, they did nothing about it.

Thursday, 28 February 2013

What chance the next roll of the die is a 3?

In response to my posting yesterday a colleague posed the following question:
The die has rolled 3 3 3 3 3 3 3 in the past. What are the chances of 1 2 4 5 6 being rolled next? The mathematician will say: P(k)=1/6 for each number, forget that short-term evidence. What will the probability expert say? And the statistician? And the philosopher? 
I have provided a detailed solution to this problem here.

In summary, it is based on a Bayesian network in which (except for the 'statistician') it all comes down to what priors they are assuming for the probability of each P(k).
  • The mathematician's prior is that the probability of each P(k) is exactly 1/6.
  • One type of probability expert (including certain types of Bayesians) will argue that, in the absence of any prior knowledge of the die, the probability distribution for each P(k) is uniform over the interval 0-1 (meaning any value is just as likely as any other).
  • Another probability expert (including most Bayesians) will argue that the prior should be based on dice they have previously seen. They believe most dice are essentially 'fair' but there could be biases due to either imperfections or deliberate tampering. Such an expert might therefore specify the prior distribution for P(k) to be a narrow bell curve centred on 1/6.
  • A philosopher might consider any of the above but might also reject the notion that 1,2,3,4,5,6 are the only outcomes possible.
Anyway, when we enter the evidence of seven 3's in 7 rolls, the Bayesian calculations (performed using AgenaRisk) result in an updated posterior distribution for each of the P(k)s.

The mathematician's posterior for each P(k) is unchanged: i.e. each P(k) is still 1/6. So there is still just a probability of 1/6 that the next roll will be a 3.

For the probability expert with the uniform priors, the posterior for P(3) is now a distribution with mean 0.618. The other probabilities are all reduced accordingly to distributions with mean about 0.079. So in this case the probability of rolling a 3 next time is about 0.618 whereas each of the other numbers has a probability of about 0.079.
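
As a rough cross-check on the uniform-prior figures, here is a minimal sketch. It assumes the uniform-prior expert's beliefs can be formalised as a flat Dirichlet prior over the six face probabilities, which is an assumption made purely for illustration. The AgenaRisk model is built differently and relies on numerical approximation, so the values below (about 0.615 and 0.077) are close to, but not identical to, the means quoted above. The function name posterior_means is hypothetical.

```python
from fractions import Fraction

def posterior_means(prior_counts, observed_counts):
    """Posterior mean of each face probability under a Dirichlet prior.

    Conjugate update: add the observed counts to the prior pseudo-counts,
    then normalise by the total.
    """
    updated = [a + n for a, n in zip(prior_counts, observed_counts)]
    total = sum(updated)
    return [Fraction(u, total) for u in updated]

# Flat Dirichlet prior, one pseudo-count per face (an assumed formalisation)
flat_prior = [1] * 6
# Evidence: seven rolls, all showing a 3
rolls = [0, 0, 7, 0, 0, 0]

means = posterior_means(flat_prior, rolls)
print(float(means[2]))  # P(3) = 8/13, about 0.615
print(float(means[0]))  # each other face = 1/13, about 0.077
```

Under this assumption the seven observed threes simply outweigh the single pseudo-count given to each face, which is why the posterior moves so far from 1/6.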

For the probability expert with the bell curve priors, the posterior for P(3) is now a distribution with mean 0.33. The other probabilities are all reduced accordingly to distributions with mean about 0.13. So in this case the probability of rolling a 3 next time is about 0.33 whereas each of the other numbers has a probability of about 0.13.

And what about the statistician? Well a classical statistician cannot give any prior distributions so the above approach does not work for him. What he might do is propose a 'null' hypothesis that the die is 'fair' and use the observed data to accept or reject this hypothesis at some arbitrary 'p-value' (he would reject the null hypothesis in this case at the standard p=0.01 value). But that does not provide much help in answering the question. He could try a straight frequency approach in which case the probability of a three is 1 (since we observed 7 out of 7 threes) and the probability of any other number is 0.
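
The two numbers behind the statistician's options can be checked with a few lines of arithmetic. This is only a sketch: it computes the probability of the exact observed sequence under the 'fair die' hypothesis, not the output of any particular significance test the statistician might choose, and it assumes the rolls are independent.

```python
from fractions import Fraction

rolls = [3, 3, 3, 3, 3, 3, 3]

# Probability of this exact sequence if the die is fair (the null hypothesis)
p_sequence_if_fair = Fraction(1, 6) ** len(rolls)

# Straight frequency estimate of each face probability
frequency = {face: Fraction(rolls.count(face), len(rolls)) for face in range(1, 7)}

print(p_sequence_if_fair)          # 1/279936
print(frequency[3], frequency[1])  # 1 0
```

A probability of roughly 3.6 in a million is far below 0.01, which is why the null hypothesis gets rejected, while the frequency estimate puts all the probability on 3.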

Anyway, the detailed solution showing the model and results is here. The model itself - which will run in AgenaRisk - is here.

Wednesday, 27 February 2013

"No such thing as probability" in the Law?

David Spiegelhalter has posted an important article about a recent English Court of Appeal judgement in which the judge essentially suggests that it is unacceptable to use probabilities to express uncertainty about unknown events. Some choice quotes David provides from the judgement include:
"..and to express the probability of some event having happened in percentage terms is illusory.
....The chances of something happening in the future may be expressed in terms of percentage. ... But you cannot properly say that there is a 25 per cent chance that something has happened... Either it has or it has not. "
What is interesting about this is that the judge has used almost the same words that we said (in Chapter 1 of our book Risk Assessment and Decision Analysis with Bayesian Networks) we had heard from several lawyers. One of the quotes we gave there from an eminent lawyer was:
“Look the guy either did it or he didn’t do it. If he did then he is 100% guilty and if he didn’t then he is 0% guilty; so giving the chances of guilt as a probability somewhere in between makes no sense and has no place in the law”. 
Of course, as we show in the book (Chapter 1 is freely available for download) you can actually prove that this kind of assertion is flawed in the sense that it inevitably leads to irrational decision-making.

The key point is that there can be as much uncertainty about an event that has yet to happen (e.g. whether or not your friend Naomi will roll a 6 on a die) as one that has happened (e.g. whether or not Naomi did roll a six on the die). It all depends on what information you know about the event that has happened. If you did not actually see the die rolled in the second case your uncertainty about the outcome is no different than before it was rolled, even though Naomi knows for certain whether or not it was a six (so for her the probability really is either 1 or 0). As you discover information about the event that has happened (for example, if another reliable friend tells you that an even number was rolled) then your uncertainty changes (in this case from 1/6 to 1/3). And that is exactly what is supposed to happen in a court of law where, typically, nobody (other than the defendant) knows  whether the defendant committed the crime; in this case it is up to the jury to revise their belief in the probability of guilt as they see evidence during the trial.
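
The change from 1/6 to 1/3 in the Naomi example is a one-step Bayesian update. Here is a minimal sketch, assuming a fair die and a friend whose report is perfectly reliable (both are simplifying assumptions):

```python
from fractions import Fraction

faces = range(1, 7)

# Before any information: every face is equally likely (assumes a fair die)
prior = {f: Fraction(1, 6) for f in faces}

# Evidence: a reliable friend says the number rolled was even
likelihood = {f: 1 if f % 2 == 0 else 0 for f in faces}

# Bayes: multiply prior by likelihood, then normalise
unnormalised = {f: prior[f] * likelihood[f] for f in faces}
total = sum(unnormalised.values())
posterior = {f: unnormalised[f] / total for f in faces}

print(prior[6], posterior[6])  # 1/6 1/3
```

Nothing about the die changes between the two lines of output; only your information does.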

David Spiegelhalter points out that the judge is not just 'banning' Bayesian reasoning, but also banning the Sherlock Holmes approach to evidence. But it is even worse, because the judge is essentially banning the entire legal rationale for presenting evidence (which is ultimately about helping the jury to determine the probability that the defendant committed the crime).

p.s. There are other aspects of the case which are troubling, notably the assumption that there were just three possible potential causes of the fire (other as yet unknown/unknowable potential causes would have non-zero prior probabilities). However, the judge got some things right, including his line of reasoning about the relative likelihood of two unlikely events (the arcing or the smoking), which demonstrated that, if these are exhaustive, then the smoking was the most likely cause.

Tuesday, 15 January 2013

Who is the appropriate expert here: a DNA specialist or a probability specialist?

An interesting issue about expert evidence has arisen in a case on which I am providing input. It can be summarized as follows:

If a DNA expert makes an incorrect probabilistic inference (such as a logical or computational error) arising from a DNA probability, is it appropriate for a probability expert to point out the error or is only a DNA expert qualified to point out the error?

According to many lawyers only a DNA expert is qualified. I believe this is fundamentally wrong, as the following real (but anonymized) example demonstrates:

A partial DNA sample found at the scene of the crime (containing only two clearly identifiable components)  matches the defendant's DNA.  The DNA expert (who we will refer to as expert A) concludes:
"the probability this DNA comes from anybody other than the defendant is very unlikely". 
A probability expert (expert B) believes that the DNA expert's conclusion may be highly misleading. Expert B asks an independent DNA expert (expert C) to check the DNA evidence and provide a match probability. Expert C confirms a two-component match and asserts that the match probability is about 1 in 100, i.e. the probability of finding such a match in a person not involved is 1 in 100. Expert B uses this information to explain why expert A's statement was misleading as follows:
Expert A is making a statement about the probability of the defendant not being the source of the DNA, when all that expert A can actually conclude is that the probability of getting such a DNA match if the defendant is not the source is 1 in 100. If the term 'very unlikely' is a surrogate for the more precise "1 in 100 probability" then the expert is making the transposed conditional error (prosecutor's fallacy). Specifically one cannot draw any conclusions about the (posterior) probability of the defendant being or not being the source without knowing something about the prior probability - i.e. without knowing how many other people could have left the DNA sample at the scene. If, for example, there are 1000 people who have not been ruled out then about 10 of these would have the matching partial DNA. In that case "the probability this DNA comes from anybody other than the defendant is about 90%" - which is very different from expert A's conclusion.
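
Expert B's 90% figure can be reproduced with a short calculation. The sketch below assumes, purely for illustration, that the defendant and each of the people not ruled out are equally likely a priori to be the source, and that the true source is certain to match. The function name prob_other_source is hypothetical.

```python
def prob_other_source(n_others, match_prob):
    """Probability the DNA came from someone other than the defendant.

    Assumes (for illustration) that the defendant and each of the n_others
    people are equally likely a priori to be the source, and that the true
    source is certain to match.
    """
    expected_other_matches = n_others * match_prob
    # The defendant is one matching person among (1 + expected_other_matches)
    return expected_other_matches / (1 + expected_other_matches)

# 1000 people not ruled out, match probability of 1 in 100
print(round(prob_other_source(1000, 1 / 100), 3))  # 0.909
```

Changing n_others shows how strongly the conclusion depends on the prior: with only 10 other people not ruled out the same match probability gives about 0.091 rather than 0.909.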
A lawyer rejects expert B's contribution because it is "outside his area of expertise", stating the following:
Expert B is not an expert on DNA - as is proved by the fact that he had to ask another DNA expert (C) to come up with the relevant random match probability - and so is not qualified to comment on DNA evidence. Only a DNA expert can comment on the likelihood that the DNA comes from the defendant.

But expert B is NOT commenting on the DNA evidence. Expert B is commenting on a logically incorrect - and unnecessarily vague - probabilistic inference made by a person who happens to be a DNA expert. In fact the only person here who is venturing outside their area of expertise is expert A because he/she has made an assumption about something on which he/she has no information or expertise - namely the number of people who could potentially have been at the scene of the crime.

The logical extension of the lawyer's argument would be to reject all logical, mathematical and statistical analysis about a problem X if it is not presented by a person who is an expert in problem X.

Saturday, 17 November 2012

Why machine learning (without expert input) may be doomed to fail

With the advent of ‘big data’  there has been a presumption (and even excitement) that machine learning, coupled with statistical analysis techniques, will reveal new insights and better predictions in a wide range of important applications. The perception is being reinforced by the impressive machine intelligence results that organisations like Google and Amazon routinely provide purely from the massive datasets that they collect.

But for many critical risk analysis problems (including most types of medical diagnosis and almost every case in a court of law) decisions must be made where there is little or no direct historical data to draw upon, or where relevant data is difficult to identify. The challenges are especially acute when the risks involve novel or rare systems and events (e.g. think of novel project planning, predicting events like accidents, terrorist attacks, and cataclysmic weather events). In such situations we need to exploit expert judgement. This latter point is now increasingly widely understood. However, what is less well understood is that, even when large volumes of data exist, pure data-driven machine learning methods alone are unlikely to provide the insights required for improved decision-making. In fact more often than not such methods will be inaccurate and totally unnecessary.

To see a simple example why, read the story here.

Tuesday, 23 October 2012

The impact of multiple possible test results on disease diagnosis

In our new book, we cite the famous Harvard Medical School experiment where doctors and medical students were asked the following question.

"One in a thousand people has a prevalence for a particular heart disease. There is a test to detect this disease. The test is 100% accurate for people who have the disease and is 95% accurate for those who don't (this means that 5% of people who do not have the disease will be wrongly diagnosed as having it)."

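The question is the probability that a randomly selected person who tests positive actually has the disease. The answer (as explained here) is a bit less than 2% (which is interesting because most of the people in the study gave the answer as 95%).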
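
One way to see where the "bit less than 2%" comes from is to count people rather than manipulate probabilities. This is a minimal sketch; the cohort size of 100,000 is a hypothetical figure chosen only so that the counts come out as whole numbers.

```python
population = 100_000                     # hypothetical cohort size
with_disease = population // 1000        # prevalence of 1 in 1000
without_disease = population - with_disease

# The test is 100% accurate for people who have the disease
true_positives = with_disease
# 5% of people without the disease are wrongly diagnosed as having it
false_positives = without_disease * 5 // 100

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(true_positives, false_positives)     # 100 4995
print(round(p_disease_given_positive, 4))  # 0.0196
```

The 100 genuine cases are swamped by the 4995 false positives, so a positive result still leaves the disease unlikely.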

Now a reader has posed the following problem:

 I understand the 2% answer of a random person having the heart disease. What is the probability of that person having the disease if a 2nd test comes back positive? What about a 3rd test?

I have added this as an exercise to Chapter 6 of the book and provided a full answer - using a Bayesian network. In summary, if the tests are really independent (with the same level of accuracy) then if two tests are positive the probability of the disease rises to 28.592%; and when three tests are positive it rises to 88.899%.
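
For the independent case only, these figures can be reproduced by applying Bayes' theorem repeatedly, using each posterior as the prior for the next test. This is a sketch of that arithmetic rather than the Bayesian network used in the book's solution; the function name update is hypothetical.

```python
def update(prior, sensitivity=1.0, false_positive_rate=0.05):
    """One Bayes update of P(disease) after a positive test result."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = 0.001  # prevalence: 1 in 1000
results = []
for test in range(1, 4):
    # Independence assumption: each test's errors are unrelated to the others
    p = update(p)
    results.append(p)
    print(f"after {test} positive test(s): {p:.3%}")
# after 1 positive test(s): 1.963%
# after 2 positive test(s): 28.592%
# after 3 positive test(s): 88.899%
```

The repeated update is only valid because the tests are assumed independent given the patient's true state; it does not cover dependent tests or a common source of error.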
However, the tests can be dependent on each other and it is also possible that there are some personal features of the patient that lead to common test errors. These two situations (with particular assumed prior probabilities of the dependencies) lead to a far lower probative value of the multiple positive test results. As shown in the solution, we get:

When the tests are directly dependent and two are positive the probability of disease only increases to 3.229%; with all three positive the probability only increases to 4.002%.

When there is a common source of error and two tests are positive the probability of disease increases to 13.805%; with all 3 tests positive, probability increases to 64.023%.

Tuesday, 19 June 2012

Prosecutor fallacy again in media reporting of David Burgess DNA case

There are numerous reports today of the trial of David Burgess, accused of killing Yolande Waddington in the 1960s. Burgess had been ruled out as a suspect at the time of the crime because his blood was found not to match that of blood on a sweater belonging to Yolande. However, new DNA analysis has found that Burgess's DNA profile does match that of blood found on a sack that was at the crime scene.

Following on from previous blog postings it is clear that again reporters are making the prosecutor fallacy even though it appears not to have been made in court (but as I explain below I believe that other errors were made in court). For example, the Sun provides a classic example of the prosecutor fallacy:
Scientists said the chances of the DNA on the sack not belonging to him were less than “one in a billion”. 
However, both the Mail and Guardian report what I assume were the actual words used by the forensic scientist Mr Price in court:
... the probability of obtaining this result if it is due to DNA from an unknown person who is unrelated to David Burgess is smaller than one in a billion, a thousand million.
Ignoring the fact that all kinds of testing/cross contamination errors have not been factored in to the random match probability of one in a billion, there is nothing wrong with the above statement because if we let:

  • H be the hypothesis "DNA found at scene does not belong to defendant or a relative".
  • E be the evidence "DNA found is a match to defendant".
Then the probability of E given H (which is what is stated in the above quote) is indeed one in a billion.

But what is VERY interesting is that Burgess was ruled out in the original investigation because his blood type did NOT match the sample from the scene. To explain the 'change' we get the following quote in the Guardian article:
"Mr Price said the initial test on the bloodstained sweater may have been flawed and that the difference between Burgess’s blood type and that found on the sweater could be due to a mistake in the process that was known to occur sometimes."
In other words the forensic scientist claims that the lack of a positive blood match first time round is "due to a mistake in the process" but he appears never to consider the possibility of any mistake in the process leading to a positive DNA match. Perhaps conveniently for the CPS it appears the sweater has somehow got 'lost' (curious how that crucial crime scene item should vanish whereas the sack which was never tested originally should remain) so there was no attempt to test the DNA of the blood on the sweater.

If the latest ruling from the USA is anything to go by, there is going to be even less chance of questioning the accuracy of DNA and other types of forensic analysis in future.