With the advent of ‘big data’ there has been a presumption (and even excitement) that machine learning, coupled with statistical analysis techniques, will yield new insights and better predictions in a wide range of important applications. This perception is reinforced by the impressive machine intelligence results that organisations like Google and Amazon routinely produce purely from the massive datasets they collect.
But for many critical risk analysis problems (including most types of medical diagnosis and almost every case in a court of law), decisions must be made where there is little or no direct historical data to draw upon, or where relevant data is difficult to identify. The challenges are especially acute when the risks involve novel or rare systems and events (for example, planning novel projects, or predicting events such as accidents, terrorist attacks, and cataclysmic weather). In such situations we need to exploit expert judgement, and this point is now increasingly widely understood. What is less well understood is that, even when large volumes of data exist, purely data-driven machine learning methods are unlikely on their own to provide the insights required for improved decision-making. In fact, more often than not, such methods will be inaccurate and totally unnecessary.
To see a simple example of why, read the story here.