Contrary to the narrative being sold by the big data community, if you want accurate predictions and improved decision-making then, invariably, you need to incorporate human knowledge and judgment. This enables you to build rational causal models based on 'smart' data. The main objections to using human knowledge - that it is subjective and difficult to acquire - are, of course, key drivers of the big data movement. But this movement underestimates the typically very high costs of collecting, managing and analysing big data. So the sub-optimal outputs you get from pure machine learning do not even come cheap.
To clarify the dangers of relying on big data and machine learning, and to show how smart data and causal modelling (using Bayesian networks) give you better results, I have collected the following short stories and examples:
- A short story illustrating why pure machine learning (without expert input) may be doomed to fail and totally unnecessary (2-page PDF)
- Another machine learning fable: explains why pure machine learning for identifying credit risk may result in perfectly incorrect risk assessment (1-page PDF)
- Moving from big data and machine learning to smart data and causal modelling: a simple example from consumer research and marketing (7-page PDF)
- A Bayesian network for a simple example of drug economics decision making (4-page PDF)
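
To make the contrast between pure machine learning and causal modelling concrete, here is a minimal sketch in Python. It is not taken from any of the documents listed above: the three variables (a background factor Z, an action X and an outcome Y), the causal structure and every probability are hypothetical, chosen only to show how an association learned from data alone can point in the opposite direction to the effect given by a causal model whose structure comes from expert knowledge.

```python
from itertools import product

# Hypothetical Bayesian network (all numbers are illustrative assumptions):
#   Z -> X, Z -> Y, X -> Y
# Z is a background factor that influences both the action X and the outcome Y.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                           # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.2, (1, 0): 0.1,
    (0, 1): 0.8, (1, 1): 0.7,
}


def joint(z, x, y):
    """Joint probability P(Z=z, X=x, Y=y) from the network factorisation."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1 - P_Y1_GIVEN_XZ[(x, z)]
    return P_Z[z] * px * py


def observed(x):
    """P(Y=1 | X=x): what a purely data-driven model of X and Y would learn."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def intervened(x):
    """P(Y=1 | do(X=x)): adjust for Z, which the causal structure says confounds X and Y."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


observed_effect = observed(1) - observed(0)      # association seen in the data
causal_effect = intervened(1) - intervened(0)    # effect of actually setting X

print(f"Observed difference in P(Y=1): {observed_effect:+.2f}")
print(f"Causal effect of X on P(Y=1):  {causal_effect:+.2f}")
```

With these made-up numbers, X looks strongly beneficial in the raw data (a difference of about +0.38) even though setting X actually lowers the chance of Y by 0.10. The only thing separating the two answers is the knowledge that Z drives both X and Y, which is exactly the kind of input an expert supplies and a pure pattern-finder never sees.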