Just discovered the statement below on the UK Government website
(I’d totally forgotten about it). It’s the evidence we provided 14
months ago concerning data transparency and accountability during the
COVID-19 crisis. A pdf version is here.
Shame the Government totally ignored the evidence.
Response to the Call for Evidence
Regarding COVID-19 Data
Transparency and Accountability
UK Parliament, Public Administration and Constitutional Affairs Committee,
Commons Select Committee
Director of Risk Information Management Research Group
Professor in Computer Science and Statistics
in Health Informatics (Queen Mary)
Fellow in Law
(Birmingham Law School)
Administration and Constitutional Affairs Committee
Transparency and Accountability: COVID-19.
Apropos the call for evidence
concerning data transparency and accountability during the COVID-19 crisis.
We are a group of senior researchers
in risk assessment, probability, statistics, and public health technologies
based at Queen Mary University of London. Since March 2020, we have produced 23
(of which 5 have been published in peer-reviewed journals) analysing the publicly
available COVID-19 statistics and producing risk assessments and models.
We believe that the statistics
provided to and by the Government during the COVID-19 crisis have been inadequate and have been too
easily used by influencers and decision-makers to fit particular narratives
that have exaggerated the scale of the crisis.
Statistics and data are observed
phenomena arising from unobserved processes and their interactions (including
causal explanations) as shown in Figure 1. The number of observed COVID-19 ‘cases’ clearly depends
on how a ‘case’ is defined and the population infection rate, but it is also influenced
by many (normally unreported) causal factors such as how many tests are being
performed, who is being tested and why, and the accuracy of the testing. Similarly,
while the number of observed COVID-19 ‘deaths’ clearly depends on how a COVID-19 death is defined and reported, it is also influenced
by the population demographics, quality of healthcare etc. Hence, contrary to
popular conception, data do not ‘speak for themselves’.
For example, in March and April (as
we have pointed out previously),
by focusing only on simple counts of ‘cases’, ‘hospitalisations’ and ‘deaths’,
the public was misled into believing that the virus was more deadly than it
really was. At that stage testing was essentially limited to those who were
either already hospitalised with severe symptoms or were frontline healthcare
workers. The reported high death (and hospitalisation) rates of those infected
(calculated by simply dividing the number of deaths by the number of ‘cases’)
were in part explained by the limited testing regime that was essentially only
‘finding’ the most severe ‘cases’.
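The effect of a limited testing regime on the reported death rate can be illustrated with a minimal sketch. All numbers below are hypothetical, chosen only to show the direction of the bias, and are not taken from any dataset; the function name is likewise just for illustration.

```python
# All numbers below are hypothetical and chosen only to illustrate the bias.

def naive_fatality_rate(deaths: int, detected_cases: int) -> float:
    """Deaths divided by detected 'cases', as in the headline calculation."""
    return deaths / detected_cases

true_infections = 100_000      # assumed true number of infected people
deaths = 500                   # assumed deaths among them
cases_limited_testing = 5_000  # only the most severe infections are tested and 'found'
cases_wider_testing = 50_000   # wider testing also finds many milder infections

rate_limited = naive_fatality_rate(deaths, cases_limited_testing)
rate_wider = naive_fatality_rate(deaths, cases_wider_testing)
rate_true = deaths / true_infections  # rate among all infections, detected or not

print(f"limited testing: {rate_limited:.1%}")
print(f"wider testing:   {rate_wider:.1%}")
print(f"all infections:  {rate_true:.1%}")
```

With the same number of deaths, the naive rate falls as testing widens, because the denominator grows while the numerator does not.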
Similarly, the scale of the ‘second
wave’ has been continually exaggerated by focusing on increased ‘cases’ without
considering the simple causal explanation of massively increased testing. When
this is taken into account (as shown in the plots of Figure 2), the trends for cases, hospitalisations and deaths
look far less worrying than those presented at https://coronavirus.data.gov.uk using exactly the same data.
At the root of the data problem
there has been a fundamental misunderstanding about the meaning of the terms ‘COVID-19
cases’ and ‘COVID-19 deaths’, and what can be interpreted from statistics that
use these terms. Even small changes in
how these are defined and classified (as has happened several times since
March) lead to very different trends and conclusions.
The definition of a COVID-19 ‘case’
is especially concerning. In epidemiology, a case definition includes criteria
for person (e.g. gender, race, age, or exclusion criteria), place
(such as that associated with the outbreak of a disease), time (when
illness started) and clinical features. Clinical features are
initially normally simple and objective such as ‘sudden onset of fever and
cough’ but should later be characterised by confirmed presence of specific
laboratory findings, such as ‘ground-glass opacity on Chest CT and positive
culture for SARS-CoV-2’. During this crisis, a positive PCR test has
improperly become the surrogate replacing all four aspects of case definition.
A PCR test may be positive: (i) before clinical features arise; (ii) long after
clinical features have abated; or even (iii) when a person has simply come into
contact with the disease but without them ever becoming infected. Some argue
that reporting cycle threshold (Ct) values may help clinical decision-makers
identify at which of these three stages an asymptomatic person may present;
however, given that almost all so-called asymptomatic cases never develop
active disease, if we leave aside issues with false positives (which increase
for high Ct values), we submit that many ‘cases’ must be type (iii) and
therefore did not meet the normal epidemiological standard to be classified or
counted as a case.
Confusion about the definition of a
COVID-19 ‘death’ also persists. It is now clear that Government-reported deaths include
not just those who died as a direct result of the disease, but also all of
those who have died ‘with it’, thus leading to inflation of the fatality
figures. Several studies have also suggested that reported deaths from other
pneumonias, influenzas and even lung cancer have dropped well below normal
annual levels since March. As such, there are questions surrounding whether
people who died of these similar conditions were incorrectly classified as COVID-19 deaths.
With the massive increase in
testing since August, uncertainty about the testing accuracy (especially the false positive rate of PCR tests)
means that almost nothing meaningful can be concluded about the increasing
cases or fatality rate (see Figure 3). The vast majority of those tested have no symptoms at
all, so in the absence of data provided about the proportion of asymptomatic
people who were tested and tested positive (as well as the other missing
information shown in Figure 3), we do not know what
proportion of new ‘cases’ and reported ‘deaths’ are people infected with COVID-19
at all. A false positive rate of even
just 1% would, together with the massively increased testing, provide a causal
explanation for the increase in cases even if the virus has largely subsided.
But, yet again, the narrative presented – and the one on which lockdown
decisions are based – is that of a massive ‘second wave’.
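The arithmetic behind the false positive argument can be sketched as follows. The prevalence, sensitivity and daily test counts are hypothetical assumptions made purely for illustration; only the 1% false positive rate comes from the text above.

```python
# Hypothetical inputs: only the 1% false positive rate is taken from the text.

def expected_positives(tests, prevalence, sensitivity, false_positive_rate):
    """Return (expected true positives, expected false positives)."""
    true_pos = tests * prevalence * sensitivity
    false_pos = tests * (1 - prevalence) * false_positive_rate
    return true_pos, false_pos

PREVALENCE = 0.001           # assumed: 1 in 1,000 of those tested is infected
SENSITIVITY = 0.8            # assumed: 80% of infected people test positive
FALSE_POSITIVE_RATE = 0.01   # 1% of uninfected people test positive

for tests in (10_000, 300_000):  # assumed daily test volumes
    tp, fp = expected_positives(tests, PREVALENCE, SENSITIVITY, FALSE_POSITIVE_RATE)
    share_false = fp / (tp + fp)
    print(f"{tests:>7} tests: {tp + fp:7.0f} positives, "
          f"{share_false:.0%} of them false")
```

Under these assumptions the number of reported positives rises in proportion to the number of tests, and most of them are false positives, even though the assumed prevalence never changes.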
loop thinking means that, once a particular narrative is ‘believed’,
alternative explanations for the observed data are never entertained. Indeed,
the lack of data, unscientific closed-shop models, fundamental misunderstandings
by decision-makers, manipulation of underlying reporting processes,
contradictory goals or the potential for malign intent are all feasible
explanations for the observed data and chaotic analysis. The lack of data
transparency gives credence to these explanations and leads to a lack of trust
in government statistics and decisions made using those statistics.
There are many examples of how crude data, and failure to consider alternative
causal explanations, have been used for inappropriate decision-making and even
scare-mongering. These include:
Using ‘100 new cases per 100,000 people’
as a threshold beyond which a local borough is required to move to lockdown.
With this metric the threshold can be avoided or reached simply by
decreasing/increasing the number of tests carried out.
As explained above (and in Figure 2), the headline
figures and graphs (as presented, for example, at https://coronavirus.data.gov.uk/) do not factor in
the increase in testing. For example, the recent ‘exponential’ increase in
number of cases, which has driven the ‘second wave narrative’, does not look at
all serious when we plot it as number of cases per 1000 tests. The same is true
of hospital admissions and deaths; for example,
contrary to the frightening ‘absolute’ increase in hospital cases since
September, it turns out that the number of hospital admissions per 1000 cases
has remained stable – and may even be decreasing when we factor in the false
positives and those admitted for non-COVID reasons who happen to get a positive
COVID test after admission.
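The two preceding points (the ‘per 100,000 people’ threshold and the ‘per 1000 tests’ view) can be illustrated together with a minimal sketch. The borough population, test counts and case counts are hypothetical, and the positivity is assumed to stay fixed at 2% while testing increases.

```python
# Hypothetical borough: population, test counts and case counts are assumptions.

def cases_per_100k_population(cases: int, population: int) -> float:
    return cases * 100_000 / population

def cases_per_1000_tests(cases: int, tests: int) -> float:
    return cases * 1000 / tests

POPULATION = 300_000
THRESHOLD_PER_100K = 100  # lockdown threshold quoted in the text

# (tests carried out, positive results), with positivity fixed at 2%
scenarios = {"fewer tests": (5_000, 100), "more tests": (20_000, 400)}

for label, (tests, cases) in scenarios.items():
    per_100k = cases_per_100k_population(cases, POPULATION)
    per_1000_tests = cases_per_1000_tests(cases, tests)
    over = per_100k > THRESHOLD_PER_100K
    print(f"{label}: {per_100k:.1f} per 100k people "
          f"(over threshold: {over}), {per_1000_tests:.1f} per 1000 tests")
```

In this sketch the borough crosses the threshold purely because more tests were carried out, while the number of cases per 1000 tests is identical in both scenarios.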
The ONS report
on COVID-19 deaths by ethnicity is one of many that have produced misleading
conclusions without even revealing all relevant data. This particular report exaggerated
the increased risk to people from the BAME community by using ‘relative risk’
to summarise the findings, rather than ‘absolute risk’ as continually recommended for communicating risk to the
public by Royal Statistical Society Chairman (and member of SAGE) Professor
Sir David Spiegelhalter.
Moreover, we noted
that the claims were almost certainly further exaggerated as they were likely
based on out of date demographic information (the ONS failed to respond to our
request to identify what data were used). Hence the ONS report – which was
widely quoted in the media – was likely to create an unjustified level of fear
and anxiety among the BAME community. Failure to identify causal explanations
for data bias has also led to multiple well-publicised studies with exaggerated,
or even flawed, claims that certain communities, or people
with certain attributes or habits, are at much higher risk of COVID-19.
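The difference between ‘relative risk’ and ‘absolute risk’ can be shown with a minimal sketch. The two risk figures are hypothetical and are not taken from the ONS report or any other source.

```python
# Hypothetical risks for two groups; not taken from the ONS report.

def relative_risk(risk_a: float, risk_b: float) -> float:
    """How many times higher the risk in group A is than in group B."""
    return risk_a / risk_b

def absolute_risk_difference(risk_a: float, risk_b: float) -> float:
    """How much higher the risk in group A is, as a plain probability."""
    return risk_a - risk_b

risk_group_a = 2 / 10_000  # assumed: 2 deaths per 10,000 people
risk_group_b = 1 / 10_000  # assumed: 1 death per 10,000 people

rr = relative_risk(risk_group_a, risk_group_b)
extra_per_10k = absolute_risk_difference(risk_group_a, risk_group_b) * 10_000

print(f"relative risk: {rr:.1f} times higher")
print(f"absolute risk: {extra_per_10k:.1f} extra death per 10,000 people")
```

The same hypothetical data can be summarised as ‘twice the risk’ or as ‘one extra death per 10,000 people’; the first framing sounds far more alarming than the second.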
In early October news broke of under-reporting
of almost 16,000 positive PCR tests and that, as a result, as many as 48,000
people may not have been informed of their exposure due to close contact with
these undisclosed ‘cases’.
PHE blamed Microsoft’s Excel software,
but this disingenuous admonition did more to highlight PHE’s: (a) reliance on
almost 25-year-old technology; (b) ignorance of and failure to keep pace with
technology; and (c) lack of any reliable approach to checking and validating data
they collect and report. Data security experts describe this as one in a long string
of data and information security failings by PHE and the Government and have
used it to support eschewing use of the proposed NHSx track and trace apps.
Removal or sanitising of flu
incidence/death data from 1999 and all previous years from the ONS website,
making comparisons almost impossible and giving the impression that the ‘past
is being rewritten or expunged’.
Constant changing of scales and
metrics used in data reporting. For example, deaths were recorded as COVID-19 deaths
if they occurred within 28 days of a positive test and this has recently been
changed to 60 days if COVID-19 appears on the death certificate. This change
was done in reaction to a recommendation that the period should be reduced to
21 days. The change was made with no accompanying explanation of why it was increased rather
than reduced.
The only way to achieve accurate estimates of the critical population infection
rate at any given time is to provide the missing, but easy to obtain, data
shown in Figure 4.
Decisions about lockdown require data to support the evidence shown in Figure 5. If these data
have been considered in Government decisions, they have certainly not been made
public.
In summary, and supported by the arguments
above, our responses to the eight issues identified in the public call are:
Response to Issue 1: Did the Government have good enough data to make
decisions in response to Coronavirus, and how quickly were the Government able
to gather new data?
Data provided by
several departments including PHE, NHS and ONS for Government decision-making
was observed to be ever-changing, unreliable, and of such poor quality and so inappropriately
framed as to be insufficient to support the public health, policy and
legislative decisions that resulted.
Response to Issue 2: Was data for decision-making sufficiently joined up
Where multiple actors are responsible for collecting and reporting data that will be
aggregated and used to direct public policy, definitions, thresholds and
processes must observe a consistent standard. The central aggregator, in this
case the ONS, should have been responsible for both dictating and enforcing that
standard.
As evidenced by
anomalies and misrepresentations identified above, the efforts of PHE, NHS and
ONS were not sufficiently joined up, fell short of due standards, and severely
undermined Government decision-making, independent scrutiny, and ultimately public
Response to Issue 3: Was relevant data disseminated to key decision-makers
in: Central and Local Government; other public services (like schools);
businesses; and interested members of the public?
To be relevant,
data must be capable of informing the decision-making process. Relevant data is
that which is accurate, timely, indisputable, optimised
and fit to inform the known purposes for which it may be used.
Since Government was aware that most members of the public consume only limited ‘views’ of
such data as are presented in the media, its presentation, accuracy and fitness
for purpose should have received greater consideration. While data was made
available via the ONS website, for the reasons discussed above the relevance of
this data has remained questionable.
Response to Issue 4: Were key decisions (such as ‘lockdowns’) underpinned by
good data and was data-led decision-making timely, clear and transparently
presented to the public?
Decisions impacting the liberty and freedoms of individuals appear to have been
made haphazardly. While each came supported by
the justification that it was ‘led by the science’, it could more often
be argued that this was not the case.
has been presented as being the result of “the science” with the goal of
delivering ‘consensus’. However, science does not operate as a consensus making
mechanism and it is not monolithic. The current crisis has demonstrated that
groups like SAGE and the Joint Biosecurity Centre are not following scientific
norms of behaviour. Analysis and policy formulation need more stringent oversight
in a way that invites and delivers scientific debate from both within and also
outside the group.
Response to Issue 5: Was data shared across the devolved administrations and
local authorities to enable mutually beneficial decision-making?
If this was the
case, it has not been made clear to the public and, in any case, it is likely
that the shared data suffered from all of the limitations we have highlighted.
Response to Issue 6: Is the public able to comprehend the data published
during the pandemic? Is there sufficient understanding among journalists and
parliamentarians to enable them to present and interpret data accurately, and
ask informed questions of Government?
It is difficult to
ensure accurate comprehension in circumstances where, as discussed earlier,
relevantly framed data has not been provided. Continued reliance on journalists
to identify meaning from data has only resulted in sensational headlines that
amplified public ignorance and promulgated fear.
What could have
been done to improve understanding and who could take responsibility for this?
The current crisis
has demonstrated that Government must take additional steps to provide context and
meaning capable of supporting the differing interpretations it wishes the public
to draw from published data. The public should be trusted to understand
nuance and scientific disagreement about what the data might be telling them.
Response to Issue 7: Does the Government have a good enough understanding of
data security, and do the public have confidence in the government’s data
The policies and
approaches of Government do not seem to have reflected prevailing opinions and
wishes of the public. This has never been more obvious than during development
and release of both versions of the NHSx Track and Trace smartphone app, and
when vision-based population proximity monitoring AI systems were deployed around
London suburbs and, once exposed in the media, hastily removed. It seems there
is little public confidence in the current approach to securing public and
personal data and indeed the potential for increased suspicion of the
government’s motives in this regard.
Response to Issue 8: How will the change in responsibility for Government
data impact future decision-making?
It is not clear
what the change in responsibility is or what the motivation for it is. Any change in
responsibility might simply be akin to ‘rearranging the deckchairs on the
Titanic’.
Figure 1 Causal model explaining
Figure 2 Simple plots that take
account of number of tests and cases
Figure 3 Why the daily reported
data tell us almost nothing
Figure 4 Missing data needed for
accurate estimation of population infection rate
Figure 5 The evidence we need to
demonstrate why lockdowns are needed