Probability and Risk: We still are not getting the most basic data needed about COVID-19 testing

UPDATE: This follow-up article addresses some of the issues raised here.

As I have been arguing regularly on this blog, there is no point in citing numbers of COVID-19 "cases" (which are not necessarily people 'ill' or even with symptoms, but simply the number of tests that are positive) without also citing the number of tests performed.

The (Government) decision about whether to move a region into a higher (resp. lower) tier is based on whether the number of "cases" goes above (resp. below) a certain threshold number per 100,000 residents. But, as there are wide variations in the number of people per 100,000 who are tested, this strategy is ludicrous. Because (since July) many (most?) of the people tested do not have any symptoms, and because there are various reasons why there may be false positives, the more people you test the more "cases" you will find. Hence, a region which has a genuinely low infection rate, but a disproportionately high number of tests, may find itself in Tier 3 lockdown; conversely, a district with a genuinely high infection rate, but a disproportionately low number of tests, may find itself in the relative 'freedom' of Tier 1. [Update: as David Paton notes, the ZOE symptom app data also suggest that the decisions about Tiers are irrational: https://twitter.com/cricketwyvern/status/1338843277232115713?s=20]
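To make the arithmetic behind this argument concrete, here is a minimal Python sketch. Every number in it is hypothetical and chosen only for illustration: the prevalence, test sensitivity, false positive rate and testing volumes are not taken from any Government data. It also assumes that the people tested are a random sample of the region's residents, which is a strong simplification.

```python
def expected_cases_per_100k(tests_per_100k, prevalence, sensitivity, false_positive_rate):
    """Expected number of positive tests ("cases") per 100,000 residents.

    Assumes (hypothetically) that those tested are a random sample of the
    population, so the infection rate among them equals `prevalence`.
    """
    true_positives = tests_per_100k * prevalence * sensitivity
    false_positives = tests_per_100k * (1 - prevalence) * false_positive_rate
    return true_positives + false_positives


# Hypothetical region A: genuinely low infection rate, heavily tested.
region_a = expected_cases_per_100k(
    tests_per_100k=6000, prevalence=0.005, sensitivity=0.8, false_positive_rate=0.01
)

# Hypothetical region B: genuinely higher infection rate, lightly tested.
region_b = expected_cases_per_100k(
    tests_per_100k=1000, prevalence=0.02, sensitivity=0.8, false_positive_rate=0.01
)

print(f"Region A: {region_a:.1f} cases per 100,000")
print(f"Region B: {region_b:.1f} cases per 100,000")
```

Under these made-up inputs, region A reports roughly 84 "cases" per 100,000 and region B roughly 26, even though region B's assumed infection rate is four times higher. Whether anything like this happens in practice depends entirely on the real testing volumes and error rates, which is why the regional testing data matter.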

The Government website does provide the national figures for daily number of tests, which means that I can produce plots such as these which show why it is so important to consider the number of tests conducted instead of simply citing number of "cases".

The number of cases increases when the number tested increases (note that the two axes have different scales, which I have distinguished with different colours)


With the increase in more random testing, the blue line is more informative of the national COVID19 status than the red line, but it is the latter, not the former, that is driving policy decisions

As soon as we produce such plots it becomes clear that the scale of the 'second wave crisis' has been massively exaggerated (see yesterday's post for more updated plots that factor in numbers tested) and that there is merit to the claim that what we have is a 'casedemic'.

Yesterday I decided to try to do similar analyses per region (i.e. taking account of the number of daily tests per 1000 residents) to see whether it was the numbers tested that were driving, for example, the decision to move certain regions (like London) to Tier 3. However, curiously, the data are not available even though they are supposed to be. [18/12/20 UPDATE: the regional testing data are now available] While the Government website gives the overall national testing figures, the same website fails to give the figures for individual regions, even though it appears to offer exactly this option at https://coronavirus.data.gov.uk/details/download.

That page invites you to choose and download a whole range of testing data for any region. But no matter which daily testing data you select, those particular fields (unlike, e.g., "daily cases" and "daily deaths") always come up empty when you download the file [UPDATE: the page no longer even allows you to select to download by different regions].

I also tried searching the websites/dashboards provided by different regions/local authorities themselves. Some of those I found, such as Buckinghamshire and London, provide a lot of very useful data (including the daily number of people reporting symptoms - see screenshots below), but curiously they also do not have the daily testing numbers.

So, given that millions of people's lives are adversely affected by the decisions made based on these numbers, the fact that they are being hidden is becoming increasingly disturbing. If the numbers tested per region are consistent (i.e. all regions have a similar number of tests per 1000 residents) then there is not a problem when it comes to regional decisions. In that case, let's see the data showing this consistency. If they are not consistent (as I suspect) then we are all being conned.
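The consistency check described above could be sketched as follows once regional testing figures are available. The two regions and all of their figures are invented for illustration only; the field names are likewise hypothetical and do not correspond to any fields on the Government website.

```python
def summarise(region):
    """Return testing-adjusted rates for one region.

    `region` is a dict with hypothetical keys: population, tests, cases.
    """
    return {
        "tests_per_1000_residents": 1000 * region["tests"] / region["population"],
        "cases_per_100k_residents": 100_000 * region["cases"] / region["population"],
        "cases_per_1000_tests": 1000 * region["cases"] / region["tests"],
    }


# Invented figures, for illustration only.
regions = {
    "Region A": {"population": 500_000, "tests": 30_000, "cases": 900},
    "Region B": {"population": 500_000, "tests": 10_000, "cases": 600},
}

summaries = {name: summarise(data) for name, data in regions.items()}

for name, s in summaries.items():
    print(
        f"{name}: {s['tests_per_1000_residents']:.0f} tests per 1000 residents, "
        f"{s['cases_per_100k_residents']:.0f} cases per 100,000 residents, "
        f"{s['cases_per_1000_tests']:.0f} cases per 1000 tests"
    )
```

In this invented example Region A looks worse on cases per 100,000 residents (180 against 120) but better on cases per 1000 tests (30 against 60), simply because it tested three times as many people. If the real tests-per-1000-residents figures turned out to be similar across regions, the two measures would rank regions the same way and the concern would not arise.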

One final point: people have been arguing with me that my concerns are 'wrong' because what matters is that, for example, in London "COVID19 hospital admissions" and "deaths" are rising. But notwithstanding the fact that overall autumn/winter rises are inevitable, the problem is that the data on number of hospital admissions and deaths attributed to COVID19 suffer from the same problems as COVID19 "case" data: we actually have no idea at all how many of those classified as "having COVID19" really do have a virus that causes hospitalization and death. What we do know is that a very large proportion of those classified as COVID19 hospital admissions were admitted for something other than COVID19 but tested positive for it after admission. Similarly, the death numbers are based on anybody who had a positive test within 28 days of death irrespective of the actual cause. All this means that as testing numbers increase, so inevitably will the number of hospital admissions and deaths, even if some (or even most) have nothing to do with COVID19.

London COVID19 dashboard, 15 Dec 2020. Note that there is little here to explain the reason for the decision to move London to Tier 3. https://data.london.gov.uk/dataset/coronavirus--covid-19--cases

Buckinghamshire dashboard, 15 Dec 2020. https://covid-dashboard.buckinghamshire.gov.uk/

See: All COVID articles on this blog

2 comments:

Adrian Dickenson, 16 December 2020 at 07:06
FWIW I've tried to access the daily test data (newPillarTwoTestsByPublishDate) using the R API for the government stats and it's giving just N/A for anything other than nation level areas.

Keep up the excellent work!

Jone, 25 February 2021 at 13:12
Absolutely brilliant site - Thank you.

Tuesday, 15 December 2020
