Post #1041: Trend to 3/3/2021, and the curious case of the British lockdown.

Posted on March 4, 2021

The US fell slightly below 20 new COVID-19 cases per 100,000 persons per day.

Otherwise, there is no material change from yesterday.  The majority of states continue to see small day-to-day reductions in new cases.  The Northeast as a whole does not, due mainly to New York and New Jersey.  By contrast, large reductions in new case counts continue unabated in California.

If I were to start a new calculation, from the point at which Texas’ data reporting had recovered fully (2/27/2021), the results would look like the box in the first graph.  The Northeast is stalled, California is not, and all other regions fall in-between.

Source:  Calculated from NY Times Github COVID-19 data repository, data reported through 3/3/2021.  NE = Northeast, SA = South Atlantic, SC = South Central, MW = Midwest, MT = Mountain, P = Pacific.
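The post doesn't say exactly how the trend in that box is computed. As a purely hypothetical sketch, one common way to summarize a trend from a chosen start date (such as 2/27/2021, once Texas' reporting had recovered) is to fit a straight line to the logarithm of the daily new-case rate and convert the slope into an average daily percent change. The function name and its input (a plain list of seven-day-average new cases per 100,000, one value per day) are assumptions for illustration, not the author's actual calculation.

```python
import math

import numpy as np


def daily_percent_change(cases_per_100k, start_index):
    """Average daily % change in new cases, from a log-linear fit.

    Hypothetical helper. cases_per_100k holds one positive value per day
    (e.g. a seven-day average); start_index is the first day to include.
    """
    y = np.log(np.asarray(cases_per_100k[start_index:], dtype=float))
    days = np.arange(len(y))
    # A straight line fit to log(cases) corresponds to constant exponential change.
    slope, _intercept = np.polyfit(days, y, 1)
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic series that shrinks at a constant 2% (continuous rate) per day.
series = [30.0 * math.exp(-0.02 * t) for t in range(20)]
print(round(daily_percent_change(series, start_index=5), 2))  # -1.98
```

Starting the fit at a later index simply drops the days before it, which is the point of restarting the calculation once the data are clean.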

A discussion of new variants and infectiousness (transmissibility)

I’m still tracking Florida.  And there’s still no change in status.

But why am I tracking Florida?  That’s the subject of this section.

As noted in yesterday’s post, if all the pieces of the story about the U.K. variant are correct (40% more infectious, U.S. incidence doubling every 10 days, accounted for 36% of Florida cases as of end-of-February), then the situation in Florida has to change soon.  Not dramatically, but if the story is fully correct, that brown Florida line should start curving upward, at an increasing rate, starting just about now.
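The arithmetic behind "has to change soon" can be sketched with a two-strain toy model. Only the 36% share and the 10-day doubling time come from the story above; the rate at which non-variant cases are falling is a hypothetical parameter, set to 5% per day here purely for illustration. Total cases are a shrinking non-variant component plus a growing variant component, and the total turns upward at the point where their rates of change balance.

```python
import math


def total_cases(t, variant_share, doubling_days, other_decline):
    """Relative total cases on day t in a toy two-strain model (day 0 = 1.0).

    Non-variant cases fall at other_decline per day (an assumption);
    variant cases double every doubling_days.
    """
    r = math.log(2) / doubling_days
    return ((1 - variant_share) * math.exp(-other_decline * t)
            + variant_share * math.exp(r * t))


def days_until_upturn(variant_share, doubling_days, other_decline):
    """Day on which total cases stop falling and start rising.

    Setting d(total)/dt = 0 gives
        t* = ln((1 - s) * d / (s * r)) / (r + d),  with r = ln 2 / doubling time.
    A negative t* means the total is already rising.
    """
    s, d = variant_share, other_decline
    r = math.log(2) / doubling_days
    return math.log((1 - s) * d / (s * r)) / (r + d)


print(round(days_until_upturn(0.36, 10, 0.05), 1))  # 2.1
print(round(days_until_upturn(0.36, 10, 0.10), 1))  # 5.6
```

Under these assumptions the upturn arrives in about two days; even if non-variant cases were falling twice as fast, it would arrive in under six. That is the sense in which, if every piece of the story is right, the Florida curve should bend upward almost immediately.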

But at this point, I should admit to being thoroughly confused about the infectiousness of different COVID-19 variants.  And, mostly, what evidence was used to conclude that a strain was more infectious.  I naively thought that if a variant “took over”, and became more prevalent, that meant that it was by definition more infectious.  I mean, how else could it “out-compete” its rivals, other than by being better at spreading?  (Plus or minus some potential accidental impact — some big super-spreader event that just happened to involve some strain.)

But not only is that not true, I now realize a) how difficult it would be to estimate that greater infectiousness from the data in the presence of large, unrelated secular (seasonal) changes in the overall transmissibility of COVID-19, and b) how thin the evidence cited in the popular press for the greater infectiousness of certain strains is.

This doesn’t mean that these strains aren’t more infectious.  It does mean that, for some strains at least, you’ve got a lot of claims that are based on little more than armchair epidemiology.  It’s not clear to me that every variant that has been claimed to be more infectious actually has any hard evidence to back that up.

First, why do I think the naive notion that “if it spreads faster, it must be more contagious” is wrong?

The clear counter-example is the new California strain that has taken over much of California.  (Here’s a scholarly reference that also shows high incidence of this new strain.)  But new cases continue to shrink at an almost-constant rate there.  If the new California strain were significantly more infectious, as claimed, that steady decline would not be even remotely plausible.

Instead, if you look carefully at the evidence presented for the California strain being “more infectious”, it appears to be little more than anecdote, at least in popular-press presentations.  In the case of the newspaper article cited above, it was pure post hoc ergo propter hoc:  they blamed the post-holiday increase in cases on the growth of this new strain.

(And so, it’s déjà vu all over again, as the non-existent post-holiday “surges” reappear.  See Post #1029.)

But now we know that there was a strong seasonality to the virus.  Not only were cases growing strongly all across the northern hemisphere at that time, but that growth ceased abruptly on or about 1/8/2021 (in the data as reported in the U.S., Canada, and the U.K., and at roughly similar times elsewhere, such as Japan, Korea, arguably Russia, and so on).  And so, the California strain has continued to spread, but now it’s spreading as cases are falling, not rising.  The anecdote was clearly coincidence.

Second, given the strong secular (seasonal) trends, it would be difficult to get a good estimate of infectiousness from the observed data.  You would have to be careful to account for those strong seasonal trends; otherwise you’d make the same post hoc ergo propter hoc mistake discussed above.  You’d just do it with a more sophisticated statistical model, so that nobody would be able to see that you’d done it.

Any analysis of this issue done in the northern hemisphere since roughly October 2020 is potentially subject to considerable error, depending on whether the researchers adequately controlled for the secular (seasonal) time trend.  And that’s essentially independent of the methods used, whether they looked at a cross-section of areas over time (“panel data”) or got right down to contact-tracing, person-level data.  The correlation between the increasing prevalence of a strain and seasonality will grossly bias the estimate of transmissibility unless there is a proper control for the seasonality of coronavirus.

There’s no way for me, as an outside observer, to understand exactly how researchers concluded that a strain is more infectious, short of having absolutely detailed knowledge of their statistical methods.  If that was derived from analysis of a cross-section of areas over time, you’d have to see that they included a time trend variable (or some other equivalent control).
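A small simulation makes the point about time controls concrete. Everything in it is hypothetical: 50 made-up areas, a variant share that takes off at a different date in each area, a seasonal rise in the daily growth rate common to all areas, and, by construction, zero true effect of the variant. A naive regression of growth on variant share credits the seasonal rise to the variant; removing each day's cross-area average (day fixed effects, one control equivalent to a time trend variable) does not.

```python
import numpy as np


def simulate_panel(n_areas=50, n_days=60, season_slope=0.002,
                   true_effect=0.0, noise_sd=0.01, seed=0):
    """Hypothetical panel: daily growth rate of cases by area (rows) and day (cols)."""
    rng = np.random.default_rng(seed)
    day = np.arange(n_days)
    # Variant share rises logistically, taking off on a different day in each area.
    takeoff = rng.uniform(10, 50, size=n_areas)
    share = 1.0 / (1.0 + np.exp(-(day[None, :] - takeoff[:, None]) / 8.0))
    # Seasonal trend shared by every area, plus whatever true variant effect we set.
    growth = (season_slope * day[None, :] + true_effect * share
              + rng.normal(0.0, noise_sd, size=(n_areas, n_days)))
    return share, growth


def naive_estimate(share, growth):
    """OLS slope of growth on variant share, ignoring calendar time."""
    x = share.ravel() - share.mean()
    y = growth.ravel() - growth.mean()
    return float(x @ y / (x @ x))


def day_fixed_effects_estimate(share, growth):
    """Same slope after subtracting each day's cross-area mean (day fixed effects)."""
    x = share - share.mean(axis=0, keepdims=True)
    y = growth - growth.mean(axis=0, keepdims=True)
    return float(np.sum(x * y) / np.sum(x * x))


share, growth = simulate_panel()
print(round(naive_estimate(share, growth), 3))
print(round(day_fixed_effects_estimate(share, growth), 3))
```

Run as written, the first number comes out well above zero and the second close to zero, even though the variant does nothing. Day fixed effects were chosen over a single linear time-trend term because they absorb any shape of common seasonal curve, not just a straight line.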

And so, ultimately, this is why I’m keeping my eye on Florida.  Sure, Great Britain had a big run-up in cases, they went into lockdown, and then cases fell.  And they concluded something about transmissibility of their new COVID variant around that time.

But I want to point out one key fact that seems to have been ignored in the big picture.  Start with the graph of the seven-day moving average of new COVID-19 cases in Great Britain over the past few months:

First, Great Britain went into lockdown on 1/6/2021, per this source and this source.  (I’m double-checking that because of what I say next.)

Second, new cases per day in Great Britain peaked sharply on 1/9/2021.  That’s within a day of the peak in the U.S.  And in Canada.  And within a few days of the peak in many other northern-hemisphere, temperate-climate nations.

Doesn’t anybody but me see the problem here?  That’s vastly too little time for the peak to have been caused by the lockdown.

I’ve been through the issue of lags before.  Everybody (including me) keeps forgetting the long lag between the actual infection and the reporting of that infection (see Post #989 and elsewhere).

There’s a median of five days between infection and symptom onset.  Then there would be whatever time lag occurs in Britain for getting in to see a doctor, getting tested, and having those tests reported.  And then we’re all looking at a seven-day moving average.  For the U.S., my best guess is that it takes at least 16 days for an infection to work its way fully into the data.

The upshot is that the end of that wave, in Great Britain, had nothing to do with their lockdown.  Maybe the continued downward slope did, at some point.  But the 1/9/2021 peak occurred vastly too soon for it to have been caused by the 1/6/2021 lockdown.
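To put rough numbers on the lag argument, here is a toy sketch, not a model of Britain's actual data. It assumes, generously, that the lockdown stops infection growth instantly; that infections rise and then fall at a constant, hypothetical 3% per day; and that every infection is reported exactly 9 days later (the 5-day median incubation above plus an assumed 4 days to get tested and reported).

```python
import numpy as np


def reported_seven_day_average(lockdown_day, lag_days, growth=0.03,
                               decline=0.03, n_days=80):
    """Trailing 7-day average of reported cases in a toy epidemic.

    Infections grow at `growth` per day until lockdown_day, then fall at
    `decline` per day. Each infection is reported exactly lag_days later.
    """
    t = np.arange(n_days)
    log_infections = np.where(t < lockdown_day,
                              growth * t,
                              growth * lockdown_day - decline * (t - lockdown_day))
    infections = np.exp(log_infections)

    # Reported cases on day t reflect infections on day t - lag_days.
    reported = np.full(n_days, np.nan)
    reported[lag_days:] = infections[:n_days - lag_days]

    # Trailing seven-day average, defined once a full week of reports exists.
    average = np.full(n_days, np.nan)
    for day in range(lag_days + 6, n_days):
        average[day] = reported[day - 6:day + 1].mean()
    return average


LOCKDOWN_DAY = 30
LAG_DAYS = 9  # 5-day median incubation + an assumed 4 days to test and report
avg = reported_seven_day_average(LOCKDOWN_DAY, LAG_DAYS)
peak_day = int(np.nanargmax(avg))
print(peak_day - LOCKDOWN_DAY)  # 12
```

Even with an instantaneous effect, the reported seven-day average peaks 12 days after the lockdown, not 3. Shortening the assumed reporting delay moves the peak earlier, but in this model it can never land before lockdown day plus the lag. So, under these assumptions, a reported peak three days after a lockdown implies that infections had already turned down before the lockdown began.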

And so, appealing to Occam’s Razor, the end of the third wave in Great Britain likely had the same cause as in America, Canada, Japan, and so on and so on:  Seasonality.

A common methodological mistake, and an easy test for it.

Now for the $100 question:  To what extent did the British correctly account for an upward secular trend in transmissibility of coronavirus, when they concluded that their strain was 40% more transmissible than the existing strains?

As an outside observer, I have no way to tell.  But I have a ludicrously easy way to test for it:  take whatever analysis they ran on data from before 1/9/2021 to reach that conclusion, re-run it on data from after 1/9/2021, and see whether the answer changes.

The “secular trend” in the entire northern hemisphere flip-flopped on or about that date.  If the estimate of infectiousness of the new strain was biased by the secular trend, the bias will also flip-flop, starting 1/9/2021.

If their methods are robust, they can re-run them on the new data, and nothing will change.  If they are not robust, they will suddenly find that the estimate of infectiousness of the new strain, relative to existing strains, is much less.
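Here is a minimal simulation of that test, under made-up assumptions: 50 hypothetical areas, a variant whose share keeps rising everywhere, a true variant effect of zero, and a seasonal component of the daily growth rate that rises until day 40 (standing in for 1/9/2021) and then falls at the same rate. The naive estimate regresses growth on variant share and ignores calendar time; the controlled estimate removes each day's cross-area average (day fixed effects), one possible stand-in for a proper time-trend control.

```python
import numpy as np


def simulate_reversal(n_areas=50, n_days=80, break_day=40, slope=0.002,
                      true_effect=0.0, noise_sd=0.01, seed=1):
    """Hypothetical panel in which the secular trend reverses at break_day."""
    rng = np.random.default_rng(seed)
    day = np.arange(n_days)
    # Variant share keeps rising in every area, before and after the break.
    takeoff = rng.uniform(20, 60, size=n_areas)
    share = 1.0 / (1.0 + np.exp(-(day[None, :] - takeoff[:, None]) / 8.0))
    # Seasonal part of the growth rate: rises until break_day, then falls.
    season = np.where(day < break_day, slope * day, slope * (2 * break_day - day))
    growth = (season[None, :] + true_effect * share
              + rng.normal(0.0, noise_sd, size=(n_areas, n_days)))
    return day, share, growth


def naive_estimate(share, growth):
    """OLS slope of growth on variant share, ignoring calendar time."""
    x = share.ravel() - share.mean()
    y = growth.ravel() - growth.mean()
    return float(x @ y / (x @ x))


def day_fixed_effects_estimate(share, growth):
    """Same slope after subtracting each day's cross-area mean."""
    x = share - share.mean(axis=0, keepdims=True)
    y = growth - growth.mean(axis=0, keepdims=True)
    return float(np.sum(x * y) / np.sum(x * x))


day, share, growth = simulate_reversal()
before, after = day < 40, day >= 40
for label, cols in (("before", before), ("after", after)):
    print(label,
          round(naive_estimate(share[:, cols], growth[:, cols]), 3),
          round(day_fixed_effects_estimate(share[:, cols], growth[:, cols]), 3))
```

The naive estimate comes out clearly positive before the break and clearly negative after it, while the fixed-effects estimate stays near zero in both halves. Setting true_effect to a nonzero value would let the controlled estimate recover it on both sides of the break. A real re-analysis would use the researchers' own model and data; this only shows what a trend-driven bias looks like when the trend reverses.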

I spent my career doing detailed analysis of health care data, both for the Federal government and for (mostly) Fortune 500 private clients.  So I’ve seen a lot of statistical analysis of observational data.  And I’ve seen a lot of bad statistical analysis of observational data.

This gets down to one of the most common procedural mistakes I have seen in statistical analyses in the social sciences:  The Stopped Clock methodology.

Most people have “an hypothesis”, meaning, they expect to see something happen.  And so, they’ll go look in the data to see whether or not that happened.  And if it did (if they find something they expect to see, where they expect to see it), they stop looking.  They declare victory and go home.

But, in fact, many methods are “stopped clocks”.  They’ll show you the same result almost regardless of what you feed into the method.  And far too frequently, I’ve seen social scientists declare that their hypothesis was correct, when in fact, all they were looking at was a stopped clock.

Lest you think this “stopped clock” problem is purely hypothetical:  my final professional publication was used to prevent major changes to U.S. Medicare hospice payment.  Those changes were being pushed by the hospice industry, based on research that was, in fact, a stopped-clock methodology.  I had to prove that empirically in this analysis of the evidence for hospice cost savings in the Medicare program.

And so, researchers forget that every hypothesis really has two parts.  You should find something where you expect to find it.  That’s the part they test.  And you should not find something, where you do not expect to find it.  Every academic researcher in the social sciences does the first part and, if successful, stops.  Nobody ever bothers to check the second part.
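The second half of the hypothesis can be checked with a negative control: feed the method an exposure that should have no effect and see whether it reports one anyway. In this hypothetical sketch the outcome contains nothing but a secular trend and noise, and the "placebo" exposure is any series that happens to rise over time. A method that finds a large effect for the placebo is behaving like a stopped clock. The two estimators, the 200-day length, and all parameter values are illustrative assumptions.

```python
import numpy as np


def naive_slope(exposure, outcome):
    """OLS slope of outcome on exposure, ignoring time."""
    x = exposure - exposure.mean()
    y = outcome - outcome.mean()
    return float(x @ y / (x @ x))


def detrended_slope(exposure, outcome):
    """OLS slope after removing a linear time trend from both series."""
    t = np.arange(len(outcome), dtype=float)
    design = np.column_stack([np.ones_like(t), t])

    def residual(v):
        coef, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ coef

    x, y = residual(exposure), residual(outcome)
    return float(x @ y / (x @ x))


def negative_control(estimator, n_days=200, seed=2):
    """Run an estimator on a placebo exposure that trends upward but does nothing."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days, dtype=float)
    outcome = 0.002 * t + rng.normal(0.0, 0.01, n_days)   # secular trend only
    placebo = t / n_days + rng.normal(0.0, 0.1, n_days)   # rises over time, no causal role
    return estimator(placebo, outcome)


print(round(negative_control(naive_slope), 3))
print(round(negative_control(detrended_slope), 3))
```

The naive slope reports a sizable "effect" for a placebo that, by construction, does nothing; the detrended slope stays near zero. Only the second method passes the "don't find it where it shouldn't be" half of the test.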

And so, here I sit, wondering to what extent all the new variants that were discovered in the northern hemisphere prior to 1/9/2021, and were declared to be “more infectious”, really were.  Or whether we have inadvertently been looking at a stopped clock, owing to the strong secular trend (seasonality), so that any new strain discovered during this time would have been declared much more infectious.

And so, absent detailed knowledge of methods, and seeing (e.g.) Great Britain clearly confound the impact of seasonality with the impact of lockdown, my attitude remains one of “show me”.

To be clear, I’m not blindly skeptical.  I’ve just done enough analysis of this type, myself, to know how easy it is to confuse a subtle cross-sectional effect with some profound secular change.  Let’s say I’m a well-informed skeptic.

And so, I’m watching Florida.  If all the parts of the story are true, then Florida should be the bellwether for the U.K. variant’s impact in the U.S.  And if that doesn’t start showing up soon, maybe we need to re-examine the evidence basis for determining the greater infectiousness of the U.K. strain.  Starting with re-running their analysis in the presence of a strongly downward secular trend starting 1/9/2021.

I’m not saying they did it wrong.  I have no idea.  I am saying I have seen this done wrong many times, in professional and peer-reviewed publications.  And that there’s an easy quick-and-dirty test to see whether or not the existing estimates of greater transmissibility were biased by strong secular trends.  Just re-run the analysis now that the secular component has reversed.