Post #1092: COVID-19 fourth U.S. wave, clear as mud

Posted on April 3, 2021

So far, the U.S. fourth wave of COVID has almost no coherence to it.  The U.S. as a whole had some upward trend in cases, as of a few days ago.  But that reflects the individual states and regions going their own separate ways.

Source for this and other graphs:  Calculated from The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved 4/3/2021, from  The NY Times on-line COVID-19 tracking page may be found at:

And so the short-term good news is that new case counts have flattened out for the past few days.  And Michigan seems to be getting a little respite in its outbreak.

A little statistical analysis

The current story is that we’re in a race between the more infectious COVID-19 variants, on the one hand, and vaccination, on the other.  The problem is, it sure doesn’t look much like it, from the data that you can obtain publicly.

So let me do a little quick-and-dirty statistical analysis of state data.  Maybe there are things that you can infer from a more formal analysis that are not visible to the naked eye.

Here are the data elements in play.

Percent change in daily COVID-19 cases per capita, comparing April 2 to March 1, 2021.  This tells you which states have improved or have gotten worse over that time, in terms of incidence of COVID-19.  That’s from the NY Times data cited above.

Fraction of cases that are more-infectious variants, per the CDC, for a subset of states, as of end-of-February 2021.  From CDC.

Number of COVID-19 vaccine doses per capita, as of 4/2/2021, from CDC.

Cumulative total COVID-19 infections per capita as of 4/2/2021, from the NY Times data cited above.

Average state annual temperature, from world population review.

The only one of those that isn’t available for every state is the fraction of cases that are the more-infectious variants.  CDC only publishes that for states where it has an adequate sample of tests that have been genetically sequenced.  Anything involving that factor is only available for 17 states.  The rest are available for 51 (50 states plus Washington DC).
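For what it’s worth, here is a minimal sketch of how the first data element, the percent change in daily new cases per capita, could be computed.  The function name and the numbers are made up for illustration; they are not the NY Times figures.

```python
def pct_change_per_capita(cases_start, cases_end, population):
    """Percent change in daily new cases per capita between two dates.

    All three arguments are hypothetical inputs: daily new cases on the
    start date, daily new cases on the end date, and state population.
    """
    if population <= 0 or cases_start <= 0:
        raise ValueError("population and starting cases must be positive")
    start_rate = cases_start / population
    end_rate = cases_end / population
    return 100.0 * (end_rate - start_rate) / start_rate


# Made-up example: 200 daily cases on March 1, 250 on April 2,
# in a state of one million people.
example_change = pct_change_per_capita(200, 250, 1_000_000)
print(f"Change in daily cases per capita: {example_change:.1f}%")
```

Population cancels out within a single state, so the per-capita percent change equals the raw percent change.  Per-capita figures matter for the other data elements, where states of different sizes are compared on levels rather than on changes.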

First, let me just see if there’s any correlation at all between what has happened to each state over the last month (in terms of new COVID case load) and these other factors.

In the table below, a large positive number means that this factor was associated with increases in COVID-19 daily new cases (or, maybe, smaller decreases).  A large negative number means the factor was associated with decreases in COVID-19 new cases (or, maybe, smaller increases).

Anyone who has ever done statistical analysis knows just how crude this is.  And, in case you’ve never heard it said before, correlation is not causation.  That said, if there is some overwhelmingly large impact (some really strong causation at work), chances are that would generate some correlation.
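To make the one-on-one correlation concrete, here is a rough sketch of a Pearson correlation coefficient, written out by hand.  I am assuming the Pearson form here as the standard simple correlation, and the four "states" in the example are invented numbers, not the actual state data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Invented illustration: average annual temperature (degrees F) and
# percent change in daily new cases for four imaginary states.
avg_temp = [40.0, 50.0, 60.0, 70.0]
case_growth_pct = [30.0, 10.0, 0.0, -20.0]

r_temp = pearson_r(avg_temp, case_growth_pct)
print(f"Correlation of temperature with case growth: {r_temp:.3f}")
```

A value near -1 would say that warmer states tended to have slower growth, a value near +1 the opposite, and a value near zero would say there is no linear association to speak of.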

And the results are more-or-less garbage.  Insofar as you can tell from a simple one-on-one correlation:

  • The U.K. variant had no significant association with growth in new cases.
  • The California variant was associated with slower rates of growth in new cases.
  • Vaccine penetration was associated with faster rates of growth in new cases.
  • States that had a lot of infections in the past had slower growth in new cases.
  • Warm-climate states had slower growth in new cases.

As is the way with statistical analysis, these are average effects.  It doesn’t say that (e.g.) every warm-climate state had slower growth.  It just says that, on average, the warmer-climate states as a group had slower growth.

All of the above is more-or-less what’s been clear, by eye, just by looking at the data.  Florida, California — not much is happening there.  In fact, the three states with the highest proportion of the California variant have all done well.  But most of the colder parts of the country (the Northeast and the Midwest) are seeing rising rates and outbreaks.  And all you have to do is look at Florida to realize that if there’s some huge U.K. variant impact, it sure seems to be failing to budge the Florida new case counts.

I would say something about herd immunity, in that states with lots of prior infections appeared to do well.  But in fact, the prior-infection and current-vaccination numbers point in completely different directions.  And if herd immunity were taking hold, those would point in the same direction.  So you really have no clue what you’re looking at.  Presumably, the vaccination number is a result of small, rural, and largely cold-climate states taking the lead in getting their populations vaccinated (e.g., Alaska, North Dakota).

Arguably, with 50 states, I might have enough data to be worth throwing the last three factors into “a regression”.  That really doesn’t make this analysis all that much better, but it does account for the correlations among those factors.  (E.g., that the high-performing states for vaccination tend to be cold-climate states.)  In theory, this gives you more of the “independent effect” of each factor separately.
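For readers who want to see the mechanics, here is a bare-bones sketch of an ordinary least-squares regression with several predictors, solved through the normal equations.  This is not the software I used, the function name is hypothetical, and the six rows of data are invented so that the "true" coefficients are known in advance.

```python
def ols_fit(rows, ys):
    """Return [intercept, b1, b2, ...] minimizing squared error."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    M = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Invented rows: (vaccine doses per capita, cumulative cases per capita,
# average annual temperature in degrees F) for six imaginary states.
made_up_states = [
    (0.40, 0.08, 45.0), (0.50, 0.10, 40.0), (0.45, 0.12, 60.0),
    (0.55, 0.09, 70.0), (0.42, 0.11, 55.0), (0.48, 0.07, 65.0),
]
# Outcome built from known coefficients, so the fit can be checked.
made_up_growth = [60 + 10 * d - 300 * c - 0.5 * t for d, c, t in made_up_states]

coefs = ols_fit(made_up_states, made_up_growth)
print("intercept, doses, cumulative cases, temperature:", coefs)
```

Because the made-up outcome is an exact linear function of the three factors, the fit should hand back the coefficients that went in.  Real state data has noise, which is where standard errors and p-values come in, and this sketch does not compute those.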

It’s more-or-less impossible to interpret the magnitudes of the coefficients, the way I’ve slapped this together.  Really, the only thing you can conclude is that even if you look at all three factors at once, you still get more-or-less the same answer.  Higher vaccination rates had no statistically significant association with growth in new COVID-19 cases over the period.  (The p-value is roughly 0.5, meaning that even if the true effect were zero, you’d see an association at least this strong about half the time by chance.)  Otherwise, yeah, there’s a strong negative association between March COVID-19 case growth and both the fraction of the population that has already tested positive for COVID-19 and the typical climate (with warm-climate states doing better than cold-climate ones).

Arguably, the reason for such a numerically large coefficient on cumulative cases per capita is that I used the fraction who tested positive.  For every positive case, there are likely to be four or five others who were never tested.  So that’s a small number, standing in for a big one, and as a result, the regression gives it a numerically big coefficient.
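That scaling argument is easy to check with a one-predictor fit.  The sketch below uses invented numbers and assumes, purely for illustration, that true infections are exactly five times the tested-positive fraction.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor plus intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented data: true fraction ever infected and percent growth in
# new cases for four imaginary states.
true_infected = [0.20, 0.30, 0.40, 0.50]
case_growth = [20.0, 10.0, 5.0, -10.0]

# Assumption for illustration only: one in five infections tested positive.
tested_positive = [x / 5 for x in true_infected]

slope_true = ols_slope(true_infected, case_growth)
slope_tested = ols_slope(tested_positive, case_growth)
print(f"slope on true infections:  {slope_true:.1f}")
print(f"slope on tested positives: {slope_tested:.1f}")
```

The fit is equally good either way.  Shrinking the predictor by a factor of five just inflates its coefficient by the same factor, which is why the size of that coefficient, taken alone, says little.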

Summary:  The only thing I can conclude from this is that the picture really is as muddled as it seems to be, at least for telling this story as a race between vaccination and more infectious variants.

If the U.K. variant or the California variant really are that much more infectious, it sure doesn’t show up to the naked eye.  Or in a simple, more-formal statistical analysis.  The vaccination effect shown here is almost surely the result of behavior (which states chose to vaccinate quickly), rather than any actual medical effect of vaccinations on infection rates.  (And in any case, it’s not statistically significant when placed alongside the other predictors.)  Right now, on average, it is the cold-climate states that have the positive new-case trends.  And, again on average, the states with the highest past infection rates did best at limiting increases in new cases in March.

Make of all that what you will.  I don’t think it provides a lot of useful information.  All it really does is tell me that there’s nothing that suggests that the true, strong driver of current trends is the race between vaccination and new COVID-19 variants.