Post #1131: Herd immunity: Why aren’t we there yet?

Posted on May 2, 2021

Warning:  This is a long and somewhat technical post.  There aren’t really any results to speak of.  If you don’t have a strong interest in the topic of herd immunity, there’s nothing much here for you.

With that out of the way, the short answer is that we really should be getting close to herd immunity now.  But there’s no sign of it, and it’s sure starting to look overdue at this point.  I provide some state-level estimates showing that, below.

Other than saying “we’ll get there when we get there”, can I point to anything that might plausibly explain why we’re NOT seeing herd immunity yet?

I don’t think it’s the data. The vaccine counts are tough to argue with.  And while we can dither over exactly how many people have had COVID-19 (versus the number formally diagnosed), I don’t think that’s the hangup, either.

At this point, my guess (and it is just a guess) is that the problem is the far-too-simple model that epidemiologists use to estimate what is required for herd immunity.  I can’t really say what’s wrong with it.  But I can say that if it’s correct, and the first estimates of the infectiousness of COVID-19 (the “R-nought”) were ballpark, then it’s getting to the point where there’s really no way to explain why we’re not seeing herd immunity yet, within that standard, simple model.

My best guess?  Non-homogeneity of the population.  The standard model for herd immunity relies on an assumption of homogeneity.  In effect, it assumes that immune individuals, still-at-risk individuals, and their interactions, are all randomized.  That’s the case where, on average, many immune individuals stand between the still-vulnerable individuals and the infected individuals, stopping spread of disease.  But if that’s not true — if the natural breaks within our society result in clustering the infected and the non-immune together — it seems to me that a pandemic can keep going well past the point where the averages suggest we should have reached herd immunity.

Review of the herd immunity calculation, and the basic question:  Why have we seen no clear sign of herd immunity yet?

Let me quickly review the basics.  When COVID-19 first came to the U.S., the best estimate of the base or initial viral replication number (R-nought) was 2.5.  That is, if no precautions were taken, and nobody was immune, each infected individual was estimated to infect an average of 2.5 others.

Obviously, that’s a fairly squishy number.  It’s going to vary from place to place.  For example, densely-populated urban areas will likely have a higher R-nought than remote rural areas.  Group living quarters (prisons, dormitories) will likely have a higher R-nought.  And so on.  It depends on both biology and behavior.  And the actual value is difficult to infer from the data, and can be based either on back-solving from the number of observed cases over time, or generated from often-incomplete contact tracing data.

But it is what it is.  Then along comes the U.K. variant, which is estimated to be 40% more infectious.  Again, that’s another squishy number, for at least the same reasons the original R-nought was squishy.  That would imply an R-nought of about 3.5 (2.5 x 1.4) for the U.K. variant.

In the simple model of herd immunity, you get to herd immunity when you have enough immune individuals to bring the observed viral replication factor below 1.0.  The “observed viral replication” is what you get once you start taking precautions, and once you start to get a fraction of the population immune via prior infection or vaccination.  The value of 1.0 is critical because if each infected person goes on to infect an average of less (fewer) than one new person, the pandemic shrinks away.

In that simple model of a pandemic, the immune fraction required to achieve herd immunity is just one minus the inverse of R-nought.  E.g., if a virus has an R-nought of 3.0, you’d need to block two out of three of those chains of infection to bring the replication factor down to 1.0.

And so, working from a worst-case scenario in which all new U.S. cases are the U.K. variant, you would need to have 1 – (1/3.5) = 71% of the population immune to COVID-19.  (Check the math:  29% * 3.5 is about 1, so if less than 29 percent of potential infections are successful, the simple model says the pandemic has to shrink away.)
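For anyone who wants to check that arithmetic, here is a minimal Python sketch of the simple-model threshold.  The function name is just an illustration; the 2.5 and the 40% are the figures quoted above.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction needed to bring the replication factor down to 1.0."""
    if r0 <= 1.0:
        return 0.0  # the outbreak already shrinks with nobody immune
    return 1.0 - 1.0 / r0


original_r0 = 2.5
uk_variant_r0 = original_r0 * 1.40  # "40% more infectious"

for label, r0 in [("original strain", original_r0), ("U.K. variant", uk_variant_r0)]:
    threshold = herd_immunity_threshold(r0)
    print(f"{label}: R-nought = {r0:.1f}, threshold = {threshold:.0%}")
```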

TO BE COMPLETELY CLEAR, you need 71% if you take no other actions to prevent infection.  To the extent that you take other actions to prevent infection, you need less than 71%.  This is widely recognized, but rarely discussed, by professional epidemiologists.  And so, with good COVID-19 hygiene, you should, in theory, need significantly less.
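One way to see the direction of that effect is to fold precautions into the same formula.  This is only a sketch: it assumes precautions cut every transmission by a fixed fraction (the hypothetical `transmission_cut` below), and the example cut values are invented, not measured.

```python
def threshold_with_precautions(r0: float, transmission_cut: float) -> float:
    """Simple-model threshold after precautions scale R-nought down.

    transmission_cut is a hypothetical fraction of infections prevented by
    masks, distancing, and so on (0.0 = no precautions at all).
    """
    effective_r = r0 * (1.0 - transmission_cut)
    if effective_r <= 1.0:
        return 0.0  # precautions alone are enough in this simple model
    return 1.0 - 1.0 / effective_r


for cut in (0.0, 0.2, 0.4):  # illustrative values only
    needed = threshold_with_precautions(3.5, cut)
    print(f"precautions block {cut:.0%} of transmission -> need {needed:.0%} immune")
```

With no precautions this reproduces the 71% figure; any positive cut lowers the required immune fraction.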

How much less, I last estimated in Post #981.  That calculation uses a higher R-nought value than the one mentioned above.  But I come up with a lot less.  Like 50%-ish, instead of 70%-ish.  And so, that seemingly straightforward calculation in that post MUST be wrong.  Because we have seen no sign that any state has well-and-truly passed the point of herd immunity.  And many are, by almost any plausible estimate, well over that threshold.

So, what’s wrong?  Is it that calculation?  Or the assumptions underneath it?  Or the estimate of R-nought?  Or the estimate of the fraction of the population immune via prior infection?  Or the basic simple epidemiological model of herd immunity?

Or have we passed the point of herd immunity, but we’re only going to realize that in hindsight?

Spoiler:  I’m not going to be able to answer any of these questions.  I’m just going to set up a new way of trawling through the data.

The only thing I’m fairly sure of is that some places, somewhere in the U.S., surely ought to have passed the herd immunity threshold by now.   For the U.S. as a whole, here’s my best guess as to how it looks, as of 5/1/2021.


First, a little sensitivity analysis

Here is the same national table I’ve presented before, giving my best guess for herd immunity.  (I’ve fixed one small error, resulting in a corrected calculation that’s a couple of percentage points lower than it was in the past.)  That said, best guess, around two-thirds of the U.S. population ought to be immune to COVID-19 now.

This table embodies several assumptions.  From most to least important, they are that:

  • There are five total COVID-19 cases for every one that is diagnosed.
  • People who have recovered from a COVID-19 infection get vaccinated at the same rate as people who have never been infected.  (That’s the reason for netting out the “overlap” terms above).
  • Immunity is assumed to be:
    • 100% for people who have recovered from infection.
    • 54% for those who are partially vaccinated (just one shot of a two-shot vaccine).
    • 90% for those fully vaccinated.
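The assumptions listed above can be sketched in a few lines of Python.  This is one reading of the calculation, not a copy of the spreadsheet behind the table: the function name is made up, "partially vaccinated" is taken to mean people with exactly one shot, the overlap is netted out by treating infection and vaccination as independent, and the input shares are round illustrative numbers rather than CDC figures.

```python
def estimated_immune_fraction(
    diagnosed: float,
    partially_vaccinated: float,
    fully_vaccinated: float,
    cases_per_diagnosis: float = 5.0,
    infection_immunity: float = 1.00,
    partial_immunity: float = 0.54,
    full_immunity: float = 0.90,
) -> float:
    """Combine infection and vaccine immunity, all as shares of population."""
    # Scale diagnosed cases up to total infections (capped at everyone).
    infected = min(1.0, diagnosed * cases_per_diagnosis)
    immune_by_infection = infected * infection_immunity
    immune_by_vaccine = (partially_vaccinated * partial_immunity
                         + fully_vaccinated * full_immunity)
    # Independence assumption: the recovered get vaccinated at the same rate
    # as everyone else, so the double-counted share is simply the product.
    overlap = immune_by_infection * immune_by_vaccine
    return immune_by_infection + immune_by_vaccine - overlap


# Hypothetical inputs: 10% diagnosed, 12% with one shot only, 30% fully vaccinated.
estimate = estimated_immune_fraction(0.10, 0.12, 0.30)
print(f"estimated immune share: {estimate:.0%}")
```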

Sensitivity analysis:  All of those could be questioned, but I don’t think any one of those is grossly in error.  Now let me do a little sensitivity analysis, to see how this estimate changes if I modify those assumptions.

In the table above, I’ve tried to substitute plausible alternative values for my baseline assumptions.

In the worst case shown above, I assume there are only four total infections for every diagnosed infection, and those people are only 80% immune to COVID-19 now.  The four-infections number is actually less than has been directly shown with seroprevalence (blood-antibody) surveys.  The 80% figure is based on the assumption that immunity fades after six months.  Under those assumptions, I still find that more than half the population ought to be immune to COVID-19 right now. 
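A sensitivity check of this kind is easy to mock up.  The sketch below varies only the two assumptions discussed in the worst case (total cases per diagnosed case, and how immune the recovered still are) and holds everything else fixed.  The 10% diagnosed share and the 33% vaccine-immune share are hypothetical placeholders, and the independence treatment of overlap is an assumption.

```python
from itertools import product


def immune_fraction(diagnosed: float, vaccine_immune: float,
                    cases_per_diagnosis: float, infection_immunity: float) -> float:
    """Immune share of population, netting out overlap by independence."""
    by_infection = min(1.0, diagnosed * cases_per_diagnosis) * infection_immunity
    return by_infection + vaccine_immune - by_infection * vaccine_immune


DIAGNOSED = 0.10        # hypothetical share of population diagnosed
VACCINE_IMMUNE = 0.33   # hypothetical share already protected by vaccine

results = {}
for multiplier, immunity in product((4.0, 5.0), (0.80, 1.00)):
    results[(multiplier, immunity)] = immune_fraction(
        DIAGNOSED, VACCINE_IMMUNE, multiplier, immunity)
    print(f"{multiplier:.0f} cases per diagnosis, {immunity:.0%} immune after "
          f"infection -> {results[(multiplier, immunity)]:.0%}")
```

With these placeholder inputs, even the most pessimistic combination stays above one-half, which is the same qualitative pattern described above.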

And, based on my prior calculation for the impact of COVID-19 hygiene, Post #981, even that small a fraction, with maintenance of COVID-19 hygiene, ought to push us past herd immunity, even with a virulent U.K. strain (R-nought > 4) being the sole strain currently circulating in the U.S.

And so I come back to the same question.  If the whole story about COVID-19 is roughly correct — the assumptions above, the simple epidemiological model of herd immunity and so on — why is there no clear sign that even one U.S. state has managed to achieve herd immunity?

Refresh the state data

Let me update and modernize my table showing approximate COVID-19 immunity levels by state.  This uses the exact same assumptions (literally the same calculation) as the national table above.

This table answers the question “What do the states now look like, in terms of the fraction of the population that has already had COVID-19, and the fraction of the population that has been vaccinated?”  It then provides an overall summary that adds those and crudely accounts for overlap.

Below is an image of the US and the top 10 states.  Assume for the moment that I haven’t made an outright calculation error.

The two groups of columns are counts of persons, and then the same data expressed as percent of population.  For quick orientation, the 67% in the upper-right corner of the state table below is the same 67% that’s in yellow in the national table above.

The first two columns are the count of persons diagnosed with COVID-19, and then 5x that (my best estimate of the true count of infected individuals).  This is followed by counts of those partially and fully vaccinated.  Then there’s a simple sum, not accounting for the overlap of those vaccinated and infected populations.  Then a final, de-duplicated total, assuming that these overlap in a straightforward way: those who were infected are just as likely to get vaccinated as those who were not.

Source:  Calculated from US CDC data, state tables of COVID-19 prevalence and vaccination, downloaded 5/2/2021 from the CDC COVID data tracker.  Population estimates are US Census 2019 estimates.

Here’s the problem.  Take North Dakota.  Well over half the population of that state has been immunized.  That’s a fact, and comes right out of CDC data.  More than 14% of the population has been diagnosed with a COVID-19 infection.  Again, right out of CDC data.  When I combine those two, under the assumptions of a) five total cases for every diagnosed case, and b) random overlap of the infected and vaccinated populations, then it looks like almost everyone in North Dakota ought to be immune to COVID-19 now.  Based on all those assumptions, you’d expect that 96% of the population would be immune.

But it’s even worse.  If I make the ludicrous assumption that there were never any undiagnosed COVID-19 cases, you’d still estimate that 66% of the population there was immune to COVID-19 (not shown).  Even with that gross understatement of immunity via prior infection, they’d still be right on the margin of the number that is thought to be required for herd immunity.  Toss in even a tiny bit of protection from COVID-19 hygiene (mask use, distancing, and so on) and they should still be over anybody’s plausible estimate of the requirement for herd immunity.

And yet, they don’t appear to have achieved herd immunity.  Or, at least, not with any obviousness or clarity.  Nor have any of the other states at the top of the list.  They all seem to have a low and (for the most part) stable rate of daily new COVID-19 cases.

Let me put it this way.  At a minimum, this table shows combinations of total vaccination rate and known, diagnosed individuals that do NOT appear sufficient to guarantee herd immunity.  So you can think of the situation in North Dakota as raising the bar for all the other states.  More than half immunized, and (take your guess as to) a very large chunk that have recovered from COVID-19.

Sure, they have poor COVID-19 hygiene in that part of the country:

Source:  Carnegie-Mellon University Covidcast, accessed 5-2-2021.

But still, despite all that vaccination, and all that prior illness, the pandemic persists there.  Any fading-away of the case counts has yet to start.

Is the problem the homogeneity assumption of the simple pandemic model?

Consider the schoolchildren of North Dakota.  They can’t be immunized, and so are not part of the overall immunization rate.  I don’t think a lot of them have caught COVID-19, so they probably aren’t immune that way.  And yet they mingle, a lot, with one another, at school, at activities, at play.

I wonder, just as a hypothetical, whether you could literally have 100% of North Dakota adults immune to COVID-19, and still have the pandemic persist within the school-age population.  More assertively, it sure seems as if that could happen, even though North Dakota, on average, will have hit herd immunity, based on the simple models.

And the key assumption that this violates is homogeneity.

In the standard herd immunity model, infected individuals, immune individuals, and all of their interactions are distributed at random.  In short, for standard herd immunity, the calves are distributed randomly in the herd, like raisins in raisin bread.

In that standard model, there are no gatherings of schoolchildren.  Like calves in the herd, schoolchildren would be uniformly distributed in the entire population.  To exaggerate, you’d be as likely to find them working in a stockbroker’s office as you would be to find them on the playground.

If we then think of that standard herd immunity model, it just doesn’t apply.  Or, at least, if we just calculate population-level averages, we’ll wildly mis-predict the end of the pandemic.

With standard herd immunity, at any given time, vulnerable individuals obtain (statistical) protection from infection because many immune individuals stand between each infected person and each still-vulnerable person.

But that simply does not apply in this hypothetical.  When schoolchildren are gathered together, more-or-less nothing stands between the infected and the vulnerable schoolchild.  All the adults (those who by assumption are immune at this point) are off elsewhere.  The assumption of homogeneity, that those immune adults would stand between the infected and at-risk children, is simply wrong.
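The schoolchildren hypothetical can be put into a toy two-group calculation.  Everything numeric here is invented for illustration: the contact matrix `K`, the 20/80 split between children and adults, and the assumption that every adult and no child is immune.  The sketch compares the population-average shortcut with a version that tracks the two groups separately.

```python
import math

# Hypothetical next-generation matrix: K[i][j] is the average number of people
# in group i infected by one infected person in group j when nobody is immune.
# Group 0 = schoolchildren, group 1 = adults.  All numbers are invented.
K = [[2.0, 0.3],
     [0.5, 2.5]]
population_share = [0.2, 0.8]  # assumed: 20% children, 80% adults
susceptible = [1.0, 0.0]       # no child immune, every adult immune


def dominant_eigenvalue_2x2(m):
    """Largest eigenvalue of a non-negative 2x2 matrix (closed form)."""
    (a, b), (c, d) = m
    return ((a + d) + math.sqrt((a - d) ** 2 + 4.0 * b * c)) / 2.0


def effective_matrix(k, susceptible_fraction):
    """Scale each row by the share of that group still susceptible."""
    return [[k[i][j] * susceptible_fraction[i] for j in range(2)]
            for i in range(2)]


r0 = dominant_eigenvalue_2x2(K)
immune_average = sum(share * (1.0 - s)
                     for share, s in zip(population_share, susceptible))

# Homogeneous shortcut: shrink R-nought by the population-average immune share.
r_average = r0 * (1.0 - immune_average)
# Stratified version: shrink each group's incoming infections separately.
r_stratified = dominant_eigenvalue_2x2(effective_matrix(K, susceptible))

print(f"R-nought: {r0:.2f}, average immune share: {immune_average:.0%}")
print(f"replication factor, homogeneous shortcut: {r_average:.2f}")
print(f"replication factor, stratified: {r_stratified:.2f}")
```

With these made-up numbers, 80% of the population is immune and the shortcut puts the replication factor well under 1.0, yet the stratified version stays above 1.0 because the children keep infecting one another.  That result depends entirely on where the immunity sits; a different allocation of the same 80% could push the answer the other way.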

Here’s the problem.  From what I’ve seen, mathematical models that account for this sort of “stratification” within the population suggest that it should result in an even lower threshold for herd immunity to kick in.  Which, given the discussion above, I just completely fail to grasp.

That said, I think part of the issue here is the lack of homogeneity of the population.  I bet the pandemic can persist even if a large average fraction of the population is immune.  Just as long as, in some places, or within some sub-populations, those remaining infected and vulnerable persons end up being close to one another.  As in the schoolchildren example above.

Not much I can do about that, at this point, except to point to it and say, maybe that’s why we’re not at the herd immunity level yet.

Drill down to smaller units, maybe?

I’d like to contrast this finding for these large geographic areas to what’s happening on the campus of the College of William and Mary.  I track them (e.g., Post #1121), because my daughter is there.

After an outbreak at St. Patrick’s Day, a very large fraction of the student body has been immunized.  There, the new case counts slowed fairly abruptly, then stopped entirely.  In the past week, they’ve found zero new cases of COVID-19.  That’s not unprecedented (it happened once before, just before Thanksgiving 2020).  But it’s the only example I can lay my hands on, of a population that has plausibly achieved herd immunity via vaccination.

To me, the interesting thing is that this population fits the classic, simple herd immunity model: It’s homogeneous, well-mixed, and largely self-contained.  Students are all roughly the same age, there is a lot of interaction among and across subsets of students, and there’s relatively limited interaction between the student body and the population of Williamsburg, VA.

This suggests that “drilling down” below the state level might yield some plausible examples of smaller communities that appear to have reached herd immunity.

Unfortunately, I can’t seem to find any dataset showing vaccination estimates by county.  Clearly, those exist (CDC publishes a map showing that).  I just haven’t been able to find a data file showing all 3000+ U.S. counties and county equivalents.

The only thing I can show is the current new-case rate.  In the simplest case, I can search for counties that have had zero new cases for the past week, past two weeks, and so on.  And see if there are any places that persistently show no new COVID-19 cases.  And see if they have anything in common (other than a small population, which is almost a given, in that I’m looking for literally zero cases.)

Having done that exercise, I can say that this is complicated by data reporting issues from the states.  In many instances, counties probably do have zero new cases, but the states keep dumping groups of old cases into the data, probably as a result of identifying county-of-residence based on ZIP code or similar.  (Five-digit ZIP codes frequently cross county boundaries.)

But there’s no harm in trying this, as a simple exercise.  Let me restrict it to counties with at least 5000 population.  And in that case, I find five counties that have had zero reported COVID-19 cases for the past four weeks, as of 4/28/2021.
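The screen itself is simple enough to sketch.  The record layout, county names, and case counts below are all hypothetical stand-ins, not the actual county file; only the two filters (at least 5000 population, zero reported cases in each of the last four weeks) come from the description above.

```python
from typing import List, NamedTuple


class County(NamedTuple):
    name: str
    population: int
    weekly_new_cases: List[int]  # one entry per week, most recent week last


def zero_case_counties(counties: List[County], weeks: int = 4,
                       min_population: int = 5000) -> List[str]:
    """Names of counties reporting zero new cases in each of the last `weeks` weeks."""
    return [
        c.name
        for c in counties
        if c.population >= min_population
        and len(c.weekly_new_cases) >= weeks  # need enough history to judge
        and all(n == 0 for n in c.weekly_new_cases[-weeks:])
    ]


# Made-up sample records, purely to exercise the filter.
sample = [
    County("County A", 8000, [0, 0, 0, 0]),
    County("County B", 3000, [0, 0, 0, 0]),      # too small
    County("County C", 12000, [0, 2, 0, 0]),     # a case two weeks back
    County("County D", 6000, [3, 0, 0, 0, 0]),   # older case falls outside the window
]
print(zero_case_counties(sample))
```

A strict zero test like this is fragile against the reporting problem described above: a single batch of old cases dumped into one week knocks a county off the list.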

I’m not quite sure how “real” this is.  Turns out, there were just over 200 counties with no cases last week.  But of that list of 200, just five also had no cases in the prior three weeks, and had 5000 or more population.  So there is some real chance that these were inadvertently “cherry picked” by this set of screens.  (I.e., if you started from a list of 200 counties, you might be able to find five just by chance).
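A crude way to size up that worry: suppose each of the 200 counties independently has some fixed chance of reporting zero cases in any given week, even with the virus still circulating.  Both the independence and the per-week probabilities below are assumptions made up for illustration, and the sketch ignores the population screen entirely.

```python
def expected_by_chance(n_counties: int, p_zero_per_week: float,
                       extra_weeks: int) -> float:
    """Expected number of counties that also show zero cases in each of
    `extra_weeks` earlier weeks, if weeks are independent coin flips."""
    return n_counties * p_zero_per_week ** extra_weeks


# 200 counties had no cases last week; how many would also be clean
# in the three prior weeks by luck alone?
for p in (0.2, 0.3, 0.5):  # hypothetical per-week chances of a zero
    print(f"p = {p:.1f}: about {expected_by_chance(200, p, 3):.1f} counties")
```

Under these assumptions, a per-week chance of roughly 0.3 is enough to produce about five survivors out of 200, so a list of five is not, by itself, strong evidence.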

But that’s easy enough to test.  Wait another week and see how many of these survive.  If they all drop, we’ll know this was an artifact of how this list was created.  If they survive, it’ll be an indication that maybe, somewhere in America, there really are some COVID-19-free pockets now.