Post #1144: COVID-19 seroprevalence surveys and re-thinking my estimate of herd immunity

Posted on May 13, 2021

Source:  Calculated from CDC state-level COVID-19 seroprevalence data from early March 2021, and counts of persons who tested positive for COVID-19, at various times, from the New York Times COVID-19 data repository.  Both sources accessed 5/12/2021.

I’ve been re-examining the CDC’s most recent seroprevalence survey results.  This post is the result of that.

A seroprevalence survey looks for antibodies to COVID-19 in blood samples.  It gives you an estimate of the number of people who were ever infected with COVID-19 at some point in the past.  Combining that with past and current counts of persons with a positive COVID-19 test, you can estimate the current number of people who should be immune to COVID-19 due to prior infection.  That number, along with the number of vaccinated individuals, feeds into any estimate of how close the population is to achieving “herd immunity”.

Based on the most recent CDC seroprevalence survey, the bad news is that we’re probably not as close to herd immunity as I thought we were. I was basing my herd immunity estimate on older data — the tall bars at the left side of the graph above.  In the past, there appeared to be more than four total infections for every positive test.  But now, that ratio is much closer to two infections for every positive test.  This means that I overstated the true fraction of the population that is immune to COVID-19 due to a prior infection.

The good news is, protective antibodies from prior COVID-19 infection appear to last a long time.  The simple statistical analysis above shows that people who were infected with COVID-19 more than nine months ago still appear to have enough circulating antibodies to be picked up as positives in a seroprevalence study.

Details follow.

Background:  Seroprevalence studies of COVID-19.

Three types of tests.  First, if you don’t know about the three different types of COVID-19 tests, this reference is one of many good plain-English explanations.   Briefly:

  • A PCR (molecular) test looks for a current infection by checking for fragments of the genetic material of the COVID-19 virus, which is RNA rather than DNA.  This is the original COVID-19 test.
  • An antigen or “rapid” test looks for a current infection by checking for proteins found on the surface of the virus.  This test came along some months after the pandemic hit.
  • A blood or serology test looks for any past infection by checking for any of several types of antibodies to COVID-19 in the blood.

When you see counts of COVID-19 cases, in most states, you are looking at counts of persons who tested positive with either a PCR test or an antigen test.  At the start of the pandemic, only the PCR tests were counted.  But as the antigen tests became more common, almost all states switched to counting either a PCR test or an antigen test as an actively infected individual.

See Post #1016 for a full write up of this change in case count reporting, and the large fraction of “probable” cases that now enter the count because they have a positive antigen test.


Significant under-count of actual infections from PCR and antigen tests.  It is well known that the count of positive COVID-19 tests (PCR plus antigen) grossly understates the true number of COVID-19 infections.  This was evident from the first analysis of the original Wuhan epidemic, where a large number of cases never received a formal positive diagnosis.  The same has held true for virtually every location where that issue has been examined.    For example, here’s a CDC estimate of a roughly fifteen-fold under-count of true COVID-19 cases in the pediatric population of Mississippi.

The under-counting occurs for several reasons.

  • Some people have no symptoms and so have no reason to get tested.  (Dr. Fauci estimated that asymptomatic cases account for about 40% of all U.S. cases.  This will be even more true of the pediatric (under-age-18) population, where asymptomatic cases are more common than they are among adults.)
  • Others have symptoms but, for some reason, did not get tested.  That might have been due to a shortage of tests early in the pandemic, or to unwillingness to be tested.  There is no good estimate of that factor.
  • Yet a third factor leading to understatement of the true number of infections is the high false-negative rate of the PCR COVID-19 test (Post #859).  A single PCR test has somewhere around a 25% chance of missing a true COVID-19 infection.
  • And a fourth factor is the even higher false-negative rate of antigen tests.  These cheaper, faster tests look for specific proteins on the surface of the COVID-19 virus, and the chance that a single antigen test will miss an active COVID-19 infection is around 50%.

Blood tests for antibodies are not perfect. I went through the ins and outs of blood testing for COVID-19 antibodies in Post #940.  There are several ways in which blood tests, and in particular the CDC’s seroprevalence surveys, can understate the true count of past COVID-19 infections.  That is, these tests also have a non-negligible false-negative rate.

  • Persons with asymptomatic or mild cases may have too few circulating antibodies to trigger a positive on the blood antibody test.
  • In particular, the most commonly-used antibody tests are not sensitive to those mild cases (per this review of the evidence on seroprevalence testing for COVID-19).
  • Antibody levels may decline over time, so that persons with infections early in the pandemic may still retain some immunity, but fail to trigger a positive on a blood test.
  • More recently, there is the potential to confound the results of immunization with the results of infection, but this is a matter of selecting which antibody types to test for.  (This is evident from the CDC web page cited above).

Even in populations where not much time has elapsed since infection, high-quality antibody blood testing still has a significant false negative rate.  In this study of Spanish health-care workers, around 10% of persons known to have been infected with COVID-19 were missed by a comprehensive set of blood antibody tests.

Finally, the sample of persons used for the actual CDC prevalence surveys is a sample of convenience.  They just re-use blood drawn for routine testing, such as cholesterol tests or screening “panels” (multichannel automated tests).   And the CDC only relies on two national laboratory companies.  The extent to which the CDC’s sample differs from the U.S. population as a whole is unknown.

The evolving relationship between the count of positive COVID-19 tests and estimates of seroprevalence.

I looked at the available CDC COVID-19 seroprevalence survey data back in early January (Post #933).  At that time, they had two rounds of survey data.  In the earliest round (circa August 1, 2020), by my calculation they found about five total infections for every one that had been reported via a PCR COVID-19 test.  In their second round (circa September 15, 2020), by my calculation, they found about four total infections for every one that had been reported via PCR test.

I didn’t make much of it at the time, because the methods clearly were pretty crude.  For example, their method didn’t seem to give reliable estimates at the state level.  Two points do not make a trend under those circumstances.

I then looked at what evidence was available, including the likely fraction of cases missed by seroprevalence tests, the correlation of the seroprevalence and test count data across states, and so on.  I decided at that time to use a nice, round, five-to-one ratio for estimating total COVID-19 infections.  That is, for every positive COVID-19 test, I would assume that four more individuals had been infected but did not receive a positive COVID-19 test.  That’s all laid out in Post #940.

As you can see from the graph I did back in January, the linear trend line (in a cross-section of states) from the (then) last round of data shows about 3.7 total infections for every diagnosed infection.  That was just slightly lower than the ratio I got by simply adding up all the state-level seroprevalence estimates, weighting by state population.
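The two ways of getting that ratio (a trend line across states, and a population-weighted total) can be sketched roughly as below.  This is only an illustration: the four "states" and their numbers are made up, not CDC or NYT data, and fitting the trend line through the origin is an assumption on my part, not something stated in the original analysis.

```python
# Hypothetical values for four made-up states (not CDC or NYT data).
states = [
    # (population, seroprevalence share, cumulative diagnosed share)
    (1_000_000, 0.20, 0.05),
    (5_000_000, 0.12, 0.03),
    (2_000_000, 0.30, 0.08),
    (8_000_000, 0.16, 0.04),
]


def trend_line_ratio(rows):
    """Slope of a least-squares line through the origin (an assumption):
    seroprevalence ~ slope * diagnosed share, one point per state."""
    num = sum(diag * sero for _, sero, diag in rows)
    den = sum(diag * diag for _, _, diag in rows)
    return num / den


def weighted_ratio(rows):
    """Estimated total infections divided by total diagnosed infections,
    summed over states so that each state counts by its population."""
    total_infected = sum(pop * sero for pop, sero, _ in rows)
    total_diagnosed = sum(pop * diag for pop, _, diag in rows)
    return total_infected / total_diagnosed


print(f"trend-line ratio: {trend_line_ratio(states):.2f}")
print(f"population-weighted ratio: {weighted_ratio(states):.2f}")
```

The two numbers need not agree.  The trend line treats every state as one data point, while the weighted total lets large states dominate.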

Four things have changed since then.

First, the CDC has expanded that seroprevalence survey, and now produces new estimates by state every two weeks.  The most recent set of estimates dates to early March.

Second, the mix of tests has changed.   In general, tests have become much more readily available, and, with “spit” tests, less intrusive.  (No more swab-up-the-nose.)  Currently, about a third of positive COVID-19 tests reported in the country are the antigen “rapid tests”, not the PCR tests.

This would argue for a genuinely lower ratio of total infections to total positive tests.   The easier it is to obtain a test, the more likely symptomatic individuals are to get tested.

Third, the age mix of the pandemic has shifted toward much younger people.  In some states, such as Michigan, the highest-incidence age group is high school students.  But this population has more asymptomatic infections.

It’s not clear what this would do to the ratio of total infections to total positive tests.  Asymptomatic individuals tend not to get tested, but they also tend to be missed by the blood antibody tests.  In effect, some portion of true infections will be skipped in both the numerator and denominator of the ratio of total infections to total positive PCR/antigen tests.
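A toy calculation shows why the direction is ambiguous.  Everything here is hypothetical: the detection probabilities are invented purely to illustrate the point, and the function name is mine.

```python
def measured_ratio(asymptomatic_share, sero_sym, sero_asym, test_sym, test_asym):
    """Ratio of infections found by blood antibody test to infections found
    by PCR/antigen test, when each kind of test detects symptomatic and
    asymptomatic infections with different (hypothetical) probabilities."""
    a = asymptomatic_share
    found_by_serology = (1 - a) * sero_sym + a * sero_asym
    found_by_testing = (1 - a) * test_sym + a * test_asym
    return found_by_serology / found_by_testing


# Scenario 1: blood tests still catch most asymptomatic cases, but almost
# none of those cases ever got a PCR/antigen test.
rises = [measured_ratio(a, 0.9, 0.6, 0.5, 0.05) for a in (0.3, 0.5)]

# Scenario 2: blood tests miss asymptomatic cases about as often as
# PCR/antigen testing does.
falls = [measured_ratio(a, 0.9, 0.1, 0.5, 0.10) for a in (0.3, 0.5)]

print("scenario 1:", [round(r, 2) for r in rises])
print("scenario 2:", [round(r, 2) for r in falls])
```

In the first scenario the measured ratio goes up as the asymptomatic share grows, and in the second it goes down.  Which way it moves depends on whether asymptomatic cases are missed more often by the blood test or by PCR/antigen testing.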

Fourth, a lot of time has passed.  As immunity or antibody levels fade over time, presumably the ratio of total persons found via blood antibody test, to persons ever found via PCR/antigen test, will fall.

That said, there’s no harm in re-doing the graph above, using the most recent CDC seroprevalence data, and counts of known infections based on PCR/antigen tests.

But now, the correlation across states suggests just 2.2 total COVID-19 infections for every person diagnosed with PCR/antigen testing.  As before, the direct calculation from the data, weighting the state estimates by state population, yields a slightly higher estimate.  In this case, the direct calculation yields about 2.3 total infections for every positive PCR/antigen test.

The first question to answer is whether or not this is due to immunity or antibody levels fading over time.   To answer that, I did a little “regression analysis”.  I broke up the population with positive PCR/antigen test into cohorts depending on how long ago they tested positive.  (More than 9 months ago, 6 to 9 months ago, 3 to 6 months ago, under 3 months ago, all starting from March 12, 2021, which is the mid-point of the latest seroprevalence test data).  Then I tried to predict the state-level seroprevalence based on the fraction of each state’s population falling into those cohorts.
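A regression of this kind can be sketched in a few lines.  The version below is only an illustration under stated assumptions: the cohort shares are made-up numbers for six hypothetical states (not the NYT or CDC data), the "true" ratios are chosen arbitrarily, and the model has no intercept, which is my assumption and not a documented detail of the analysis.  With no intercept, each fitted coefficient reads as total infections per diagnosed infection for that cohort.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cohort_coefficients(cohort_shares, seroprevalence):
    """Least squares without an intercept, via the normal equations."""
    k = len(cohort_shares[0])
    xtx = [[sum(row[i] * row[j] for row in cohort_shares) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(cohort_shares, seroprevalence))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Share of each hypothetical state's population that tested positive
# >9, 6-9, 3-6, and <3 months before the survey date.
shares = [
    [0.010, 0.020, 0.040, 0.015],
    [0.030, 0.010, 0.020, 0.025],
    [0.005, 0.030, 0.050, 0.010],
    [0.020, 0.025, 0.015, 0.030],
    [0.015, 0.005, 0.060, 0.020],
    [0.025, 0.015, 0.030, 0.005],
]
true_ratios = [4.0, 3.0, 2.0, 1.5]  # arbitrary values used to build fake data
sero = [sum(s * r for s, r in zip(row, true_ratios)) for row in shares]

fitted = fit_cohort_coefficients(shares, sero)
print([round(f, 3) for f in fitted])
```

Because the fake seroprevalence values are built without noise, the fit recovers the chosen ratios.  Real state data would be noisy, and in practice a library routine such as numpy.linalg.lstsq would be the more usual choice than a hand-written solver.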

With just 50 states, and not a lot of variance to latch onto, the regression estimates are subject to considerable uncertainty.  That said, the regression coefficients for those cohorts (graphed below) pretty clearly show that the issue is mostly NOT the fading of immunity over time.  It’s that there has been a dramatic change in the ratio of total infections to total tests, as shown below.

If I go back half a year, which is when I did my last analysis, sure enough, the regression results say that for that cohort, there were just over four total infections for every infection reported via PCR/antigen testing.  But as I get closer to the present (or, in this case, to March 12, 2021), that falls dramatically.  It is just under 2 for the period 3 to 6 months ago, and just under 1.5 for the most recent three months prior to March 12, 2021 (the date of the seroprevalence survey).

I’m still not quite sure how much of this change is real, and how much might be an artifact of the change in the age mix of new cases.  That said, this period corresponds with the growth of the much-easier spit tests and antigen “rapid” tests.  In addition, I don’t think there is any shortage of tests any more, as there was in the early part of the pandemic in the U.S.  Given all that, the preponderance of evidence suggests that the ratio of total infections to diagnosed infections is much lower now than it was in the second half of 2020.

Implications for my herd immunity estimate.

I’ve already stated that something had to be wrong with my estimates of herd immunity or with the herd immunity model itself.  When I ran those estimates of COVID-19-immune population by state, it appeared that some states should have already passed the herd immunity threshold by a wide margin.  And some of those states would be close to any reasonable estimate of herd immunity even if every infection had been counted via PCR/antigen test.

From Post #1131:

Here’s the problem.  Take North Dakota.  Well over half the population of that state has been immunized.  That’s a fact, and comes right out of CDC data.  More than 14% of the population has been diagnosed with a COVID-19 infection.  Again, right out of CDC data.  When I combine those two, under the assumptions of a) five total cases for every diagnosed case, and b) random overlap of the infected and vaccinated populations, then it looks like almost everyone in North Dakota ought to be immune to COVID-19 now.  Based on all those assumptions, you’d expect that 96% of the population would be immune.

But it’s even worse.  If I make the ludicrous assumption that there were never any undiagnosed COVID-19 cases, you’d still estimate that 66% of the population there was immune to COVID-19 (not shown).  Even with that gross understatement of immunity via prior infection, they’d still be right on the margin of the number that is thought to be required for herd immunity.  Toss in even a tiny bit of protection from COVID-19 hygiene (mask use, distancing, and so on) and they should still be over anybody’s plausible estimate of the requirement for herd immunity.

And yet, they don’t appear to have achieved herd immunity.  Or, at least, not with any obviousness or clarity.  Nor have any of the other states at the top of the list.  They all seem to have a low and (for the most part) stable rate of daily new COVID-19 cases.

At this point, it’s pretty clear that the five-to-one ratio that I have been using is obsolete.  It failed to keep up with reality.  Reality in this case being that tests are much easier to obtain and take now than they were in the second half of 2020.  A more reasonable round number, given the potential for blood tests to miss some cases, might be three-to-one.

In round numbers, correcting that assumption knocks 11 percentage points off the estimate of the population that is already immune to COVID-19. 
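The arithmetic behind this kind of estimate can be sketched as follows.  This is one reading of "random overlap of the infected and vaccinated populations" (treating the two as independent), and the inputs are hypothetical round numbers, not the figures from the state table, so it will not reproduce the table exactly.

```python
def immune_share(vaccinated, diagnosed, infections_per_diagnosis):
    """Share of the population immune by vaccination or prior infection,
    assuming infection and vaccination overlap at random (independence)."""
    infected = min(1.0, diagnosed * infections_per_diagnosis)
    return 1.0 - (1.0 - vaccinated) * (1.0 - infected)


# Hypothetical state: 50% vaccinated, 10% with a positive test.
vaccinated, diagnosed = 0.50, 0.10

old = immune_share(vaccinated, diagnosed, 5)   # five-to-one assumption
new = immune_share(vaccinated, diagnosed, 3)   # three-to-one assumption
none = immune_share(vaccinated, diagnosed, 1)  # no undiagnosed cases at all

print(f"5-to-1: {old:.0%}, 3-to-1: {new:.0%}, 1-to-1: {none:.0%}")
print(f"drop from changing the ratio: {old - new:.0%} of the population")
```

As long as the infected share stays below 100%, the drop from lowering the ratio works out to (1 - vaccinated) x (old ratio - new ratio) x diagnosed.  The more of a state that is already vaccinated, the less the ratio assumption matters.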

That said, even with this more conservative (and almost certainly more accurate) estimate of total infections, there are plenty of states that should exceed any reasonable estimate of the herd immunity level for the currently-circulating strains of COVID-19.  Here’s the 5/6/2021 state table, redone with this new assumption.  As you can see in the right-hand column, every state on this list should have 80% or more of the population immune to COVID-19.

Returning to the just-previous post, I see Rhode Island, New Jersey, and Delaware on this list.  They were among the cluster of states with a more-than-30-percent decline in new COVID-19 cases in the last seven days.

I realize at this point that I’m grasping at straws, but maybe some states are reaching the herd immunity level right now.  My contention has been that if the end of the pandemic is driven by immunization, you won’t see the classic slow tapering off of new cases.  Instead, the pandemic should end with a rapid dropoff in new cases.  (Last discussed in Post #1127).

Based on the arithmetic, a pandemic that is shut down by vaccination should end with a bang, not with a whimper.   You ought to see an accelerating rate of decline in new cases, right up to the point where there are no more cases.  That’s what I’ve been waiting to see.  Maybe we’re finally seeing it.