Post #940: Seroprevalence surveys

Posted on January 6, 2021

Source:  Calculated from supplemental data, round 4, Bajema KL, Wiegand RE, Cuffe K, et al. Estimated SARS-CoV-2 Seroprevalence in the US as of September 2020. JAMA Intern Med. Published online November 24, 2020. doi:10.1001/jamainternmed.2020.7976

I’m still trying to nail down the actual fraction of the US population that currently has antibodies to COVID-19.  Bottom line is that the best you can do is take a conservative, educated guess.  I’ll be assuming a 5-to-1 ratio of total cases to diagnosed cases.  The rationale for that follows.

Recall (from Post #933) that CDC staff have two different approaches that could be used to estimate the total number of COVID-19 infections in the U.S.   One starts with the count of diagnosed cases, then inflates that for under-detection of COVID (e.g., people who didn’t bother to get tested).  That yields an estimate of somewhere around eight true cases for every diagnosed case, as of September 2020.  Alternatively, you can just measure the fraction of blood samples that contain COVID-19 antibodies.  That approach yields more like four true cases for every diagnosed case.

That’s a material difference.
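For the blood-sample approach, turning a measured seroprevalence into "true cases per diagnosed case" is simple arithmetic: seropositive fraction times population, divided by cumulative diagnosed cases. Here's a minimal sketch; the function name is mine, and the state and its numbers are made up for illustration, not taken from the CDC study.

```python
def infections_per_diagnosed_case(seroprevalence: float,
                                  population: int,
                                  diagnosed_cases: int) -> float:
    """Ratio of estimated total infections to diagnosed cases.

    seroprevalence: fraction of blood samples with antibodies (0 to 1).
    """
    if not 0.0 <= seroprevalence <= 1.0:
        raise ValueError("seroprevalence must be a fraction between 0 and 1")
    if diagnosed_cases <= 0:
        raise ValueError("diagnosed_cases must be positive")
    # Scale the sample's antibody rate up to the whole population.
    estimated_infections = seroprevalence * population
    return estimated_infections / diagnosed_cases


# Hypothetical state: 10 million people, 2% seropositive, 50,000 diagnosed cases.
ratio = infections_per_diagnosed_case(0.02, 10_000_000, 50_000)
print(f"{ratio:.1f}-to-1")  # prints "4.0-to-1"
```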

Neither approach is what I would call a gold standard.  The first (8-to-1) approach depends on a lot of self-reported internet-collected data for estimates of (e.g.) the likelihood that an individual with symptoms will actually go see a doctor.  And it makes assumptions that (e.g.) the people who had symptoms and didn’t see the doctor were as likely to have COVID-19 as those who did see the doctor.  A lot of those assumptions are more-or-less not separately testable.  They are what they are, and that’s the answer they got.
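To see why those assumptions matter so much, here's a simplified sketch of the general shape of a "multiplier" calculation: each step between infection and a reported case catches only some fraction of infections, and the ratio is the reciprocal of the product of those fractions. This assumes a simple multiplicative chain; it is not the CDC staff's actual model, and the step names and probabilities are invented (chosen only so the result lands on a round 8).

```python
from math import prod


def multiplier_from_detection_steps(step_probabilities: dict[str, float]) -> float:
    """Total infections per diagnosed case under a simple multiplier model.

    Assumes each step independently passes along a fixed fraction of
    infections. Implicitly assumes people who drop out at a step (e.g., never
    see a doctor) are as likely to be infected as those who continue.
    """
    for name, p in step_probabilities.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name}: probability must be in (0, 1]")
    fraction_detected = prod(step_probabilities.values())
    return 1.0 / fraction_detected


# Invented inputs, for illustration only.
steps = {
    "infected person seeks medical care": 0.5,
    "person who seeks care gets tested": 0.5,
    "tested infection is confirmed and reported": 0.5,
}
print(multiplier_from_detection_steps(steps))  # prints 8.0
```

Nudge any one of those untestable probabilities from 0.5 to 0.4 and the answer jumps from 8-to-1 to 10-to-1, which is the practical problem with this approach.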

The second (4-to-1) method depends on blood samples.  That seems to be a lot more concrete.  And yet, the blood is a sample of convenience, typically collected for some sort of routine test (e.g., cholesterol test).

I started to look at the blood testing more closely.  As shown above, it does correlate reasonably well with the diagnosed prevalence of disease by state.  States with higher known case loads are, on average, states with a higher fraction of persons with blood antibodies for COVID-19.
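One way to put a number on "correlates reasonably well" is a Pearson correlation coefficient across states. The sketch below uses invented values for five hypothetical states, not the CDC round-4 data; with real data you'd plug in each state's diagnosed cases per capita and its measured seroprevalence.

```python
import math
from statistics import fmean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five states, in percent of population.
diagnosed_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
seropositive_pct = [4.5, 7.0, 13.0, 15.5, 21.0]
print(f"r = {pearson_r(diagnosed_pct, seropositive_pct):.2f}")  # prints "r = 0.99"
```

A high correlation only says the two measures move together across states; it says nothing about whether the blood tests are catching every past infection.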

But in my initial look at this, I did not understand the extent to which these blood tests may miss a significant fraction of the population that has been infected.

Based on this review of the evidence on seroprevalence testing for COVID-19, it turns out that commonly-used blood antibody tests might miss many or most individuals who had mild cases.  Further, that would depend on exactly which antibodies were tested, with the most common high-volume tests measuring antibodies that are among the least sensitive indicators of past infection (IgG) rather than the most sensitive ones (IgA).  (And, upon inspection, the tests used in the CDC study did not test for the immunoglobulin (antibody) IgA that these researchers thought was most critical to identifying mild cases.)

Using the most sensitive testing methods, most individuals who were infected with COVID will still have detectable anti-COVID antibodies six months later, and probably have immunity to COVID for several years (reference). By contrast, this small study, which relied on tests for IgG antibodies alone, found that many asymptomatic cases had undetectable levels of antibodies after just a few weeks.

Separately, but less relevant, antibody levels take time to build, and so blood tests will miss a large fraction of individuals who were infected a week or two prior to the test.

The upshot is that seroprevalence studies, as typically undertaken with high-throughput off-the-shelf tests, will miss some portion of the population who actually had been infected with COVID-19.  Those people will have been infected, but for some reason their antibody levels will be below the threshold needed to trigger a positive test.  That reason may be:

  • Only recently infected, and antibody levels have not built up.
  • Asymptomatic infection with low antibody response.
  • Asymptomatic or mild infection some time in the past, with low residual antibodies.
  • And all of that will depend on which specific antibodies are being measured, with the most sensitive test (IgA) being the least likely to be performed in a high-throughput setting (such as a commercial lab).

It’s not clear whether those mild-to-asymptomatic cases actually retain immunity to COVID-19 once certain antibodies have reached undetectable levels.  So far, documented cases of presumed re-infection remain rare.  And none of these tests is specifically for “neutralizing antibodies” (the substances that actually protect cells against pathogens).  So it’s not clear whether those mild cases should be counted as part of “the herd” that will eventually yield herd immunity.  At some point, the guess is that they will no longer be immune and so could spread COVID-19.  But in the short run (where we are now, say), it’s probably a mistake not to count them.

It’s tough even to guess how big an issue this is.  About 85% of COVID-19 cases are “mild” under the clinical definition.  That is, they don’t require hospitalization.  Estimates of the fraction of cases that are asymptomatic range from almost zero to almost 40 percent.

Just to show a plausible adjustment, if asymptomatic cases are 20% of the total and blood tests miss half of those, mild cases are a further 65% and blood tests miss a quarter of those, and the remaining 15% (cases that require hospitalization) are all detected, then adjusting the 4-to-1 seroprevalence estimate for those false negatives would yield 5.4-to-1.  Compare that to the CDC’s other methodology, which (to one decimal place) yielded 7.7-to-1.
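The adjustment above can be checked with a few lines of arithmetic. The blood test detects only some fraction of true infections, so the observed ratio gets divided by that fraction. This sketch assumes, as the paragraph does, that the remaining 15% of cases (the hospitalized ones) are always detected; the group shares and miss rates are the illustrative ones from the text, not measured values, and the function name is mine.

```python
def adjusted_ratio(observed_ratio: float,
                   groups: list[tuple[float, float]]) -> float:
    """Inflate an observed seroprevalence ratio for test false negatives.

    groups: (share of all infections, fraction the blood test misses) pairs.
    Shares must sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    # Fraction of all true infections that the blood test actually detects.
    fraction_detected = sum(share * (1.0 - missed) for share, missed in groups)
    return observed_ratio / fraction_detected


groups = [
    (0.20, 0.50),  # asymptomatic: test misses half
    (0.65, 0.25),  # mild: test misses a quarter
    (0.15, 0.00),  # remaining (hospitalized) cases: assumed always detected
]
print(f"{adjusted_ratio(4.0, groups):.1f}-to-1")  # prints "5.4-to-1"
```

Here the test catches 0.10 + 0.4875 + 0.15 = 0.7375 of true infections, and 4 / 0.7375 is about 5.42.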

Is there any other evidence?  Near as I can tell, no.  The CDC has a couple of other seroprevalence surveys in the works (including one using blood donations) but no results yet.  And the NIH has a central repository for U.S. seroprevalence studies (SeroHub), although they don’t appear to be doing any sort of summary or meta-analysis of the results.  Otherwise, it’s totally scattershot.

(The only exception to this that I found is a study of blood from end stage renal disease (ESRD) dialysis patients.  That is presumably national in scope.  But that study did not have information on the actual fraction of those individuals who tested positive for COVID using PCR. And so, they compared seroprevalence in the ESRD population to diagnosed cases in the U.S. population.  There’s no way that I would use the ESRD population as a proxy for the U.S. population in general.  Most dialysis patients need to be in a medical facility, in close proximity to others, several hours a day, three days a week.)

So, let’s face it.  Nobody can pin this number down very well.  And at this point, there are no good state-level estimates.  This is not a statistical issue — not due to lack of sample size.  These are “structural” problems — biases and uncertainties that have to do with how you get your numbers in the first place, not how many numbers you get.
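The difference between a statistical problem and a structural one can be made concrete. In the sketch below, the true prevalence (10%) and test sensitivity (75%) are assumed values for illustration, not figures from any study. The sampling error of a measured seroprevalence shrinks as you collect more blood samples, but the gap caused by a test that misses a quarter of past infections does not budge.

```python
import math


def measured_prevalence_summary(true_prevalence: float,
                                sensitivity: float,
                                n_samples: int) -> tuple[float, float]:
    """Return (bias, standard error) of a measured seroprevalence.

    Assumes a perfectly specific test (no false positives), so the expected
    measured prevalence is true_prevalence * sensitivity.
    """
    expected_measured = true_prevalence * sensitivity
    # Structural error: does not depend on n_samples at all.
    bias = expected_measured - true_prevalence
    # Sampling error for a proportion: shrinks like 1 / sqrt(n_samples).
    std_error = math.sqrt(expected_measured * (1 - expected_measured) / n_samples)
    return bias, std_error


for n in (1_000, 100_000, 10_000_000):
    bias, se = measured_prevalence_summary(0.10, 0.75, n)
    print(f"n={n:>10,}: bias={bias:+.4f}, standard error={se:.5f}")
```

The bias stays at about -0.025 (2.5 percentage points too low) on every row, while the standard error falls by a factor of 10 for every 100-fold increase in samples.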

Going forward, I think I’ll assume a nice, round 5-to-1 ratio of total COVID-19 cases to diagnosed cases.  I think that a single significant digit is all the accuracy that number deserves, all things considered.  This is modestly higher than the most recent CDC staff seroprevalence estimate, and substantially lower than the CDC staff “constructed” estimate (inflating diagnosed cases for presumed under-detection).

With that assumption, here’s how the states look as of data reported through 1/5/2021.  Even with this now-reduced multiplier, if the original estimate that 70% was required for herd immunity was correct, then the Dakotas should be pretty close to that.
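Applying the 5-to-1 assumption to a state takes one multiplication and one division. Here's a minimal sketch; the state is hypothetical (its population and case count are invented, not actual reported data), and the 70% figure is the herd-immunity estimate mentioned above, used only as a comparison point.

```python
def estimated_fraction_infected(diagnosed_cases: int,
                                population: int,
                                multiplier: float = 5.0) -> float:
    """Estimated fraction of a population ever infected, assuming a fixed
    ratio of total infections to diagnosed cases."""
    # Cap at 1.0: the estimate can't exceed the whole population.
    return min(1.0, multiplier * diagnosed_cases / population)


HERD_IMMUNITY_THRESHOLD = 0.70  # the "original estimate" cited in the text

# Hypothetical state: 800,000 residents, 100,000 diagnosed cases.
share = estimated_fraction_infected(100_000, 800_000)
print(f"{share:.1%} estimated ever infected "
      f"(comparison threshold: {HERD_IMMUNITY_THRESHOLD:.0%})")
# prints "62.5% estimated ever infected (comparison threshold: 70%)"
```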

Once again, I’m not advocating herd immunity via infections as healthcare policy.  I’m just trying to figure out whether or not it’s actually happening, regardless of policy.