~~The data source I use accidentally added 50000 cases to one state today.~~ After correcting that, there’s no change in the U.S. COVID-19 picture compared to yesterday.

Edit: Nope. Turns out that Missouri dumped about 50,000 old cases into the file as of yesterday. (There is literally no mention of this on the Missouri state COVID-19 dashboard. The only way to know this is to check a single nondescript state-level footnote on the NY Times Github COVID-19 data repository.)

This seems to be a catch-up for reporting of persons who tested positive via antigen testing. I think that leaves just four states now that do not report those positive antigen tests as positives (Post #1017). This also means that the Missouri number, going forward, should be about one-third higher than it has been historically.

Source: Calculated from NY Times Github COVID-19 data repository, data through 3/8/2021.

**How will we know we’re approaching herd immunity?**

This is a little bit of half-baked statistical analysis, with a seemingly important point. The question is, how will we know that we’re approaching herd immunity? The short answer is that I don’t know. As of today, no simple state-level analysis even comes close to demonstrating the “right” relationship between population immunity and the current rate of new infections.

**First, let me remind you of the facts. This topic is all about the rate at which the average infected person spreads disease**. The more infectious a disease is — the more persons that the average infected person spreads the disease to — the more people need to be immune before the pandemic will stop.

For example, if the average infected person goes on to infect two others, you’d need to have half the population immune to the disease before the pandemic will begin to die down. If the average infected person infects three others, then two-thirds of the population would need to be immune. And so on.
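The arithmetic behind those examples is the textbook threshold formula, 1 − 1/R. Here is a minimal sketch of it; the function name is mine, and it assumes a well-mixed population in which immunity fully blocks transmission.

```python
def herd_immunity_threshold(r: float) -> float:
    """Fraction of the population that must be immune so that the
    average infected person infects fewer than one other person.

    Assumes the simple textbook model: threshold = 1 - 1/R.
    """
    if r <= 1:
        # Each case already infects one person or fewer, so the
        # outbreak shrinks without any population immunity.
        return 0.0
    return 1.0 - 1.0 / r


for r in (2, 3, 4):
    print(f"R = {r}: about {herd_immunity_threshold(r):.0%} immune needed")
```

With R = 2 this gives one-half, and with R = 3 it gives two-thirds, matching the figures above.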

Anything that affects that figure — the effective “R” — affects the level of immunity that is required to bring the pandemic to a close. If a more infectious variant of COVID-19 takes over, then you need a higher level of population immunity to suppress it. If COVID-19 hygiene improves (say, everybody uses an N95 mask), then you need a lower level of population immunity to suppress the pandemic.
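To make that concrete, here is a rough sketch of how a change in transmission moves the required immunity level. It assumes, purely for illustration, that hygiene measures scale the effective R down by a fixed fraction (`transmission_cut`, a hypothetical parameter, not an estimate of what masks actually achieve), and it uses the simple 1 − 1/R threshold.

```python
def threshold_with_mitigation(r_unmitigated: float, transmission_cut: float) -> float:
    """Immunity fraction needed once mitigation lowers the effective R.

    r_unmitigated:    R with no masks or distancing (illustrative input).
    transmission_cut: assumed fractional reduction in spread from
                      hygiene measures, between 0 and 1 (hypothetical).
    """
    if not 0.0 <= transmission_cut <= 1.0:
        raise ValueError("transmission_cut must be between 0 and 1")
    r_effective = r_unmitigated * (1.0 - transmission_cut)
    if r_effective <= 1:
        return 0.0  # mitigation alone is enough to shrink the outbreak
    return 1.0 - 1.0 / r_effective


# Illustrative only: the same disease with and without a 25% cut in spread.
print(threshold_with_mitigation(3.0, 0.0))
print(threshold_with_mitigation(3.0, 0.25))
```

In this toy setup, the 25% cut drops the required immunity level from about two-thirds to a bit over one-half. A more infectious variant works the other way: a larger `r_unmitigated` pushes the threshold up.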

People often forget about that last one, because they implicitly are thinking about the level of herd immunity required for a “return to normalcy”. They are thinking of keeping a disease suppressed by immunity alone, without (e.g.) masks and distancing. But the fact is, the level of population immunity required to suppress a disease in conjunction with masks and distancing is lower than the level required if you rely on immunity alone. This is just one of the reasons that removing mask mandates at this point is self-defeating. All it does is prolong the agony and raise the bar for actually getting the disease fully under control.

In an urbanized area with many person-to-person contacts daily, you likely need a higher level of immunity in the population to suppress a given disease. By contrast, in a sparsely inhabited rural area with fewer daily contacts, you’d need a lower fraction of the population immune in order to suppress a disease.

So there’s a lot of variation.

But abstracting from that, the main point is that once you have a large enough fraction of the population immune to COVID-19, you ought to see the rate of daily new cases dwindle. **If we have reached that end-game, and are at herd immunity, you ought to see a negative correlation between the fraction of the population immune and new cases per day.**

**Here’s the problem. In order to get to a high fraction of the population immune, practically speaking, you must have had a high rate of new infections some time in the past.** That’s 100% true prior to widespread immunization. That is nearly 100% true even with immunization, because almost every state is immunizing at almost exactly the same rate. So the only significant variation in the fraction of the population immune to COVID-19 comes from the historical rate of infection in that population. **And so, as different geographic areas are still traveling the path toward herd immunity, we’ll see a positive correlation between the fraction of the population immune and new cases per day.** Because literally the only way to get to a higher-than-average level of immunity in the population was to have had a higher-than-average rate of infections at some point.

**And so, crudely, as areas start hitting that herd immunity threshold, whatever it may be, I’d expect to see the correlation between fraction of the population immune and new cases per day flip around**. As regions move toward herd immunity, you’ll see a positive correlation — those with more infections are getting to a higher rate of overall population immunity. And once you are there, and the sheer mass of immune individuals begins to shut down the pandemic, then you ought to start seeing a negative correlation. Areas with a history of high infection rates — and so a large fraction of the population immune to COVID-19 — ought to be areas with low current infection rates.

There are a lot of complications beneath this. Among other things, I can’t get at (e.g.) vaccination rates at the sub-state (county) level. Near as I can tell, there is no national data source for that. And it’s likely that the ratio of true (total) cases to diagnosed cases varies considerably across states. As does COVID-19 hygiene. And of course, the fact that we’ve had some huge seasonal trends does not make this any clearer. And so on.

But that’s what statistical analysis is for. With any luck, those other factors will be a wash. And if there’s any systematic relationship between population immunity and current daily new cases, that should be apparent if you average enough individual units.

**So let me start with states as the unit of analysis. Right now, what’s the relationship between current rates of new infections (cases / 100,000 / day, seven-day moving average) and the fraction of the state population that is plausibly immune to COVID-19?**

**Answer: Positive. (Although “no relationship” would also be an acceptable answer here.)**

I constructed the estimated fraction here as a combination of five times the diagnosed COVID-19 cases plus a weighted average of those fully and partially vaccinated. It makes no difference whether I use five times or three times. **The upshot is that, at the state level, we are not yet seeing any negative correlation between the fraction of the population immune, and the current rate of new infections.**
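For anyone who wants to try this at home, here is a rough sketch of that kind of calculation. It is not my exact spreadsheet: the vaccination weights (`w_full`, `w_partial`) are placeholder assumptions, simply adding the infected and vaccinated shares ignores people who fall in both groups, and the three "states" at the bottom are invented numbers used only to show the mechanics.

```python
from math import sqrt


def new_cases_per_100k(cumulative, population, window=7):
    """Average daily new cases per 100,000 over the last `window` days,
    computed from a list of cumulative case counts (oldest first)."""
    if len(cumulative) <= window:
        raise ValueError("need more than `window` days of cumulative counts")
    daily_avg = (cumulative[-1] - cumulative[-1 - window]) / window
    return daily_avg / population * 100_000


def estimated_immune_fraction(diagnosed, fully_vax, partly_vax, population,
                              case_multiplier=5.0, w_full=1.0, w_partial=0.5):
    """Crude immune share: a multiple of diagnosed cases plus weighted
    vaccinations. `partly_vax` means people with one dose only.
    The weights are assumptions, and overlap between the infected and
    the vaccinated is ignored."""
    infected = case_multiplier * diagnosed / population
    vaccinated = (w_full * fully_vax + w_partial * partly_vax) / population
    return min(1.0, infected + vaccinated)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / sqrt(var_x * var_y)


# Invented example rows: (diagnosed, fully vaccinated, one dose only,
# population, last 8 days of cumulative cases). Not real state data.
states = [
    (60_000, 50_000, 80_000, 1_000_000, [59_300 + 100 * d for d in range(8)]),
    (90_000, 55_000, 85_000, 1_000_000, [88_600 + 200 * d for d in range(8)]),
    (40_000, 45_000, 75_000, 1_000_000, [39_650 + 50 * d for d in range(8)]),
]
immune = [estimated_immune_fraction(d, f, p, pop) for d, f, p, pop, _ in states]
rates = [new_cases_per_100k(cum, pop) for _, _, _, pop, cum in states]
print(pearson(immune, rates))
```

Changing `case_multiplier` from 5 to 3 is a one-argument change, which is how I checked that the multiplier doesn't matter. One caution on the rate function: a one-day data dump like Missouri's lands in the window as if it were new cases, so it has to be backed out first.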

**Well, OK, what happens if we just take vaccinations alone?** Surely the larger the fraction of the state population that has been vaccinated, the lower the new case rate should be?

**Nope, not that either. At this point, there’s no relationship.**

Well, observational data can be like that sometimes. There are a lot of other things going on that are driving these rates. For example, having a high current case rate might spur more people to get vaccinated, or spur the state to take vaccination more seriously (and so you have reverse causation, from high infection rates to high vaccination rates).

The only point here is that there’s no simple, state-level analysis that will demonstrate a negative relationship between current population immunity and current population infection rates.

Obviously, this is pretty simple and pretty crude. But the fact is, if it takes a complex analysis to pull that effect out of the data, a) you’re never quite sure what your methodology is doing and b) people have to take the results as a matter of faith. (The latter is particularly true when no simple analysis will show that result).

My conclusion is that either we’re not near the herd immunity level yet, or that there’s no practical way to demonstrate that with a simple state-level analysis.