I’ve been trying to make some sense of the current spread of infection of COVID-19 in the US. I have to admit that’s been more-or-less a complete and total failure. I’ve managed to turn up a lot of interesting facts, but nothing resembling a pattern.
So, for this post, I’m just going to explain briefly why that is, and then start arranging a list of states in some sensible format. I’m doing that because, as far as I can tell, most of what gets written up in the popular press is un-usable, from the standpoint of actually understanding what’s going on.
You’ve seen a lot of popular press articles recently on rapid growth of COVID-19 in many states. By my estimate, those articles are only partially true. For one thing, they ignore places where there’s been “bad behavior”, but no problem (Post #709). From a scientific standpoint, those areas are every bit as interesting as the ones where case counts are growing. For another, they commingle areas where there is a real threat to public health emerging (Arizona) with areas that are, really, no worse off than Virginia is (Texas).
The upshot of this analysis is my list of states and the level of COVID-19 problem that they currently face. I think everybody has it right in that Arizona, pretty much all-of-a-sudden, is facing some serious problems. But other states that are featured in the popular press are not, certainly not if my standard is how things stand currently in Virginia. And Virginia, as already noted (Post #715), seems to be heading in the right direction now, for no particular reason that I can put my finger on, other than a generally reasonable population and good weather.
What makes this so difficult? Why is there no clear pattern across the states?
In the main, what makes any analysis of COVID-19 so difficult is that there are no natural constraints on the disease. Nobody has immunity and the number of persons infected so far is a tiny fraction of the population.
As a result, the US population is just one big wide-open field. And as far as I can tell, what you observe in one state versus another is largely a matter of happenstance. E.g., it just so happened that nobody in Montana had traveled to China recently when the pandemic was starting. But lots of people in New York had done so. And so, even now, infection rates in New York are vastly higher than in Montana. That doesn’t reflect any truly fundamental difference between the New York State and Montana populations. More-or-less, that’s just an accident. There is no more fundamental explanation.
What does a good epidemiological study look like?
In short, a good study is one that actually gives you some explanation of why things are happening. That’s possible because most studies focus on diseases that are well-established and stable in the population. And, typically, diseases where there is some clearly defined cause-and-effect link that you can hope to clarify. Even if that link isn’t obvious.
A classic epidemiological analysis is the analysis of state-to-state variations in skin cancer. There, the cause of disease is relatively well-established as lifetime sun exposure, tempered by skin color. Briefly, the paler you are, and the more time you spent in the sun, the higher the odds that you’ll develop some sort of skin cancer late in life. And once you have all those factors on the table (average sunlight, altitude, and racial mix), most of the variation in skin cancer rates makes sense.
Here’s an example I did years ago regarding oxygen use. It took a while to uncover the cause-and-effect, but in the end, despite huge state-to-state variation in home oxygen use, the variation made some sense.
In the Medicare program, virtually the only reason for prescribing home oxygen is for chronic obstructive pulmonary disease (COPD). And yet, Medicare’s spending for oxygen, across the states, seemed to have little to do with COPD. There’s five-fold state-to-state variation in spending, with only a vague hint of some link to the disease that the spending is supposed to be for.
At some point, I was asked to look at this issue. My client, a major manufacturer of oxygen equipment, was not particularly happy with the answer that the US DHHS Office of the Inspector General (OIG) seemed to be leaning toward, which boiled down to a “fraud and abuse” answer. Not that that’s a bad answer, in general, in the US health care system. My client simply did not think it was true.
The first step in any such analysis is to sort the data a few ways, and see if anything catches your eye. After a while, it dawned on me that many of the high-spending states were in the US Mountain region. Like so:
Rocky mountain high? Mile High Stadium?
It finally dawned on me that the primary driver of Medicare oxygen spending was altitude above sea level. For a given COPD population, Medicare was spending a lot on oxygen in some states mostly because there wasn’t much oxygen in the air. Like so, a nice straight line:
Together, elevation and COPD prevalence explained nearly all (85%) of the state-to-state variation in oxygen spending. That was enough to convince the relevant authorities that the state-level variation in oxygen spending wasn’t the result of massive amounts of fraud and abuse, but instead was (mostly) the result of natural forces.
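A figure like “explained 85% of the variation” is typically the R-squared from a least-squares fit. The sketch below shows one way such a number could be computed. It is not the original analysis: the method (ordinary least squares on two predictors), the function names, and every data value are assumptions made up for illustration.

```python
# Illustrative sketch only: ordinary least squares with an intercept,
# reporting R-squared (share of variation explained). All data are invented.


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_r_squared(predictor_rows, outcomes):
    """Fit outcome = b0 + b1*x1 + b2*x2 + ... ; return (coefficients, R-squared)."""
    x = [[1.0] + list(row) for row in predictor_rows]  # prepend intercept column
    k = len(x[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcomes)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in x]
    mean_y = sum(outcomes) / len(outcomes)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcomes, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcomes)
    return coeffs, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical states: (elevation in thousands of feet, COPD prevalence in %).
    predictors = [(0.1, 9.0), (0.5, 6.0), (1.0, 8.0), (5.0, 5.0), (6.8, 7.0)]
    # Hypothetical oxygen spending per beneficiary, in dollars.
    spending = [60.0, 45.0, 62.0, 118.0, 170.0]
    coefficients, r2 = fit_r_squared(predictors, spending)
    print(f"R-squared: {r2:.2f}")
```

An R-squared near 1 means the predictors account for nearly all of the state-to-state variation; a value near 0 is what a list that “looks like one big accident” would produce.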
And that is what a successful epidemiological study looks like. You actually get an answer, and it appears to make sense.
And nothing about COVID-19 yields anything that looks like that. Nothing. Not the states that have high current prevalence. Not the states where it’s growing rapidly, and not the states where it isn’t. To me, so far, the lineup of the states looks like one big accident.
Putting the state data into a sensible format.
The first step in trying to make sense of things is to put the state data in some uniform format. Something more than just a simple count of cases. At the minimum, put the new case counts on a per-capita basis.
You’ve seen a lot of popular press articles about how many states are having increases in new COVID-19 cases. And yet, much of that seems to come from picking those states after-the-fact. The popular press looks at the states with rising incidence, and talks about them. It ignores the ones where behavior was superficially similar, but there is no increase in cases.
And as a result, you don’t hear about the Eat Cheese or Die states (Post #709), the ones that somehow skated past an expansion of the pandemic. You only hear about the ones where trouble is brewing.
That’s good reporting, but bad science. That’s why I have insisted on running an analysis of a fixed, predetermined list of states. This list was, at one point, the New York Times list of states that re-opened early and states that did not. Up until my last analysis, there was no difference between these two sets of states. I last did that a month ago, and posted the results in Post #694. Here’s how that contrast looked three weeks ago (updating Post #694).
But nobody was seeing any impact three weeks ago. Let me start by redoing that, updating to yesterday’s counts, and extending the time periods over which I compare those two groups of states. I’m going to drop the pre-opening period (during which the two groups of states appeared similar).
And now, when I look over three time periods — May 1 to present, May 15 to present, and June 1 to present — sure enough, there’s a modest average difference between the two sets of states. In the late-reopening states, on average, they’ve fallen to about a 1.3% per day increase in cases, over the past two weeks. In the early-reopening states, by contrast, there’s been no improvement in new case growth. In fact, the closer we get to the current day (here, June 14) the more their case growth rebounds.
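For readers who want to reproduce a “percent per day” figure like the one above, here is a minimal sketch. The post does not say exactly how the daily rate was computed, so the compound-growth formula, the unweighted group average, the function names, and the sample counts are all assumptions.

```python
# Illustrative sketch only: average daily growth in cumulative case counts.
# Assumes compound growth between two dates; all numbers are made up.


def average_daily_growth(cases_start, cases_end, days):
    """Compound average daily growth rate of cumulative cases over `days` days."""
    if cases_start <= 0 or days <= 0:
        raise ValueError("cases_start and days must be positive")
    return (cases_end / cases_start) ** (1.0 / days) - 1.0


def group_average(rates):
    """Simple (unweighted) average of per-state growth rates."""
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical cumulative counts for two states, 14 days apart.
    early_reopeners = [
        average_daily_growth(20_000, 27_000, 14),
        average_daily_growth(8_000, 10_500, 14),
    ]
    print(f"Group average: {group_average(early_reopeners):.2%} per day")
```

Running the same calculation with different start dates (May 1, May 15, June 1) against the same end date gives the three time windows compared above.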
But that doesn’t come close to telling the full story because there is so much variation among the states. The problem is, much of what you see tabulated by state is useless from the standpoint of understanding where there are and aren’t problems.
First, it makes no sense to show the raw count of new cases, for the simple reason that some states have a lot more people than others. So, in the table below, I show you the new cases per capita, from the last week of data, annualized. In other words, the first column of numbers shows what fraction of each state’s population would be newly infected over the course of a year, if the currently-observed rate of new infections remained constant.
Second, you want to know whether or not that situation is getting worse. So the second column shows the change in that annualized per-capita infection rate, comparing this week to the week prior. Positive numbers flag states where the new infection rate (on a per-capita basis) is rising. Negative numbers then (obviously) show where it’s falling.
Finally, as a kind of overall “misery index”, the last column of numbers is the sum of the first two. In effect, that’s a crude projection of where that state will stand, next week, if it sees the same change in new infection rate as it saw this week.
I have sorted the data by the “misery index”, descending. States at the top would appear to have a serious problem, in that they have both high and rising new infection rates. States at the bottom either have a low rate of new infections per capita, or their rate of new infections appears to be falling rapidly.
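The three columns and the sort order described above can be sketched in a few lines. This is a minimal illustration, not the spreadsheet behind the table: the 52-weeks-per-year annualization, the function names, and the state figures are all assumptions.

```python
# Illustrative sketch only: annualized per-capita new cases, week-over-week
# change, and the "misery index" (their sum). All state data are invented.

WEEKS_PER_YEAR = 52  # assumption: the post does not say how it annualizes


def annualized_rate(new_cases_in_week, population):
    """Fraction of the population newly infected per year at this week's pace."""
    return new_cases_in_week * WEEKS_PER_YEAR / population


def misery_row(state, cases_this_week, cases_prior_week, population):
    """Build one table row: current rate, change from prior week, and their sum."""
    current = annualized_rate(cases_this_week, population)
    prior = annualized_rate(cases_prior_week, population)
    change = current - prior
    return {
        "state": state,
        "rate": current,
        "change": change,
        "misery_index": current + change,
    }


def rank_states(rows):
    """Sort rows by misery index, worst (highest) first."""
    return sorted(rows, key=lambda row: row["misery_index"], reverse=True)


if __name__ == "__main__":
    # Hypothetical states: (name, new cases this week, prior week, population).
    table = rank_states([
        misery_row("State A", 7_000, 5_000, 7_000_000),
        misery_row("State B", 4_000, 6_000, 8_500_000),
        misery_row("State C", 300, 280, 1_000_000),
    ])
    for row in table:
        print(f"{row['state']}: rate {row['rate']:.1%}, "
              f"change {row['change']:+.1%}, index {row['misery_index']:.1%}")
```

Because the index is the current rate plus the latest change, a state with a high but sharply falling rate can land below a state with a moderate but rising one.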
This table gives me the kind of ordering of the states that I was looking for. For perspective, I’ve highlighted Virginia in green. While we have a fairly high rate of new infections per capita, that’s now falling so fast that you would not project big problems here in a week or two.
And when I rank the states this way, which makes some sense to me, I get a somewhat different picture from what you may see in the popular press. Yes, Arizona has a real problem, with a high and rising infection rate. This has made the news. But Alabama is not far behind. And yet, it’s hard for me to think of two states that are more different than those, in terms of climate and population.
Note that North Carolina is near the top of the list. But in terms of climate, I’d say North Carolina has a climate nearly identical to that of Virginia. So if there is any seasonality to COVID-19, something about human behavior or biology in North Carolina is completely offsetting it, compared to Virginia.
And in general, when I look at that list, I see nothing. No rhyme or reason whatsoever. Both New York and California were hit hard by this, early on. New York is now seeing a relatively low and falling rate of new infections, California is seeing a relatively high and rising rate. So far, I see no way to make any sense of this. Nothing comparable to the analysis of oxygen use.
From macro to micro: No denominator means no way to know the odds.
Consistent with that scrambled picture at the macro level, there appears to be no way to determine what’s risky and not at the micro level.
As you plot your return to a more normal existence, you’d like to have some idea of what’s a high risk situation, and what’s a low risk situation, in terms of likelihood of contracting COVID-19. Wouldn’t we all. The need to know your risk is now so mainstream that an entire Washington Post article was devoted to this idea, this past week: How can we tell what’s risky and what’s not?
Having thought long and hard about this over the past week, it’s clear to me that we are never going to get any precise estimate of just how risky most real-life situations are.
And the reason for that is simple: No denominator. The risk of any activity is the number of persons who get infected through some activity (numerator) divided by the number of persons who engaged in that activity (denominator).
In an ideal world, we might plausibly get some notion of the numerator — the number infected through some activity. Under no stretch of the imagination are we likely to get any good estimate of the denominator — the total number of people who engaged in that activity. At least, not for most common activities.
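To see how much the missing denominator matters, here is a small sketch of the risk calculation. The function names and all counts are hypothetical, chosen only to show how far the answer swings when the denominator has to be guessed.

```python
# Illustrative sketch only: risk = infected / participants.
# Every number below is hypothetical.


def activity_risk(infected, participants):
    """Share of participants in an activity who were infected through it."""
    if participants <= 0:
        raise ValueError("participants must be positive")
    if infected < 0 or infected > participants:
        raise ValueError("infected must be between 0 and participants")
    return infected / participants


def risk_under_guesses(infected, participant_guesses):
    """Risk implied by each guessed denominator, for a known numerator."""
    return {guess: activity_risk(infected, guess) for guess in participant_guesses}


if __name__ == "__main__":
    # Suppose contact tracing attributes 5 infections to some activity,
    # but the number of people who did that activity is unknown.
    for guess, risk in risk_under_guesses(5, [500, 5_000, 50_000]).items():
        print(f"If {guess:,} people took part: risk is 1 in {round(1 / risk):,}")
```

With the numerator fixed, each tenfold change in the guessed denominator changes the estimated risk tenfold, which is why a known numerator alone says little about the odds.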
Let me just take a simple example: Indoor dining in the Town of Vienna. Plausibly, through contact tracing, we might be able to identify which new COVID-19 infections in Vienna were contracted in restaurants. Plausibly, the Commonwealth might even make such information public. (It does not do so now.) So that’s information that is not now made public, but could (in theory) be known. In theory, we could know the numerator in our risk calculation.
But to know the risk of eating in a Vienna restaurant, you also need to know the denominator: How many people, in total, ate in a Vienna restaurant that day? And that’s a piece of information that nobody gathers, and that nobody has access to. That’s where you just have to go totally by the seat of your pants.
For example, suppose you want to know the risk of (e.g.) going to your dentist for a routine checkup. It’s not enough to know that (e.g.) N people appear to have gotten infected, via routine dentistry, in the past month, in Virginia. (The numerator). To know your odds, you have to know how many people went to the dentist. (The denominator). And that’s something that literally nobody can tell you. Even making a crude guess, in today’s crazy climate, is probably out of the question.
All you can do is identify where new cases are and are not coming from. And kind of wing it from there.
So far, US epidemiologists have been remarkably, almost astonishingly, unhelpful in providing any guidance whatsoever on where people are picking up their infections. That is, we here in the USA can’t even get information on the numerator, let alone the denominator. Nothing about the state-level data seems to make sense, and nobody seems to be stepping up to put it into any context that makes sense.
Certain health care settings are tracked, sure. So you can, in theory, estimate the odds if you are (e.g.) living in a nursing home.
But for the rest of us, living our daily lives, the Commonwealth (and I would say US epidemiology in general) has provided little in the way of useful information on what we should and should not be doing. Maybe that’s just the way things are. Maybe nobody has quite figured this out yet, but somebody eventually will.