Changes in state data reporting have made it increasingly difficult to get a good estimate of the most recent trend in COVID-19 cases. Starting today, I’ve made some imputations — described in the last section below — to account for the worst of the state data issues.
With those in place, the upturn in daily new COVID-19 cases in the U.S. is now quite clear. As expected (Post 1160, June 15 2021), we now seem to be tracking along the same path that Great Britain took following the spread of the Delta variant.
Source for this and other graphs of new case counts: Calculated from The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved 7/1/2021, from https://github.com/nytimes/covid-19-data. The NY Times U.S. tracking page may be found at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
It may take mainstream media a few days to catch up to this, because state data reporting issues in the “raw” data, if not corrected, will muddle this uptick for several days yet. That said, by Tuesday or Wednesday of next week, this should be common knowledge.
This latest wave is being led by Missouri — the state with the highest proportion of Delta variant among new cases — and adjacent parts of Oklahoma and Arkansas. But if you look at the second graph, you can see that in this past week, all the South Central states except one (Tennessee) started on an upward trend.
It’s not completely clear, but there appears to be another geographic cluster of upward trends in Nevada, Utah, and Wyoming. These states don’t appear on the CDC list of states with a high proportion of the Delta variant, but that may simply be due to small sample sizes (i.e., the CDC didn’t report the data).
If you take the growth in new cases over the past two weeks and flag the high-growth states, there does appear to be some distinct geographic clustering. The South-Central and Southwest regions are the high-growth areas at present, with a few exceptions. Those are shown in red below. The northern tier and the U.S. east coast remain generally low-case-growth areas.
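The flagging step can be sketched in a few lines. This is a minimal illustration, not the author's actual calculation; the state figures and the growth threshold are made-up assumptions, and each state is represented by its seven-day average of daily new cases now versus two weeks ago.

```python
# Sketch of the two-week growth flag: compare the 7-day average of new
# cases now to the average two weeks earlier, and flag states whose
# growth exceeds a chosen threshold. All numbers below are hypothetical.

def flag_high_growth(avg_now, avg_two_weeks_ago, threshold=1.25):
    """Return True if new cases grew by more than `threshold`x over two weeks."""
    if avg_two_weeks_ago == 0:
        return avg_now > 0
    return avg_now / avg_two_weeks_ago > threshold

# Illustrative (made-up) seven-day averages of daily new cases:
states = {
    "MO": (900, 500),   # sharp growth
    "NV": (400, 280),   # moderate growth
    "NY": (350, 360),   # roughly flat
}

high_growth = [s for s, (now, prior) in states.items()
               if flag_high_growth(now, prior)]
print(high_growth)  # -> ['MO', 'NV']
```

A ratio of seven-day averages is used rather than raw daily counts so that day-of-week reporting patterns largely cancel out of the comparison.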
Nobody ever cares about the methodology, but sometimes methodology matters.
There have been “glitches” in the data for the entire time I have been tracking these numbers. Historically, those mostly arose when states made large one-time adjustments to their case counts, and either added or subtracted a large number of cases on one day.
For example, a state might clear up a backlog of old cases, and report all of those old cases on a single date. Or they might identify a bunch of duplicates, and remove all of those from their case counts on a single date. I’ve referred to these as “speed bumps”, because those big lumps of old cases perturb the seven-day moving average for one week, after which the average returns to the true trend. The resulting curve looks like a road with a speed bump in it.
I’ve been fixing those all along, where I could identify them. If I found out that a state (e.g.) added 5000 old cases on some date, then I netted out 5000 cases before proceeding to calculate trends.
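The netting-out step is simple enough to show directly. This is a hedged sketch of the idea, not the author's code; the dates and the 5,000-case adjustment are hypothetical, mirroring the example above.

```python
# "Speed bump" correction: if a state is known to have dumped (or removed)
# a batch of old cases on a single date, net that one-time adjustment out
# of that day's count before computing any moving averages.
# Dates and amounts are hypothetical.

def net_out_adjustments(daily_new_cases, adjustments):
    """Subtract known one-time adjustments from the affected days.

    daily_new_cases: dict of date -> reported new cases
    adjustments: dict of date -> one-time adjustment (positive = backlog
                 of old cases dumped that day; negative = duplicates
                 removed that day)
    """
    corrected = dict(daily_new_cases)
    for date, adj in adjustments.items():
        if date in corrected:
            corrected[date] -= adj
    return corrected

reported = {"2021-06-28": 1200, "2021-06-29": 6300, "2021-06-30": 1350}
# Suppose 5000 old backlog cases were reported all at once on 6/29:
corrected = net_out_adjustments(reported, {"2021-06-29": 5000})
print(corrected["2021-06-29"])  # -> 1300
```

The same function handles the California case mentioned below: a removal of duplicates is just a negative adjustment, which netting out adds back in.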
For today’s analysis, I needed to fix California’s data, as they netted out several thousand duplicate cases. I netted them back in to give a true view of the trend in new cases per day.
In addition, for some time now, a few states have not reported data on Sunday, and have then reported two days’ worth of data on Monday. This was a minor issue: all it does is create a one-day “flat spot” on that state’s curve, followed by a little jump up or down to return to the true trend.
More recently, though, data reporting has been getting spottier. Currently, 27 states report no data on the weekend. That generates a two-day “flat spot” on their curves, followed by a larger jump back to the true trend. And a handful of states report even less frequently, including Florida, which now appears to be reporting data just once a week.
For older data — more than a week old, say — things eventually catch up to where they should be, on average. All the new cases are finally reported, at some point. The curve might needlessly fluctuate up-and-down, but on average, it’ll be where it’s supposed to be.
The important issue is that this spotty reporting frequently distorts the last few days of the trend curve. To the extent that a state has reported one or more days of zero new cases, but has not yet reported the large “catch up” day that follows, the cases are missing. The seven-day moving average dips down. The upshot is that non-reporting days at the end of a given time period result in understatement of the true trend.
This has now become enough of an issue that I needed to fix it. And so I have. For this set of trends, and from this point forward, I’m imputing any missing reporting that occurs at the end of the time period in question. I’m imputing it using the last known “good” seven-day moving average — the last seven-day moving average that occurred on a day on which the state reported a positive number of new cases. The upshot is that if a state doesn’t report data, I’m assuming that the new case rate continues along as it had been, until such time as the state begins reporting data again.
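The imputation rule described above can be sketched as follows. This is an illustration of the stated rule, not the author's actual code, and the ten-day series is made up: trailing zero-report days are replaced with the seven-day moving average as of the most recent day with a positive reported count.

```python
# End-of-series imputation: replace trailing zero-report days with the
# last "good" seven-day moving average, i.e. the average ending on the
# most recent day with a positive reported count. Data are hypothetical.

def impute_trailing_zeros(daily_new_cases):
    """Fill trailing non-reporting days with the last good 7-day average."""
    cases = list(daily_new_cases)
    # Find the last day with a positive reported count.
    last_good = max((i for i, c in enumerate(cases) if c > 0), default=None)
    if last_good is None or last_good == len(cases) - 1:
        return cases  # nothing to impute
    # Seven-day moving average ending on the last good day.
    window = cases[max(0, last_good - 6):last_good + 1]
    avg = sum(window) / len(window)
    # Carry that average forward until reporting resumes.
    for i in range(last_good + 1, len(cases)):
        cases[i] = avg
    return cases

# Ten days of reports; the state stopped reporting for the last two days:
series = [100, 110, 105, 120, 115, 125, 130, 140, 0, 0]
imputed = impute_trailing_zeros(series)
```

Note that this only touches zeros at the *end* of the series; zeros in the interior are left alone, since those days eventually get a “catch up” day that restores the average, as described earlier.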
This fixes all of the important problems, starting with Florida. It does not fix them in the historical trends data. Florida’s line is going to wave up-and-down in the older data, but not for the current week. The important thing is that this imputation prevents the end of the trend line from understating the current trend due to missing data from states that no longer report seven days a week.