Source: Plotted from data from the NY Times GitHub COVID data repository. Data reported through 12/26/2020.
Edit: You can now see this clearly, with 20-20 hindsight, in (e.g.) Post #941. The holidays do, in fact, put a significant dip in the reported infection count.
Holidays introduce several types of artifacts in the data on new COVID-19 cases.
There’s an immediate “reporting” artifact. Many public health departments are short-staffed on the holiday, and they aren’t able to tabulate all the new COVID-19 test results that arrive on the holiday itself. That creates a sharp one-day dip-and-rebound in reported rates. We saw that at Thanksgiving (predicted in Post #901, confirmed just post-Thanksgiving in Post #909). And, as above, we’ve now seen that same pattern for Christmas day.
There are other artifacts, but they will be more subtle than that, and harder to spot. Presumably, there’s a slowdown in the actual rate of testing (because who goes out on Thanksgiving to get a COVID-19 test), and that shows up as a dip in the rates a few days later. Finally, there’s the actual “surge” — if any — the actual increase in infections due to holiday travel and such, that shows up anywhere from 12 days to three weeks after-the-fact.
This post is about an odd discovery that I made when trying to smooth out the Christmas reporting artifact, shown above. The discovery is that there isn’t a simple offsetting dip-and-rebound in the reported rates. The rebound isn’t as big as the dip. There’s actually a small, permanent one-off reduction in the number of positive cases found, associated with that holiday day. True for Christmas. And, in hindsight, true for Thanksgiving as well.
It’s as if some people who would have tested positive just never bother to get tested. Presumably, because of the holiday. And never get tested afterwards, to make up for it. Presumably because, eh, they probably don’t have a very bad case of COVID-19. And so, apparently, just deal with their COVID infection.
This is no more than an odd footnote. My real goal here was to talk about trends. But, in fact, I just have to let the Christmas data glitches work their way through the system before I can talk about trends again.
A small amount of detail follows.
A couple of quick attempts to smooth the artifact.
I spent a lot of my career dealing with bad data. In that context, dealing with these short-term reporting artifacts is a piece of cake, because I know what’s generating them.
Or so I thought.
The biggest problem with the artifacts (the √-mark ends of the lines above) is that they put the last data point in the wrong place. Intellectually, you may realize that √-marks are reporting artifacts. But your eye can’t ignore them. Your eye judges a trend based on where we were, versus where we ended up. And the √-marks, even though we know they aren’t really the true trend, fool the eye.
And so, my goal is to put together a simple way to smooth out the ends of the lines and get the ends roughly where they should be. I want the very end of the line — the last data point — to be (and look like it’s) in roughly the right place.
We can try the simple-minded approach of just averaging 12/25 and 12/26. The idea being that the under-reporting of 12/25 is made up by the over-reporting of 12/26. But as you can see below, that just replaces the check-mark ends of the lines with a straight line segment. That looks worse than the original.
Above: A bad attempt at smoothing the data reporting artifact.
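For concreteness, here is a minimal sketch of that simple-minded averaging. The function name and the numbers are hypothetical, made up for illustration, and are not the actual state data.

```python
def average_pair(series, i, j):
    """Return a copy of series with positions i and j both set to their mean.

    The two-day total is preserved; only the split between the days changes.
    """
    smoothed = list(series)
    mean = (series[i] + series[j]) / 2.0
    smoothed[i] = mean
    smoothed[j] = mean
    return smoothed


# Hypothetical new cases/100,000/day for 12/22 through 12/26 (not real data).
rates = [60.0, 58.0, 56.0, 30.0, 70.0]

# Average the holiday (index 3) with the rebound day (index 4).
flat = average_pair(rates, 3, 4)
print(flat)
```

The last two points come out identical, which is why the end of each line turns into a flat segment regardless of the trend leading into it.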
Instead, I note that trends typically continue from day to day. That is, the inflection points on these curves are relatively few and far between, and most of the time, whatever the trend was yesterday, well, that’s usually today’s trend as well, plus-or-minus.
It takes just a bit of algebra to use the trend for (say) the 23rd+24th, versus the 25th+26th, to allocate the counts across the 25th and 26th. This converts the end of the line into a straight-line continuation of whatever that two-day trend was.
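One plausible reading of that algebra, sketched in Python. This is an assumption about how the calculation might go, not necessarily the exact one used here. If the line is straight with slope d per day, then shifting a two-day window forward by two days changes its sum by 4d. So d = (sum for the 25th+26th, minus sum for the 23rd+24th) / 4, and the two days are then split so that they differ by d while keeping their combined total. The function name and inputs are hypothetical.

```python
def reallocate_pair(prev_a, prev_b, hol_a, hol_b):
    """Re-split the holiday pair so it continues the two-day trend linearly.

    prev_a, prev_b: rates for the two days before the holiday (e.g. 23rd, 24th)
    hol_a, hol_b:   reported rates for the holiday and the day after (25th, 26th)
    Returns the adjusted (25th, 26th) values; their sum equals hol_a + hol_b.
    """
    s_prev = prev_a + prev_b
    s_hol = hol_a + hol_b
    # Moving a two-day window forward by two days changes its sum by 4 * slope.
    slope = (s_hol - s_prev) / 4.0
    # Solve x25 + x26 = s_hol with x26 - x25 = slope.
    x25 = (s_hol - slope) / 2.0
    x26 = (s_hol + slope) / 2.0
    return x25, x26


# Hypothetical rates: 23rd=60, 24th=58, then a dip-and-rebound of 40 and 70.
adjusted = reallocate_pair(60.0, 58.0, 40.0, 70.0)
print(adjusted)  # (56.0, 54.0)
```

Note that this keeps the reported two-day total as-is. If that total is itself too low, the adjusted end of the line will sit too low as well.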
By eye, that looks a little “over-done”. The ends of the lines are too low. It looks like there’s a sudden, simultaneous acceleration of the downward trend.
And that’s because the (simple) sum of new cases/100,000/day, by state, for the 25th and 26th is, in fact, quite low. It’s about 10% lower than one would expect, based on the just-prior trends (from the 20th to the 24th).
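A rough sketch of how a shortfall like that could be measured: fit a straight line to the 20th through the 24th, extrapolate it to the 25th and 26th, and compare the expected two-day sum with what was reported. The least-squares fit is my assumption about what "just-prior trends" means, and the numbers are invented so that the answer comes out near 10%. They are not the actual data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def holiday_shortfall(days, rates, hol_days, hol_rates):
    """Fraction by which the reported holiday sum falls below the trend line."""
    slope, intercept = linear_fit(days, rates)
    expected = sum(slope * d + intercept for d in hol_days)
    return 1.0 - sum(hol_rates) / expected


# Hypothetical cases/100,000/day for 12/20 through 12/24, then 12/25 and 12/26.
shortfall = holiday_shortfall(
    [20, 21, 22, 23, 24], [64.0, 62.0, 60.0, 58.0, 56.0],
    [25, 26], [35.0, 60.4],
)
print(round(shortfall, 3))  # 0.1
```

The same function could be pointed at the days after the 26th to check whether any later excess makes up for the gap.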
What’s more: 1) That 10% never returns. I never get a day with an excess of 10%, to make up for that 10% shortfall on the 25th and 26th combined. And 2) This also happened at Thanksgiving; I just didn’t notice it at the time.
My conclusion is that this isn’t just a reporting artifact. With a little more analysis, I conclude that 1) There was not some huge rush of testing prior to the holiday. 2) To the contrary, it looks like maybe 10% of people who would otherwise have gotten tested just skipped it, in the days just before the holiday, and 3) That shows up in low counts for the 25th and just-following days, independent of the down-and-up data reporting artifact on the 25th and 26th.
And the same thing happened at Thanksgiving.
So that’s an oddity. I can’t really even guess why that happens. But I’m pretty sure that it happens. Right at the holiday, and for a few days afterward, there’s an actual, real reduction in newly-diagnosed COVID-19 cases. For Christmas, that was about 10%. I can only guess that people with a mild case, who would otherwise have gotten tested, simply carry on with their lives. But that’s just a guess.