Post #964: Virginia did, in fact, change some data reporting on 1/17/2021

Posted on January 19, 2021

Source:  Calculated from the NY Times GitHub COVID data repository and the Commonwealth of Virginia COVID dashboard.

Red line is seven-day moving average excluding data for 1/17/2021.  Blue line is seven-day moving average excluding data for 1/17/2021 and 1/16/2021.  The reason for considering those exclusions is given below.

I got a very nice and very detailed reply to my inquiry to the Commonwealth regarding the unusual uptick in cases in Virginia this past week (Post #961).

There was a reporting artifact in the Virginia data on 1/17/2021.  The overall size of that is not completely clear.  But that plausibly explains the jump in cases that occurred on 1/17/2021, and that in turn explains what I noted for Prince William County and Manassas.

How much of the overall uptick for the state is an artifact of reporting, and how much was real, remains an open question.  Clearly, as the Commonwealth’s respondent emphasized, most of what occurred in Virginia in this period reflects true, ongoing new cases.

The story is that some cases had not been entered into the system, pending “investigation” (which I interpret to be contact tracing, but I’m not sure).  That was due to lack of staff to investigate the cases.  On 1/16/2021, Virginia implemented a workaround that allowed them to include those cases in the reported data.  The upshot is that there was some backlog, of some undetermined size, that was entered into the data circa 1/17/2021.

A secondary factor is that Virginia labs are moving away from paper-copy reporting of tests, toward electronic reporting.  This shortens the lag between testing and reporting in the Virginia figures.  I’d have to guess that’s a far smaller effect over the short timeframe of interest here.

I’m still not quite sure of what’s going on in Chesapeake.  We sort of talked past each other there, as I emphasized the uniformity of the ZIP code data, and what I got back was focused on the irregularities of ZIP codes.  So I’m still not quite sure why cases in Chesapeake appear to have quadrupled in a week, but plausibly that’s just natural variation combined with a big count on 1/17/2021.

Given that the outlier 1/17/2021 increase reflects an artifact of some unknown size, I feel reasonably comfortable showing what the resulting trend in daily new cases would look like without that.  That’s the red line — that’s a seven-day moving average, excluding 1/17/2021.

In fact, all of the numbers 1/16 – 1/18 appear high, but at some point, if I start dropping all the high daily counts, that’s just cherry picking.  But, plausibly, dropping just the one may not fully remove the reporting artifact.  So I did a second re-calculation, excluding both 1/17 and 1/16, yielding the blue line above.
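For anyone who wants to see the mechanics, here is a minimal sketch of that kind of re-calculation.  It assumes a trailing seven-day window in which the excluded days are simply skipped and the remaining days are averaged, which may not match my spreadsheet exactly.  The function name and the daily counts are made up for illustration; they are not Virginia's actual figures.

```python
from datetime import date, timedelta

def trailing_average(daily, end_day, window=7, exclude=frozenset()):
    """Mean of the daily counts over the `window` calendar days ending on
    `end_day`, skipping any dates listed in `exclude`."""
    days = [end_day - timedelta(days=i) for i in range(window)]
    values = [daily[d] for d in days if d in daily and d not in exclude]
    if not values:
        return None  # nothing left to average
    return sum(values) / len(values)

# Hypothetical daily new-case counts for 1/12/2021 through 1/18/2021.
start = date(2021, 1, 12)
counts = [5000, 5000, 5000, 5000, 6000, 9900, 7000]
daily = {start + timedelta(days=i): c for i, c in enumerate(counts)}

end = date(2021, 1, 18)
jan16, jan17 = date(2021, 1, 16), date(2021, 1, 17)

black = trailing_average(daily, end)                          # all days
red = trailing_average(daily, end, exclude={jan17})           # drop 1/17
blue = trailing_average(daily, end, exclude={jan16, jan17})   # drop 1/16 and 1/17

print(round(black), round(red), round(blue))
```

With these made-up counts, each exclusion pulls the average down a little further, which is the same ordering as the black, red, and blue lines in the graph.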

If I had to guess, given how closely the Virginia line has mirrored the DC and Maryland lines for the past few months (with some temporary departures), I’d actually bet on the blue line being the better proxy for the long-run path of new cases.  That, plus the fact that every other state in the U.S. shows a downward trend (other than South Carolina, see next post).

Unfortunately, because I use seven-day moving averages (as does everybody else), if the new-case rates return to mirroring MD and DC, what you’re going to see in my graphs moving forward is a seven-day-long “speed bump”.  And then the Virginia line will drop back down toward the DC and MD lines sometime around 1/24/2021, as the seven-day moving average finally moves past the 1/17/2021 outlier.
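To illustrate the timing, here is a small sketch using a made-up series: a flat 5,000 cases a day with a single hypothetical spike on 1/17/2021.  It assumes a trailing seven-day average (the average for a given day covers that day and the six before it).  The numbers are invented; only the timing matters.

```python
from datetime import date, timedelta

# Hypothetical series: flat at 5,000 a day, with one spike on 1/17/2021.
start = date(2021, 1, 10)
days = [start + timedelta(days=i) for i in range(17)]  # 1/10 through 1/26
spike_day = date(2021, 1, 17)
daily = {d: (9900 if d == spike_day else 5000) for d in days}

def trailing_mean(daily, end_day, window=7):
    """Plain trailing average over the `window` days ending on `end_day`."""
    values = [daily[end_day - timedelta(days=i)] for i in range(window)]
    return sum(values) / len(values)

# Days (from 1/16 on, where a full window exists) whose average is pushed up.
elevated = [d for d in days[6:] if trailing_mean(daily, d) > 5000]

print(elevated[0], elevated[-1], len(elevated))
```

In this sketch the average is elevated for exactly seven days, 1/17 through 1/23, and is back to the baseline on 1/24, when the spike finally falls out of the window.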

Anyway, my opinion is that the “Near 10,000 cases” day that was widely reported in the media is probably due mostly to this data reporting issue.  Again, in my opinion, odds are that the true recent trend in Virginia is closer to what you observe in MD and DC than to the black line above.

I was given the advice to look at cases by date of symptom onset, rather than date of report, but I’m going to ignore that.  Near as I can tell, a) that wouldn’t solve the problem, b) nobody looks at cases by date of symptom onset, and c) that’s for good reason.  That graph always shows cases tailing off at the end, due to the lag between symptom onset and recording in the system.  It gives you a good solid estimate of what the new case count was last week.

This is not an error in counting cases, per se.  Those were real infections.  And, Virginia did, in fact, show those counts on the day that the case was reported (entered into their system).

But that’s true purely by definition — they count them when they count them.  And, in fact, the numbers have see-sawed from day to day due to apparent variations in the completeness of processing.  That was evident early on, as I split the data between the early-reopening and late-reopening counties in Virginia.

In the past, when Virginia added in a backlog (e.g., added in two-days’-worth of cases in a single day), they clearly noted that on their website.  To see a writeup of the last time this happened, see Post #765.  That was helpful, even if it didn’t entirely prevent misinterpretation of that “spike” of cases in the popular press.

But this time, tossing in a backlog of cases and not noting it resulted in a lot of sensational headlines in the popular press.  (For example.)  And worse, in some reporting, that increase was attributed to the non-existent post-holiday surge.

I’m undecided as to whether or not to write back and point that out.  I think the sane portion of the population is sufficiently concerned about the pandemic as-is.  The lack of explanation, combined with the press writeup, left a lot of people unnecessarily wondering whether the pandemic was now out-of-control in Virginia.