Post #1240: Virginia vs. CDC COVID-19 hospitalization data, failure to read the footnotes.

Posted on September 9, 2021

This note is for readers who are genuinely interested in COVID-19 data.  As such, I’ll consider it a note to myself, as well as a cautionary tale about what happens when you fail to read the footnotes associated with the data.

For COVID-19 new hospital admissions, there is a large discrepancy between what the CDC shows for Virginia, and what the Virginia Department of Health shows.  Currently, Virginia shows about half as many daily new COVID-19 hospitalizations (in Virginia) as the CDC does.

That’s it.  That’s the sum total of this post.

I noticed that something was odd when I tried to calculate the case hospitalization rate for COVID-19, that is, new hospitalizations divided by new cases.  Virginia’s current rate, based on Virginia data, is vastly lower than the U.S. average, based on CDC data.

I finally put the two data sources head to head, generating these two graphs below.  The CDC data are from the CDC COVID-19 data tracker.  The Virginia Department of Health data are calculated from their summary dataset, after cleaning it up a bit for some duplicate lines and missing dates.
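For anyone who wants to do similar cleanup, here is a minimal sketch of one way to handle duplicate lines and missing dates.  It is not the exact procedure used for the graphs.  It assumes the summary file reports cumulative totals by report date, and the function names and the (date, total) layout are made up for illustration.

```python
from datetime import date, timedelta

def clean_cumulative(rows):
    """rows: iterable of (date, cumulative_count), possibly with duplicate
    dates and gaps. Returns one (date, cumulative_count) per calendar day."""
    # Assumption: counts are cumulative, so for duplicate dates keep the largest.
    by_date = {}
    for day, total in rows:
        by_date[day] = max(total, by_date.get(day, total))

    cleaned = []
    day = min(by_date)
    last_day = max(by_date)
    last_total = by_date[day]
    while day <= last_day:
        # Carry the last known cumulative total across any missing date.
        last_total = by_date.get(day, last_total)
        cleaned.append((day, last_total))
        day += timedelta(days=1)
    return cleaned

def daily_new(cleaned):
    """Difference consecutive cumulative totals to get daily new counts."""
    return [(d2, t2 - t1) for (_, t1), (d2, t2) in zip(cleaned, cleaned[1:])]

# Made-up rows: one duplicate line (Sept 1) and one missing date (Sept 3).
sample_rows = [
    (date(2021, 9, 1), 100),
    (date(2021, 9, 1), 100),
    (date(2021, 9, 2), 110),
    (date(2021, 9, 4), 130),
]
sample_clean = clean_cumulative(sample_rows)
sample_new = daily_new(sample_clean)
```

Carrying the total forward puts all of a gap's increase on the first day after the gap.  That is fine for a seven-day average, less so for day-by-day comparisons.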

In this first graph, the two thin lines are the count of new cases.   The two thick lines are the counts of new hospitalizations.  The thin lines match, the thick lines diverge.

I would like to say that they have recently diverged, but that’s not true.  The fact that they match at the start of the graph above appears to be merely by chance.  In the graph below, covering all of 2021, Virginia consistently under-reports hospitalizations, relative to the CDC data, for everything but that period.  (Whereas the new COVID-19 case counts, the thin lines, are a pretty good match throughout.)

Note that this is not a small discrepancy.  Currently, Virginia shows half as many hospitalizations as the CDC does.  That is probably why Virginia’s case hospitalization rate is so far below the U.S. average, when I calculate it with Virginia’s data.
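The arithmetic behind that is simple.  Here is a sketch using made-up round numbers (not actual Virginia or CDC figures): if the case counts agree and one source shows half the hospitalizations, the calculated rate is cut in half as well.

```python
def case_hospitalization_rate(new_hospitalizations, new_cases):
    """New hospitalizations per new case over the same period."""
    return new_hospitalizations / new_cases

# Hypothetical daily figures, for illustration only.
new_cases = 3000            # both sources roughly agree on cases
hosp_federal = 120          # stand-in for the CDC-reported count
hosp_state = hosp_federal / 2   # stand-in for a count that misses half

rate_federal = case_hospitalization_rate(hosp_federal, new_cases)
rate_state = case_hospitalization_rate(hosp_state, new_cases)

print(f"federal-style rate: {rate_federal:.1%}")
print(f"state-style rate:   {rate_state:.1%}")
```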

I will also note that the CDC’s data look better, to me.  They more closely match the rise and fall of the new case counts.  That’s particularly evident in the top graph.

I realize the hospital data come from two separate sources.  CDC is using a US DHHS dataset under COVID-19 data reporting that was (I believe) mandated as a condition of hospital participation in Medicare and Medicaid last year.  Virginia is, I think, relying on reporting via the Virginia Hospital and Healthcare Association.  That’s a voluntary association for Virginia hospitals, and any data reporting requirements within it cannot possibly have the sort of “force of law and regulation” that lies behind the US DHHS reporting requirements.  Edit:  Nope.  See below.

Anyway, mystery solved, sort of.  At first glance, it looked like Virginia’s health care system was crazily better than the U.S. as a whole at keeping COVID-19 cases out of the hospital.  Which made no sense.  The correct answer is that Virginia’s COVID-19 hospitalization numbers are crazily low, compared to what the CDC reports.  And that’s almost surely the explanation of why Virginia’s apparent COVID-19 case hospitalization rate is such a low outlier relative to the U.S. as a whole.

Addendum:  Failure to read the footnotes.

I try to gather the facts.  But only as a last resort.

After I ran this, I squinted a little harder at the Virginia COVID-19 dashboard.  And it says this:

The upshot is that Virginia gets their count of hospitalizations via their own case investigation system.  Virginia Department of Health knows that their counts are an undercount.

I had assumed there was a potential for a modest undercount.  I didn’t realize that, compared to US DHHS, they miss half the cases.

Anyway, I even understand why Virginia continues to use these counts.  If you go back a year and a half, there was no US DHHS COVID-19 hospitalization data set.  At that time, Virginia used the best available data they could get.

Now, I think that the presence of a more reliable count from a Federal data collection system makes the Virginia data obsolete.  But it’s tough to swap data systems, and to swap data reporting.  So, in some sense, they are stuck with having to continue as they started.

The only truly new information here is how far off the Virginia hospitalization counts are.  Going forward, for any analysis of hospitalizations in Virginia, I’ll download the US DHHS data, assuming I can extract the Virginia numbers from that as CDC does.

I now believe the gold standard for counting U.S. COVID-19 hospitalizations at the state level should be this file.

This is the US DHHS summary of its own hospital data.  It appears to be updated daily, appears to show new COVID-19 hospital admissions, and is summarized to the state level.  I think the fields showing counts of COVID-19 cases (confirmed or suspected) admitted the previous day would be the count of record.
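Pulling one state's numbers out of a state-level file like that could be sketched as below.  The column names here (state, date, admitted_confirmed, admitted_suspected) are placeholders I made up for illustration, not the file's actual field names, so check the file's data dictionary before relying on anything like this.

```python
import csv
import io

# Hypothetical column names; the real file's fields must be looked up.
ADMISSION_FIELDS = ["admitted_confirmed", "admitted_suspected"]

def state_daily_admissions(csv_text, state="VA"):
    """Return {date_string: confirmed + suspected admissions} for one state."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["state"] != state:
            continue
        # Treat blank cells as zero rather than failing on them.
        totals[row["date"]] = sum(
            int(float(row[field])) for field in ADMISSION_FIELDS if row.get(field)
        )
    return totals

# Made-up sample rows, for illustration only.
sample_csv = """state,date,admitted_confirmed,admitted_suspected
VA,2021-09-07,100,20
MD,2021-09-07,50,10
VA,2021-09-08,110,
"""

va_admissions = state_daily_admissions(sample_csv)
```

Since the fields count patients admitted the previous day, each total would belong to the day before the row's date when lining it up against daily case counts.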