Post #1067: Updated information on COVID-19 variants of concern, comparison of CDC and Helix Corporation data.

Posted on March 20, 2021

This post is about what I believe to be a new page of information from the CDC regarding COVID-19 variants of concern.  Either the CDC just recently began posting this information, or I managed to miss it earlier.

As of today (3/20/2021), the CDC has an entire page showing the prevalence of COVID-19 variants in the U.S., based on four weeks of data ending 2/13/2021.  Clicking this link will take you to that page.  I think that’s new.  In the past, the CDC would only show a count of cases.  That didn’t tell you the proportion of cases that was attributable to each COVID-19 variant.

The CDC data are useful, but the huge drawback is that they are ancient, in this context.  That’s because the growth rates for these variants are so high.  For example, the U.K. variant was reported to double its share of all new cases every 10 days.  If that were true, and if that rate held constant, we’ve now seen four doubling times since the mid-point of the CDC data collection period.  And the U.K. variant share would therefore be about 2^4 = 16 times higher than the share of cases shown by the CDC.
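As a rough sketch of that arithmetic (the function name is made up for illustration, and the constant 10-day doubling time is the assumption stated above, not a measured value):

```python
def share_multiplier(elapsed_days, doubling_days=10.0):
    """Factor by which a share grows if it doubles every `doubling_days` days."""
    return 2.0 ** (elapsed_days / doubling_days)

# Four doubling periods of 10 days each, as in the paragraph above.
multiplier = share_multiplier(elapsed_days=40, doubling_days=10)
print(f"Share multiplier after 40 days: {multiplier:.0f}x")
```

A share of cases can never exceed 100%, so constant doubling can only hold while the variant is still a small fraction of cases.  Treat the multiplier as a rough guide to how stale the data are, not as a forecast.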

Compare CDC to Helix

The exact age of the CDC data is important, because I want to try to square the CDC web page with the information I have been tracking from the Helix Corporation.  The Helix data, recall, provide an extremely timely snapshot of the incidence of the U.K. variant.  But the sample Helix uses is not a systematic sample of the U.S., and in fact omits many states entirely.

Note that the image below is live-linked to the CDC web page, so while this will read correctly today, it’s not going to read correctly if the CDC updates this.  (But this blog page is pretty much disposable anyway.)

Source:  CDC

As I work my way through that chart, I’m pretty sure that each bar represents two weeks of data, so the last bar represents tests taken between 2/14/2021 and 2/27/2021, inclusive.  However, that last bar is provisional data, subject to revision, so the CDC actually used the two bars before it, combined, for the table of numbers on the right.

Upshot:  Those numbers, in that table, are from tests obtained (collected) on 1/17/2021 to 2/13/2021, inclusive.  And from those, the CDC concluded that the U.K. variant (B.1.1.7) accounted for 2.6% of new cases.

Data sourced from the Helix® COVID-19 Surveillance Dashboard.  Accessed on 3/20/2021.

If I take the relevant days from the Helix dashboard, pick five dates evenly spanning that range, and average them, I come up with an estimated U.K. variant prevalence of just about 3.7% of new cases.  (But there would be a pretty wide standard error around that.)
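One way to pick five evenly spaced dates across the CDC window is sketched below.  This is an assumption about how such dates could be chosen, not a record of the dates actually used, and the function name is hypothetical.  The Helix values for those dates would still have to be read off the dashboard by hand, so none are included here.

```python
from datetime import date, timedelta

def evenly_spaced_dates(start, end, n=5):
    """Return n dates spread as evenly as possible from start to end, inclusive."""
    total_days = (end - start).days
    return [start + timedelta(days=round(total_days * i / (n - 1)))
            for i in range(n)]

# The CDC window discussed above: 1/17/2021 through 2/13/2021, inclusive.
sample_dates = evenly_spaced_dates(date(2021, 1, 17), date(2021, 2, 13))
for d in sample_dates:
    print(d.isoformat())
```

The first and last dates are the endpoints of the window, and the middle date lands near the midpoint of the collection period.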

Assuming the new CDC data are the gold standard, this means that the Helix dashboard sample overstates the actual prevalence of the U.K. variant by a factor of about 1.4 (that is, 3.7% versus 2.6%).  (With considerable uncertainty around that figure.)  Or, to put that in growth terms, it’s about 5 days “too fast” compared to (what should be) the actual cross-section of the U.S., assuming a 10-day doubling time for the U.K. variant’s share of cases.
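The conversion from an overstatement factor to "days too fast" can be sketched as follows.  The function names are made up for illustration, and the 10-day doubling time is the assumption stated above.

```python
import math

def overstatement_factor(sample_share, reference_share):
    """Ratio of the share seen in one sample to the share in a reference sample."""
    return sample_share / reference_share

def lead_in_days(factor, doubling_days=10.0):
    """Days of growth needed to produce `factor`, given a fixed doubling time."""
    return doubling_days * math.log2(factor)

# Helix five-date average (3.7%) versus the CDC figure (2.6%), same window.
factor = overstatement_factor(3.7, 2.6)
lead = lead_in_days(factor)
print(f"Overstatement factor: {factor:.2f}")
print(f"Equivalent lead: {lead:.1f} days")
```

The logarithm is what turns a ratio into a time: a factor of 2 would be exactly one doubling period (10 days), so a factor of about 1.4 is roughly half of one.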

Finally, comparing the last two bars on the chart above, by eye, you can see that the B.1.1.7 (U.K.) strain slightly more than doubled its share during the 14 days that separate the two sampling periods.  That’s just about what you’d expect, based on the doubling period.  And so, the estimate shown in the table above isn’t a one-time fluke, but appears to be consistently replicated across the sampling periods.
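To check a by-eye reading like that against a doubling time, the relationship can be inverted.  This is a minimal sketch with a hypothetical function name.  The ratios below are illustrative inputs, not values read from the CDC chart.

```python
import math

def implied_doubling_days(ratio, elapsed_days):
    """Doubling time implied by a share growing by `ratio` over `elapsed_days`."""
    return elapsed_days / math.log2(ratio)

# An exact doubling over the 14 days between bars implies a 14-day doubling time.
print(f"{implied_doubling_days(2.0, 14):.1f} days")

# A 10-day doubling time would instead show up as a ratio of 2**1.4 over 14 days.
expected_ratio = 2.0 ** (14 / 10)
print(f"Expected 14-day ratio at 10-day doubling: {expected_ratio:.2f}")
```

So the further the observed ratio sits above 2, the closer the implied doubling time gets to the reported 10 days.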

The upshot is that, for estimating the current national incidence of the U.K. variant, the Helix data are not an exact match to the newly-published CDC data, but they are close enough.  By relying on that, my timing might be a few days “too fast” compared to the presumed gold-standard CDC.  But you actually get the data out of Helix roughly a month faster than you do out of the CDC.

For example, the Helix data currently show a point estimate for 3/17/2021 of 50% of U.S. cases being the U.K. variant.  (That’s a bit of a one-day outlier, so I’d be tempted to knock that back to maybe 47%, by eye.)  By contrast, the midpoint of the provisional data on the CDC web page is 2/20/2021, or almost a month earlier.

But the upshot is that my simple model, which runs a little faster than the Helix data, is probably running just over a week too fast, compared to the (now) gold-standard CDC estimate.  That’s still good enough for the amateur epidemiology that I’ve been doing here.

The comparison of the CDC data and the Helix data by state is a) a lot more ragged, b) a lot of work to do well, and c) probably not going to work out well, due to the skewed coverage of states in the Helix set of COVID-19 test samples.

Both sources agree that incidence of the U.K. variant appears high in Florida.  But, for example, the new CDC data (which actually date to a midpoint of about 1/30/2021) show that New Jersey has (had) a higher incidence than Florida.  By contrast, the Helix data have far too few samples from New Jersey to be able to provide an estimate.

That said, let me do one last quick comparison, using the CDC’s U.K. incidence figure for Florida (8.6%, from tests spanning 1/17/2021 to 2/13/2021, inclusive).  If I use the same five-sample-days method with the Helix data, I come up with 8.9%.  Given the standard errors here, that’s pretty much an exact match.  So where Helix has data, the published numbers appear to be a good match to CDC.  Plausibly, that’s because the CDC number in those places is, in fact, based largely on the Helix samples.

Source:  CDC