Post #1524: Finally cracking the numbers on the supposedly vast number of long COVID cases

Posted on May 31, 2022

 

If you take the available research at face value, a huge fraction of the U.S. population now suffers from debilitating long-term consequences of COVID-19.

Just last week, a CDC-sponsored study (reviewed below) seemed to imply that 13 percent of the entire U.S. adult population has some form of long COVID.

(They estimated that ~22 percent of persons who had COVID had some long-term health consequences, above-and-beyond the baseline rate at which those conditions arose in the non-COVID population.  Literally, 16% of non-COVID patients had some mention of a relevant health condition, versus 38% of COVID-19 patients, for a difference of 22 percentage points.  When multiplied by the most recent CDC seroprevalence estimate that almost 60% of U.S. adults have had COVID, you would come to the conclusion that (22% x 60% =~) 13% of the U.S. adult population has long COVID.)
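For anyone who wants to check that arithmetic, here is a minimal sketch in Python.  The three input figures are the ones quoted above; the variable names are my own and purely illustrative.

```python
# Figures as quoted in the post; variable names are illustrative only.
covid_rate = 0.38       # COVID-19 patients with a relevant condition
baseline_rate = 0.16    # non-COVID patients with a relevant condition
seroprevalence = 0.60   # share of U.S. adults who have had COVID (CDC estimate)

# Excess rate attributable to COVID, above the baseline.
excess_rate = covid_rate - baseline_rate

# Naive extrapolation to the whole adult population.
implied_population_share = excess_rate * seroprevalence

print(f"Excess rate among COVID patients: {excess_rate:.0%}")
print(f"Implied share of all U.S. adults: {implied_population_share:.1%}")
```

The second line of output is the "one-in-eight" figure, give or take rounding.  It only holds if the 22-point excess applies to every infected adult, which is exactly the assumption questioned in the rest of this post.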

One-in-eight U.S. adults has long COVID? 

Really?   This is very much at odds with my perception of reality. 

Surely there are some cases of serious long-term health consequences from COVID.  To be clear, any acute condition that nearly kills a person will likely result in some long-term effects.

What I don’t believe is that serious long-term health outcomes from COVID infection could possibly be that common.  If so, U.S. health care rehab facilities, outpatient departments, and physician offices would be overflowing with followup care for COVID survivors.

So, what’s the answer?  How can a seemingly-legitimate CDC-sponsored study find that such a huge fraction of COVID-19 cases have “long COVID”, and yet there appears to be no proportionate impact on the U.S. health care system?

I think the answer has two parts.


The CDC analysis is a study of COVID-19 cases sick enough to require hospital-based intervention.

First, if you read it closely, that CDC study is clearly a study of U.S. adults who were so sick with COVID-19 that they sought medical intervention.  The underlying data are from electronic medical records.  Such records are only generated when individuals use the health care system in some fashion.  Aside from a handful of individuals who might have gone to their health care provider to get a COVID-19 test, that means we’re looking (almost exclusively) at people who sought treatment for their COVID.

But it gets worse.  This study was based on a proprietary database of electronic medical records.  You then have to pursue some description of what, exactly, is in that database.  Unfortunately, the descriptions of the proprietary data source are marketing-oriented, not technically-oriented, but a) the data underlying this study come almost exclusively from hospital systems, and b) it certainly appears that the overwhelming majority of the information is from hospital inpatient and ER encounters.

In other words, this most recent CDC study is primarily a study of U.S. adults who had such a bad case of COVID-19 that they sought help at a hospital. 

To be honest, the underlying data source is just ambiguous enough that that’s not an air-tight claim.  But that’s certainly how it appears to me.

Assuming I’m correct, now the results make sense.  Sure, a high fraction of those who had a severe or life-threatening case of COVID-19 have significant long-term aftereffects.  That’s completely unsurprising, because almost anything that results in a near-death experience (think stroke or heart attack) will have serious long-term consequences.

But, as will become obvious from the numbers below, under no circumstances should you interpret the latest CDC study as showing outcomes for the typical adult COVID-19 case. 

And, to be clear, I think it’s hugely misleading of the CDC to fail to make that crystal-clear.  Because that’s exactly how the popular press is reporting it.  And that’s exactly how U.S. citizens are going to be interpreting it.  And the last thing the CDC needs, in the context of COVID-19, is to be caught out at what appears to be one last round of scare-mongering.  Even if, technically, they have done the appropriate amount of vague CYA (caveats) at the end of the research paper.

A quick walk through the numbers that almost everyone ignores

I have read and done a lot of health services research over the years.  I have learned the hard way that in some cases, the most important numbers are buried in the footnotes.  In particular, there’s a reason that most scholarly journals will make you show exactly how you got down to your final sample of data.  What did you start with, whom did you throw out, and how did you justify throwing them out.

And in this case, looking at the starting population is the key to understanding these results.  This CDC study ends up with about 350,000 U.S. adults with COVID.  But the kicker is, that’s drawn from a database of >63,000,000 U.S. adults, a fact that is just casually mentioned in passing in this research.

This, over a time period when 31% of U.S. adults had a case of COVID-19, based on CDC data.

Those of you capable of doing some crude arithmetic on the fly will realize the magnitude of the discrepancy here.  That 350,000/63,000,000 ain’t nowhere near 31%.  It’s not even close to 3.1%.  It’s more like 0.5%.
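If you would rather not do that on the fly, here is the same crude arithmetic as a short Python sketch.  The inputs are the figures quoted above; the variable names are my own.

```python
# Figures as quoted in the post; variable names are illustrative only.
study_cases = 350_000          # adults with COVID in the final study sample
database_adults = 63_000_000   # adults the database is said to cover
cdc_infected_share = 0.31      # share of U.S. adults infected over the period (CDC)

# Study cases as a share of everyone in the database.
share_in_study = study_cases / database_adults

# How many times larger the CDC infection share is than the study's share.
shortfall_factor = cdc_infected_share / share_in_study

print(f"Study cases as share of database adults: {share_in_study:.2%}")
print(f"CDC infection share is roughly {shortfall_factor:.0f} times larger")
```

On these inputs the study's cases come to a bit over half a percent of the database population, some fifty-odd times smaller than the share of adults the CDC says were infected.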

So, from the get-go, without any further analysis, you immediately realize that this is a study of a tiny, tiny subset of adult COVID-19 cases.  I’ll do the math a little more precisely below but in round numbers, this is a study of only about 2 percent of all the adult COVID-19 cases that should have occurred in that population of 63 million.

But tiny is tiny no matter how you slice it. This ends up being a study of a tiny fraction of COVID-19 cases.

And my key point is that it’s  not some randomly-chosen set of cases.  This tiny subset consists of a) everyone who was hospitalized for COVID, b) probably most people who were sick enough to end up in the hospital ER for COVID, and c) an unknown number of others who sought formal medical care for COVID at a physician’s office, but required no more intense care.

In fact, I’ll just go ahead and say it.  Based on U.S. prevalence and hospitalization data, I would be surprised if this database contained much more than the COVID-19 hospitalized population.

Here’s the citation for the study.

Bull-Otterson L, Baca S, Saydah S, et al. Post–COVID Conditions Among Adult COVID-19 Survivors Aged 18–64 and ≥65 Years — United States, March 2020–November 2021. MMWR Morb Mortal Wkly Rep 2022;71:713–717. DOI: http://dx.doi.org/10.15585/mmwr.mm7121e1.

The CDC study starts from a proprietary for-profit database provider who claims to have medical records for 63 million U.S. adults.  If you then read the description of the data source and see the resulting research, it’s immediately obvious that this is a hospital-inpatient-centric database.  (See here, for example).  So they surely have hospital inpatient encounters, likely have hospital OPD and ER encounters.  And the extent to which they have anything beyond that is unclear.  But, for sure, if you know what to look for, virtually all the research shown on their website, using this database, is research about inpatient care.

The researchers then identified 350,000 adults with (what I interpret as) some health care encounter with a diagnosis of COVID-19, from the start of the pandemic through November 2021.  But in addition, they appear to have screened out about one-fifth of all otherwise eligible cases, due to some pre-existing condition.  Accounting for that, they would have found about (350K x 5/4 =) 440K adults with (what I assume to be) some COVID-19 related treatment.

Doing the long division, they appear to have found about 440,000 adults in their database, with COVID, in a claimed population of 63 million adults.   Or (440K / 63M =) roughly 0.7 percent of their adult population appears to have had a case of COVID-19.  Some of those were then tossed out for having a relevant pre-existing condition.
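Here is that long division as a rough Python sketch.  The one-fifth screening share is my reading of the paper, as described above, and the 31% infection share is the CDC seroprevalence figure already cited in this post; the variable names are my own.

```python
# Figures as quoted in the post; variable names are illustrative only.
final_sample = 350_000         # adults with COVID in the final study sample
share_screened_out = 1 / 5     # assumed share dropped for pre-existing conditions
database_adults = 63_000_000   # adults the database is said to cover
cdc_infected_share = 0.31      # share of U.S. adults infected over the period (CDC)

# Undo the screening step to estimate everyone found with a COVID diagnosis.
cases_before_screening = final_sample / (1 - share_screened_out)

# Those cases as a share of the database population.
share_of_database = cases_before_screening / database_adults

# Infections you would expect in that population, and the share actually found.
expected_infections = database_adults * cdc_infected_share
share_of_expected = cases_before_screening / expected_infections

print(f"Cases before screening:       {cases_before_screening:,.0f}")
print(f"Share of database adults:     {share_of_database:.2%}")
print(f"Share of expected infections: {share_of_expected:.1%}")
```

Before rounding, the case count is 437,500, which I call 440K in the text.  On these inputs, the cases found amount to a little over 2 percent of the infections you would expect in a population that size.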

But as of November 2021, the CDC’s seroprevalence surveys show that >31% of the U.S. adult population had already had COVID-19.

Source:  CDC COVID data tracker.

Now here’s where this gets just plain ugly.  As of November 2021, the CDC estimates that about 0.9% of the entire U.S. adult population had been hospitalized for COVID-19.   (Source, CDC COVIDnet, accessed 5/31/2022).  But of those, about 15% to 20% died in the hospital.  That would suggest that about (0.9% x 85% =~) 0.75% of the entire U.S. adult population survived a COVID-19 hospitalization over this period.  That’s an estimate based on a sample of hospitals, but it’s the best estimate available for the U.S.

In other words, on the face of it, based on the case counts and underlying population as-reported, this study appears to be primarily, perhaps almost exclusively, a study of individuals who were hospitalized with COVID-19.  There simply are not enough cases in their database to account for anything much above-and-beyond the expected number of COVID-19 hospitalization survivors in a population of 63 million U.S. adults.
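To put rough numbers on that comparison, here is a small Python sketch.  It uses the 0.9% hospitalization estimate and the low end (15%) of the in-hospital mortality range, as in the calculation above, and takes the database case count as 350K x 5/4 = 437,500.  The variable names are my own.

```python
# Figures as quoted in the post; variable names are illustrative only.
database_adults = 63_000_000       # adults the database is said to cover
hospitalized_share = 0.009         # U.S. adults hospitalized for COVID-19
in_hospital_survival = 0.85        # 1 minus the assumed 15% in-hospital mortality
covid_cases_in_database = 437_500  # 350K x 5/4, before the pre-existing-condition screen

# Share of all adults who survived a COVID-19 hospitalization.
survivor_share = hospitalized_share * in_hospital_survival

# Hospitalization survivors you would expect in a population of this size.
expected_survivors = database_adults * survivor_share

# Database cases relative to that expected number.
ratio = covid_cases_in_database / expected_survivors

print(f"Survivor share of all adults:      {survivor_share:.3%}")
print(f"Expected hospital survivors:       {expected_survivors:,.0f}")
print(f"Database cases / expected survivors: {ratio:.2f}")
```

A ratio near (or below) 1 means the database holds no more COVID-19 cases than you would expect from hospitalization survivors alone.  Using 20% mortality instead of 15% lowers the expected survivor count and nudges the ratio up slightly, but does not change the picture.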

OK, now it makes sense.  If you told me that being hospitalized for COVID had a 20% risk of long-term consequences, then sure, I’d believe that.  Being hospitalized for any condition with a 20% in-hospital mortality rate is likely to leave you with some long-term consequences.  (Stroke, heart attack, …)

But under no circumstances should this research be interpreted as showing that 20% of typical adult COVID-19 cases result in some debilitating form of long COVID. 

This is so misleading, and being so commonly mis-interpreted, that I think the CDC ought to be burdened to issue a clarification.  Or, at the very least, require the researchers to go back into their database and calculate the percent of their sample of cases that was hospitalized with COVID.  If, as I absolutely expect, that turns out to be almost all of their cases, then they absolutely need to issue a clarification.

Otherwise, I’m just going to chalk this one up to a long stream of CDC fails in this area.  They screwed up the initial set of COVID-19 tests.  They refused to acknowledge that COVID-19 was spread via aerosol (airborne) transmission.  They incorrectly assured Americans that we didn’t need to wear masks.  Only after their arms were twisted (by the head of the Chinese CDC, no less) did the CDC grudgingly encourage Americans to wear inferior cloth masks.  They were dead wrong about the importance of fomite transmission, leading to the entire U.S. going through what amounted to two years of hygiene theater.  And they still have never acknowledged the benefit of wearing N95 respirators, despite a completely normal and adequate supply of those in the U.S. now.

To which we can now add this study, uniformly misinterpreted in the popular press, as the coda on that masterful performance.

Otherwise, aside from that one tiny little issue, this is a pretty good study.  They flagged the presence of plausibly COVID-related diagnoses in the medical records.  They tossed everybody who had one of those as a pre-existing condition.  They only looked at individuals who had some use of the health care system in both of the two years of the study.  And so on.  Everything else was more-or-less up to snuff, as these observational data studies go.  The only problem I see is that, by default, and never clearly stated, this is more-or-less a study of persons hospitalized for COVID.  But that one little issue has huge implications for how these results should be interpreted.

Separately, having done (plausibly) thousands of small health-care-claims-based analyses in my lifetime, I can even guess the technical issues underlying this omission.  Electronic medical records are verbose, consume vast amounts of storage, and are just-plain-hard to process when you are talking about tens of millions of cases.  It’s a good bet that the researchers themselves did not have access to the raw data, but instead were given a “patient-level abstract” of the information.  Further, under typical HIPAA-driven data privacy rules, that abstract contained only the minimum information required to do the study.  The researchers probably had to justify every data element that they requested.  That would be the patient ID, maybe some demographics, and then a string of dates and diagnosis codes.  But not the site-of-service information.  In other words, I’d bet good money that the researchers performing this study are literally unable to tell what fraction of their COVID-19 diagnoses come from hospital inpatient stays.

One final extra-for-experts, or why this would require a little bit of care to untangle.  Medical record coding rules require that follow-up visits be coded with the reason for the original health-care visit.  Here, a person who is seen in the hospital outpatient department, a month after discharge for a COVID-19 hospitalization, will have COVID-19 coded on that outpatient followup visit claim, and so will appear to have had an outpatient visit for COVID-19.  The only quick-and-correct way to estimate the fraction of non-inpatient COVID-19 cases in such a situation is to flag anyone with any mention of COVID, then subtract out all persons with any mention of COVID-19 on any inpatient claim.
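That flag-then-subtract logic can be sketched in a few lines of Python.  The claim records below are entirely made up for illustration, as are the field layout and site-of-service labels; a real claims file would look different.

```python
# Hypothetical claim records: (patient_id, site_of_service, has_covid_diagnosis).
# The data and the labels are invented purely to illustrate the logic.
claims = [
    ("A", "inpatient", True),
    ("A", "outpatient", True),         # follow-up visit that carries the COVID code
    ("B", "outpatient", True),
    ("C", "er", True),
    ("D", "physician_office", False),  # no COVID diagnosis at all
]

# Step 1: flag anyone with any mention of COVID on any claim.
any_covid = {pid for pid, site, covid in claims if covid}

# Step 2: flag anyone with any mention of COVID on an inpatient claim.
inpatient_covid = {
    pid for pid, site, covid in claims if covid and site == "inpatient"
}

# Step 3: subtract, leaving COVID patients who were never hospitalized for it.
never_inpatient = any_covid - inpatient_covid

non_inpatient_share = len(never_inpatient) / len(any_covid)
print(sorted(never_inpatient), f"{non_inpatient_share:.0%}")
```

Patient A has an outpatient claim with a COVID code, but is correctly excluded because the subtraction works at the patient level, not the claim level.  Counting outpatient claims with a COVID code would have wrongly put A in the non-inpatient group.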

Addendum:

I am (or was) an old-school health services researcher.  I worked on large health care claims (bills) databases.  The first thing I want to know about any data source is where it comes from.  What proportion of the claims are from hospital inpatient stays, from emergency room visits, from physician offices, from independent laboratories, and so on.

That’s the only way to get a handle on what you’re looking at.   In the traditional Medicare program, for example, there are about 20 physician claims for every hospital inpatient claim.  If you only look at the hospital inpatient claims, you get a vastly skewed notion of the state of health of the average Medicare beneficiary.  And the closer you get to that inpatient-centric view, the more skewed your estimates will be.
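As a rough illustration of the kind of tabulation I mean, here is a minimal Python sketch that counts claims by site of service.  Only the 20-to-1 ratio of physician claims to inpatient claims comes from the Medicare example above; the other counts and all the labels are hypothetical.

```python
from collections import Counter

# Hypothetical claims file, reduced to one site-of-service label per claim.
# Only the 20:1 physician-to-inpatient ratio comes from the text above;
# the ER and lab counts are invented for illustration.
claim_sites = (
    ["physician_office"] * 20
    + ["inpatient"] * 1
    + ["er"] * 2
    + ["independent_lab"] * 5
)

# Count claims by site of service, then convert counts to shares.
mix = Counter(claim_sites)
total = sum(mix.values())
shares = {site: count / total for site, count in mix.items()}

# Print the mix, largest share first.
for site, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{site:18s} {mix[site]:3d} claims  {share:.1%}")
```

With a table like this in hand, a reader can see at a glance how far a data source sits from an inpatient-only view.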

Near as I can tell, nobody involved in this database or this study seems to have bothered to ask that basic question:  what mix of health care encounters does this information come from?   Or, possibly, they may not be able to tell from their data.  So I have had to make some inferences.

Based on the counts, my guess is that this analysis is based primarily on hospital admissions.  But the real point, coming from someone with extensive experience in claims-based (bill-based) health services research, is that the reader should not be required to guess about such a fundamentally important question.


Other sources of confusion, or, what do inquiring minds want to know about long COVID?

Ultimately, I have a really simple question that I’d like to have answered.  I’m old, fat, vaccinated, and double-boostered.  If I manage to catch the current prevalent strain of COVID-19 (BA.2.12.1), what are the odds that I’m going to end up with some debilitating long-term condition as a result of a COVID-19 infection now?

This is the knowledge gap that I’m trying to fill.  I’ve already done some round-numbers estimates for likelihood of hospitalization or death, in several prior posts.  So now I’m asking for the odds that it won’t kill me, it’ll just cripple me some.

And it turns out that, aside from the problem identified in the first section of this analysis, this research (and most of the rest that I have read) provides absolutely no guidance in answering this question.

First, near as I can tell, most of the “long COVID” cases come from self-reports of vague, non-life-threatening conditions.  It’s hard to say, because studies like this one don’t even bother to show the prevalence of the various conditions that they are counting.  But elsewhere, in other long-COVID studies, you can at least find qualitative statements like the following (this is from a different long-COVID analysis):

Symptoms most commonly associated with long COVID include fatigue, headache, dyspnea, hoarse voice and myalgia

Fatigue, headache, muscle pain? Not to make too light of this, but that sounds like a typical day for me.  The point being that the serious or life-threatening conditions (heart conduction disorders, pulmonary embolism, and so on) appear to be quite rare.  Those extremely serious conditions are not the main drivers of the count of long COVID cases.

This is almost certainly why the same methodology that showed “long COVID” in 40+% of cases showed “long flu” in 30%.  As reported here:

The researchers found evidence that Covid-19 patients are more likely to suffer from long-term symptoms than flu patients, with around 42% and 30% respectively reporting at least one symptom three-to-six months after infection.

Source:  Forbes, reporting on the underlying research, emphasis mine.

There’s no way that a third of U.S. adult flu cases have resulted in long-term debilitating illness each year.  I can only conclude that much of what gets counted as symptoms of long COVID consists of conditions that are fairly prevalent in the general U.S. adult population.
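To see how much of the headline COVID figure is matched by plain old flu, here is a minimal Python sketch using the two percentages from the Forbes quote above.  The variable names are my own, and the comparison is only as good as those two reported figures.

```python
# Figures from the Forbes quote above; variable names are illustrative only.
covid_symptom_share = 0.42   # COVID patients reporting a symptom at 3-6 months
flu_symptom_share = 0.30     # flu patients reporting a symptom at 3-6 months

# Gap between the two groups, in percentage points.
gap_points = (covid_symptom_share - flu_symptom_share) * 100

# Portion of the COVID figure that the flu figure alone would match.
matched_by_flu = flu_symptom_share / covid_symptom_share

print(f"COVID minus flu: {gap_points:.0f} percentage points")
print(f"Share of the COVID figure matched by flu: {matched_by_flu:.0%}")
```

On these figures, most of the headline COVID rate is matched by the rate in flu patients, which is consistent with a lot of it being symptoms that are common to begin with.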

Second, as far as I can tell, all of the analysis is based on prior strains of COVID.  This CDC analysis, for example, includes loss of taste and smell as a COVID-related condition.  But, by and large, that no longer happens with the current strain BA.2.12.1, and almost never within the vaccinated population getting the current strain.

Third, little of the analysis distinguishes vaccinated from unvaccinated, and none of it distinguishes boostered from merely vaccinated.  So, to the extent that the vaccination process offers some protection from the dangerous long-term complications of COVID, that’s not reflected in any research.

Fourth, little-to-none of it distinguishes short-term or immediately-post-infection complications from presumed permanent organ damage.  My understanding is that much of what was talked about as (e.g.) heart damage in young athletes post-COVID turned out to be temporary conditions that had largely disappeared within six months.


So what’s an old fat guy supposed to do?

I could go on, but at the end of the day, I just toss up my hands.  The fact is, if my worry is that I’ll get permanently disabled from a COVID-19 infection now, there is no information out there to allow me to assess my odds.

So, I have to take a guess.  And my guess is that, vaccinated-and-boostered, if I get infected with BA.2.12.1 now, the likelihood that I’m going to end up with serious, permanent damage is really, really small.  How small, I can’t say.  But I’m willing to say that the cases you read about in the newspapers are the true outliers.  I’m betting that most of them date to Delta and earlier variants, in the pre-vaccine era.  Many but not all are post-hospitalization outcomes.  And so on.

If you read this blog, you know I’m not exactly the most cheerful and upbeat person in the world.  But in terms of long COVID — despite the fear-mongering of the popular press — I’m just not going to worry about it.  I’m going to continue to take reasonable precautions against infection when I think that’s warranted.  (Because, hey, I already own a more-than-lifetime supply of N95s).  I’m not going to avoid activities for fear of long COVID.