Post #849: A must-read on patterns of COVID-19 spread

Posted on October 5, 2020

I rarely just ditto a published article on this blog, but this one, in The Atlantic, is the most sophisticated discussion of COVID-19 spread and contact tracing that I’ve seen to date.

Briefly, this Atlantic article is about “k”, or “dispersion”: a measure of how lumpy, or “clustered”, the spread of COVID-19 is. Most of the spread isn't one-person-at-a-time spread. Most spread is due to “clusters”, where one person infects many people, all at the same time. And, to the point, it's about what that should imply for everything from contact tracing to how the government goes about trying to bring the pandemic under control.

It’s also incredibly timely, because the large cluster of cases arising around or during the recent White House Rose Garden ceremony is typical of COVID-19 spread. Within that group of top Republican supporters, we did NOT see one person a day showing up as infected, each day, over the course of a dozen days, as if each person had passed it along, one at a time. That would have matched the stylized pattern for (e.g.) seasonal flu. Instead, we saw 15 individuals (and counting) showing up all at once, all apparently infected at more-or-less the same time, at one or a few closely related events.

We’ll never know the actual count of people ultimately infected via that event, because the White House won’t allow anyone else to do contact tracing, and will not do any contact tracing itself.  Just another way in which the Republican leadership expresses its contempt for the CDC guidance on containing this disease (Post #848).

The title of that Atlantic article reads like a typical piece of clickbait.  (“Use this one trick to lose tons of belly fat!”.)  And it’s tough going in spots.  But for me, at least, it was well worth the time it took to read it through.

The piece is about how some countries “get it”, with respect to the prevalence of COVID-19 clusters, and have modified their approach to the pandemic accordingly.

But not the U.S. And because the U.S. response hasn’t really taken clusters into account as the main mechanism of disease spread, our response remains something of a … well, I’ll let you fill in the blank there.

Background:  What the average informed person already knew.  And a little bit more.

Source:  Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong, Dillon C. Adam, Peng Wu, Jessica Y. Wong, Eric H. Y. Lau, Tim K. Tsang, Simon Cauchemez, Gabriel M. Leung and Benjamin J. Cowling, Nature Medicine Letters,

By now, everyone has heard about the “R” value of COVID-19. That’s the average number of people to whom each infected person passes the infection.

You’ll hear it said that when R > 1, the pandemic is growing. Which, when you think about it, should make you say “duh”. R > 1 means each new round of infections is bigger than the one before it, so you have more cases now than you had a little while ago. The whole R > 1 thing is more-or-less just a definition: R > 1 is just the definition of “it’s spreading” and/or “case counts are rising”. (There’s a bit more to it, but that’s the gist.)

So R is straightforward.  The higher the value, the larger the average number of people that an infected person infects.  R > 1 means the number of infected cases is growing.   R < 1 means that it’s shrinking.
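To make that arithmetic concrete, here is a toy Python sketch. It assumes the simplest possible model: each “generation” of infections is R times the size of the previous one, with no immunity, timing, or behavior change. That is an illustration only, not how epidemiologists actually forecast, and the function name `case_counts` is made up for this example.

```python
def case_counts(initial_cases: float, r: float, generations: int) -> list[float]:
    """Expected cases per generation in a toy model: generation n = initial * r**n."""
    counts = [float(initial_cases)]
    for _ in range(generations):
        # Each case passes the infection to r others, on average.
        counts.append(counts[-1] * r)
    return counts

growing = case_counts(100, 1.5, 4)    # R > 1
shrinking = case_counts(100, 0.7, 4)  # R < 1
print([round(c) for c in growing])    # [100, 150, 225, 338, 506]
print([round(c) for c in shrinking])  # [100, 70, 49, 34, 24]
```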

But there’s also the variance around that mean rate of infection.  To borrow an example from the Atlantic article cited above, when Bill Gates walks into a bar with 99 people in it, the average wealth per person is about $1B.  But there’s a high variance around that mean.  If Gates then gives each person $1B, it’s the same mean wealth per person, but zero variance.

Or, more to the point, look at the diagram above. This is a diagram of how COVID-19 initially spread in Hong Kong. See the big blob on the left? Almost half the total spread, for all of Hong Kong at that time, was attributed to one large cluster, in one bar, and then the cases that radiated outward from that. Then a couple of smaller clusters (two middle groups). And then a number of cases with simple person-to-person spread (right).

That’s the idea here.  There’s a high variance in the number of cases that any one infected person then goes on to infect.  Could be anywhere from zero to one or two (portion at the right), to maybe 20 or 30 at one sitting (the big blob on the left).

Here, epidemiologists use “K” or “k” to represent a numerical estimate of that variation. The term of art is “dispersion”, in the sense that if you plotted the number of persons that each individual went on to infect, you’d see the values spread widely around the mean. They’d be dispersed, not tightly concentrated around the mean. If you observe a particularly large amount of variation, that’s termed overdispersion.

(Edit:  Technically, dispersion refers to the shape of the mathematical distribution used to model these values (a negative binomial distribution) relative to a simpler, standard distribution known as a Poisson distribution.  I don’t think that fact helps anyone understand this, hence my less precise description above.)
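For readers who do want the math, one standard way to get a negative binomial is as a gamma-Poisson mixture: give each infected person their own average infectiousness drawn from a gamma distribution (whose shape parameter is k), then draw that person’s actual number of secondary cases from a Poisson distribution with that average. The Python sketch below compares that against a plain Poisson with the same mean. The values R = 2.5 and k = 0.2 are hypothetical, chosen only for illustration, and the helper names are mine.

```python
import math
import random
from statistics import mean, pvariance

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (fine for modest means)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def negative_binomial(r: float, k: float, rng: random.Random) -> int:
    """Secondary cases for one person: gamma-distributed personal infectiousness
    (shape k, scale r/k, so mean r), then a Poisson count with that mean."""
    personal_r = rng.gammavariate(k, r / k)
    return poisson(personal_r, rng)

rng = random.Random(42)
R, K = 2.5, 0.2  # hypothetical values
n = 50_000
clustered = [negative_binomial(R, K, rng) for _ in range(n)]
uniform = [poisson(R, rng) for _ in range(n)]

# Both have mean near 2.5. The Poisson variance is near 2.5;
# the negative binomial variance should be near R + R**2 / K = 33.75.
print(mean(clustered), pvariance(clustered))
print(mean(uniform), pvariance(uniform))
```

Same average, very different spread: that gap between the two variances is what “overdispersion” refers to.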

What do I mean by that?  In practical terms:

Only about one infected person in seven ever spreads COVID-19 at all. Six out of seven who get infected are dead ends, as far as the virus is concerned. They get infected, but they don’t infect anyone else. But that seventh person? Typically, that seventh person infects a large number of individuals. A round-number estimate is that five percent of infected people cause 80 percent of all infections.
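Under the negative binomial model, the share of infected people who infect nobody has a closed form, (1 + R/k)^(-k). The Python sketch below evaluates it for hypothetical values (R = 2.5 throughout). It is only meant to show the direction of the effect: the exact shares depend on the R and k you assume, and these numbers are not the source of the one-in-seven figure.

```python
import math

def share_infecting_nobody(r: float, k: float) -> float:
    """P(zero secondary cases) for a negative binomial with mean r, dispersion k."""
    return (1 + r / k) ** (-k)

def share_infecting_nobody_poisson(r: float) -> float:
    """Same quantity for a Poisson (no clustering): exp(-r)."""
    return math.exp(-r)

R = 2.5  # hypothetical
for k in (0.1, 0.5):
    print(f"k = {k}: {share_infecting_nobody(R, k):.2f} infect nobody")
print(f"Poisson: {share_infecting_nobody_poisson(R):.2f} infect nobody")
# k = 0.1 -> about 0.72, k = 0.5 -> about 0.41, Poisson -> about 0.08
```

With the same average R, lower k pushes far more people into the “dead end” group, which is why the rest of the spread has to come from a few big events.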

This makes the spread of COVID-19 quite different from, say, the seasonal flu. With flu, it’s typically closer to one-to-one or one-to-a-few spread. Somebody gives it to you, you pass it on to somebody else. And repeat. Flu mainly spreads in a large number of one-to-one transmission events. Flu rarely spreads via mass “superspreader” events.

But with COVID-19, it’s more like one person gives it to a cluster of eight others. Of the cluster, seven don’t spread it. But one person in that cluster goes on to give it to eight more people. And repeat. COVID-19 mainly spreads via a small number of one-to-many transmission events. COVID-19 frequently spreads via mass “superspreader” events.

That pattern is typical for COVID-19. That’s what the dispersion “k” value for this disease captures. It reflects wide variation in the number of individuals that any one infected person goes on to infect. Most infect nobody. Some infect a large number.
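One consequence of “most infect nobody, some infect a large number” is that a single introduction of the virus usually fizzles out on its own, but when it doesn’t, it takes off in bursts. Standard branching-process theory says the probability that a chain started by one case eventually dies out is the smallest solution of q = G(q), where G is the probability generating function of the number of secondary cases. A Python sketch, with hypothetical R = 2.5 and k = 0.1:

```python
import math

R, K = 2.5, 0.1  # hypothetical values

def nb_pgf(s: float) -> float:
    """Generating function of a negative binomial with mean R, dispersion K."""
    return (1 + R * (1 - s) / K) ** (-K)

def poisson_pgf(s: float) -> float:
    """Generating function of a Poisson with mean R (no clustering)."""
    return math.exp(-R * (1 - s))

def extinction_probability(pgf, iterations: int = 1_000) -> float:
    """Smallest solution of q = pgf(q), found by iterating upward from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q

print(f"clustered (k = {K}): {extinction_probability(nb_pgf):.2f}")
print(f"Poisson:             {extinction_probability(poisson_pgf):.2f}")
# Roughly 0.86 versus 0.11 with these inputs.
```

Same average R, but under clustering most chains die out by themselves, and the epidemic is carried by the few that hit a superspreading event.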

In a broad sense, the only odd thing to remember about “k” is that it runs opposite to the variation. A low “k” means a lot of variation in the number of persons infected. Some people infect a lot of others, some people infect nobody. A high “k” means little variation in the rate of infection. Most infected people go on to infect roughly the same number of additional people.
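The inverse relationship is easy to see in the negative binomial variance formula, Var = R + R²/k. Because k sits in the denominator, shrinking k inflates the variance, and as k grows large the variance falls back toward R, which is the Poisson (no-clustering) case. A Python sketch with a hypothetical R = 2.5:

```python
def offspring_variance(r: float, k: float) -> float:
    """Variance of a negative binomial with mean r and dispersion k."""
    return r + r * r / k

R = 2.5  # hypothetical
for k in (0.1, 0.5, 1.0, 10.0, 1000.0):
    print(f"k = {k:>7}: variance = {offspring_variance(R, k):.2f}")
# Variance falls from about 65 (k = 0.1) toward 2.5 (the Poisson value) as k grows.
```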

I should mention that it’s fairly hard to estimate “k”, and that estimates of “k” for a given disease will vary, for a number of good reasons. First, direct estimation of “k” depends on contact tracing. For example, you have to know how often an infected person attended an event and infected N others. That, by itself, is hard to do. (But, as I read the literature, you can also infer it indirectly by curve-fitting to the aggregate progress of the epidemic, though I have no clear understanding of how that’s done.) Second, and of course, the value you get will depend on behavior. A state that shuts down large gatherings is going to show a higher “k” value (lower variability in disease transmission) for COVID-19 than a state that does not, all other things equal. Limiting the gathering size limits the number of persons potentially infected in a super-spreader event. It prevents extreme “outlier” events from occurring.
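To make the estimation point concrete, here is a Python sketch of the simplest estimator, a method-of-moments fit (k ≈ mean² / (variance − mean)), applied to a list of how many people each traced case infected. Published studies typically use maximum-likelihood fitting instead, the contact-tracing numbers below are made up, and capping each count at 10 is only a crude stand-in for a gathering-size limit. Still, it shows the direction of the effect: take away the outliers and the estimated k goes up.

```python
import math
from statistics import mean, variance

def estimate_k(secondary_cases: list[int]) -> float:
    """Method-of-moments estimate of the dispersion k.

    For a negative binomial, variance = mean + mean**2 / k, so
    k = mean**2 / (variance - mean). Returns infinity when the data show
    no overdispersion (variance <= mean).
    """
    m = mean(secondary_cases)
    v = variance(secondary_cases)  # sample variance
    if v <= m:
        return math.inf
    return m * m / (v - m)

# Hypothetical tracing data: secondary cases caused by each of 20 traced people.
traced = [0] * 14 + [1, 1, 2, 3, 5, 28]
# Crude stand-in for a gathering-size limit: nobody infects more than 10.
capped = [min(c, 10) for c in traced]

print(f"k estimate, no limit:   {estimate_k(traced):.2f}")  # about 0.11
print(f"k estimate, with limit: {estimate_k(capped):.2f}")  # about 0.24
```

With only 20 traced cases, a single large event dominates the estimate, which is one more reason published “k” values bounce around.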

And I have a hunch, which I have not been able to verify, that diseases with known aerosol spread tend to have high dispersion (low “k” value). So far, that’s been my empirical observation, reading through the literature. Taking two diseases known for aerosol spread, measles and tuberculosis, I found estimated “k” values far below 1.0. For example, the estimated k value for measles (in one California outbreak) seems to be around 0.27 (per this reference). The estimated k for tuberculosis (an outbreak in Australia) is around 0.16 (per this reference).

(But note that the reverse is NOT true.  You can have non-aerosol diseases with low k/high dispersion in the spread.  For example, a gonorrhea outbreak studied in this reference was estimated to have a relatively low k/high dispersion factor of 0.257, presumably owing to a small fraction of those infected having a large number of sexual partners.)

So my unproven guess is that all diseases with aerosol (airborne) spread have low k.  (High dispersion/significant numbers of super-spreader events.)  But that not all low-k diseases are aerosol spread.

For reference, various estimates of the “k” value for COVID-19 range from about 0.1 to maybe as much as 0.5, based on the particular study.  So, COVID-19 cases cluster up to about the same extent that those two known-to-be-aerosol-spread diseases cluster, measles and tuberculosis (TB).

(As a side note, TB is literally the reason that hospital personnel have to wear N95 masks when treating COVID-19 cases.  The protocols requiring use of N95s were developed for dealing with TB cases in the hospital, and were then subsequently required for any disease suspected of being capable of airborne transmission.)

Two policy implications from the Atlantic article.

These are the two things that stand out most vividly, for me, when I think back on the content of that article.

We have a public health system and contact tracing methods that are largely geared toward what we normally see in the US: flu. Advice is geared largely toward preventing one-at-a-time, person-to-person transmission. It’s largely based on diseases that don’t routinely spread via aerosols. It’s geared toward high-k/low dispersion diseases.

And, to the extent that contact tracing is done, it focuses on “forward” tracing, that is, finding an individual, and finding all the persons that individual may have infected.  Even if you look back in time, you start from the known infected individual, and look “forward” from there to find all the persons he or she may have infected.

Turns out, per the Atlantic article, much of that is just not-very-well thought out, in the context of an aerosol-spreading, low k/high dispersion disease like COVID-19.

And, I guess it should come as no surprise at this point, other countries have figured that out.  And other countries are using public health methods that, in all likelihood, work better than what we are doing.

It’s just like “pooled testing”, adopted immediately by the Germans but hardly ever used in the US (Post #761). Or the Chinese and Russians making rational early use of their vaccines before they are fully tested, but not the US (Post #814). This seems like yet another failure of our public health leaders to grasp the basic mathematics of what they are dealing with, and, instead, to put their heads down and just do what they have always done.

Point #1, obvious:  If most of the spread consists of a series of big clusters, and very little of the spread is simple one-at-a-time person-to-person spread, then orient your public health advice and efforts toward preventing those clusters.  This, apparently, is exactly what Japan has done.  And while we have some aspects of that, such as limiting the size of public gatherings to N or fewer people, that really has not been the focus of our efforts.  It should be noted that Japan has been able to weather this mostly without shutting down, mostly by focusing on preventing those super-spreader “cluster” events.

From the Atlantic article (emphasis mine):

"Oshitani told me that in Japan, they had noticed the overdispersion characteristics of COVID-19 as early as February, and thus created a strategy focusing mostly on cluster-busting, which tries to prevent one cluster from igniting another. Oshitani said he believes that “the chain of transmission cannot be sustained without a chain of clusters or a megacluster.” Japan thus carried out a cluster-busting approach, including undertaking aggressive backward tracing to uncover clusters. Japan also focused on ventilation, counseling its population to avoid places where the three C’s come together—crowds in closed spaces in close contact, especially if there’s talking or singing—bringing together the science of overdispersion with the recognition of airborne aerosol transmission, as well as presymptomatic and asymptomatic transmission."

And so, all the time I’ve been featuring this little poster from the Japan Ministry of Health, little did I know that, ultimately, it was based on their recognition of the high dispersion of COVID-19 transmission.  That is, the prevalence of large clusters of cases as a principal means of spread.  I’ve been plugging this since June 2020, and I never made the link to the importance of clusters, for COVID-19 transmission, until I read that Atlantic article.


Point #2, you need a little math to understand: “Backward” contact tracing will be far more fruitful at finding additional cases than the “forward” contact tracing done in the US. In other countries, such as Japan, when they identify a case, they don’t just do “forward” tracing to see whom that person may have infected. They also do “backward” tracing, to try to find out who infected that person. And then, having found that person, they go forward from there.

From The Atlantic (cited above), emphasis mine:

Because of overdispersion, most people will have been infected by someone who also infected other people, because only a small percentage of people infect many at a time, whereas most infect zero or maybe one person. As Adam Kucharski, an epidemiologist and the author of the book The Rules of Contagion, explained to me, if we can use retrospective contact tracing to find the person who infected our patient, and then trace the forward contacts of the infecting person, we are generally going to find a lot more cases compared with forward-tracing contacts of the infected patient, which will merely identify potential exposures, many of which will not happen anyway, because most transmission chains die out on their own.
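The logic of that quote can be put into a couple of lines of arithmetic. When you trace backward from a case, you are more likely to land on someone who infected many people, simply because big spreaders account for more of the cases you might have started from (statisticians call this size-biased sampling). If the number of secondary cases follows a negative binomial with mean R and dispersion k, the person who infected a randomly chosen case infected, on average, R + R/k other people, versus R for a forward trace from the case itself. The Python sketch below uses hypothetical values; it is my illustration, not a calculation from the Atlantic article.

```python
def forward_yield(r: float) -> float:
    """Average secondary cases found by tracing forward from a random case."""
    return r

def backward_yield(r: float, k: float) -> float:
    """Average number of OTHER people infected by a random case's infector.

    Infectors are reached in proportion to how many people they infected,
    so their mean offspring is E[Z^2] / E[Z] = 1 + r + r/k for a negative
    binomial Z with mean r and dispersion k; subtract 1 for the starting case.
    """
    return r + r / k

R = 2.5  # hypothetical
for k in (0.1, 0.5, 100.0):
    print(f"k = {k}: forward ~{forward_yield(R):.1f}, backward ~{backward_yield(R, k):.1f}")
```

As k grows large, the two numbers converge. Backward tracing pays off this handsomely only because of overdispersion, which is exactly the Atlantic article’s point.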

Honestly, just give it a read. Among other things, it explains why cheap, rapid, not-too-accurate tests are more useful than slow, expensive, accurate (PCR) tests when the disease is characterized by large clusters. Ditto for testing sewage as a way of finding outbreaks. It goes a long way toward explaining why the spread of the disease looks so haphazard (something I have noted in posts on this website). Best article on COVID-19 policy that I have read in a long time. Possibly ever.