Post #727: Analysis of new COVID-19 Cases in NoVA, by ZIP code

Below:  Growth in cases/capita over the last two weeks (ending 6/21/2020), annualized, by ZIP code.  Larger dots = faster growth.  Only the size of the dots matters, not how closely they are spaced.

Source:  Analysis of ZIP-level COVID-19 case counts from the Virginia Department of Health.  Population data by ZIP from

This morning I decided to do a little study of recent growth in COVID-19 across Northern Virginia ZIP codes.  To cut to the chase:

  1. There is no “leveling-out” or “catch-up” of infection rate differences across ZIP codes.  To the contrary, ZIP codes that have been hit hard up to now continue to be hit hard.  Those that were spared so far continue to be spared.
  2. High growth in cases is concentrated in lower-income areas, with only the occasional exception.  High-income areas have largely been spared.
  3. There is no relationship between population density and recent case growth.  That differs from my analysis of about a month ago, where the then-existing cases per capita were higher in the more urbanized areas of NoVA.

Details follow


Nobody ever cares about this part, so let me just grind it out.

Data sources:  Case counts are from the Virginia Department of Health.  Population within each ZIP code and ZIP code area are from  Income is from the most recent IRS Statistics of Income (2017, I think) and is income per return (so it’s not family income or individual income, it’s just the average income per personal tax return).

Annualized two-week infection rate.  I took the count of new infections in each ZIP, divided by population to get per-capita, then multiplied by 365/14 to annualize that.  In the end, this statistic shows what fraction of a ZIP code’s population would be newly infected over the course of the coming year, if the new infection rate observed for the past two weeks continued for that entire year.
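The annualization step is simple arithmetic, and a minimal Python sketch makes it concrete.  The function name and the example ZIP (70 new cases, population 35,000) are hypothetical, not taken from the actual data:

```python
def annualized_two_week_rate(new_cases: int, population: int, window_days: int = 14) -> float:
    """Fraction of a ZIP's population that would be newly infected over a year
    if the per-capita rate seen in the last `window_days` days held all year."""
    if population <= 0:
        raise ValueError("population must be positive")
    per_capita = new_cases / population      # new cases per resident over the window
    return per_capita * 365 / window_days    # scale the window up to a full year

# Hypothetical ZIP: 70 new cases in two weeks, 35,000 residents.
rate = annualized_two_week_rate(70, 35_000)
print(f"{rate:.3f}")  # roughly 0.052, i.e. about 5% of residents per year
```

So a ZIP with 0.2% of its residents newly infected in two weeks annualizes to roughly 5% per year.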

Sort and average by quintile.  To generate the graph below, I sorted the data by the annualized infection rate, then divided the ZIPs into quintiles (five equal groups), and then averaged the data by quintile.  Quintile 1 has the ZIPs with the highest annualized two-week infection rate; Quintile 5 has the lowest.  For this analysis, I dropped the ZIPs for Fort Belvoir and Quantico, as they both appeared to be anomalously low and probably do not reflect complete data reporting.  I only consider ZIPs with non-trivial population, but I don’t think that’s a problem in NoVA.  I end up with a total of 85 ZIP codes in the analysis.
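Here is a rough Python sketch of that sort-and-average step.  The field names (`zip`, `rate`, `income`, `population`), the `exclude` set, and the `min_population` cutoff are hypothetical; the post does not say what population threshold was used.  With 85 ZIPs the split comes out even, 17 per quintile:

```python
from statistics import mean

def quintile_means(rows, sort_key, fields, n_groups=5, exclude=(), min_population=0):
    """Drop excluded or tiny ZIPs, sort from highest to lowest `sort_key`,
    split into `n_groups` nearly equal groups, and return the unweighted
    mean of each field per group. Group 1 holds the highest `sort_key`."""
    kept = [r for r in rows
            if r["zip"] not in exclude and r.get("population", 0) >= min_population]
    if len(kept) < n_groups:
        raise ValueError("need at least one ZIP per group")
    ordered = sorted(kept, key=lambda r: r[sort_key], reverse=True)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # When the split is uneven, the first `extra` groups get one more ZIP.
        end = start + size + (1 if g < extra else 0)
        groups.append(ordered[start:end])
        start = end
    # Plain (unweighted) average of each requested field within each group.
    return [{f: mean(r[f] for r in grp) for f in fields} for grp in groups]

# Hypothetical data: ten made-up ZIPs with rates 0..9 and incomes 100..91.
rows = [{"zip": f"z{i}", "rate": float(i), "income": 100.0 - i} for i in range(10)]
for q, means in enumerate(quintile_means(rows, "rate", ["rate", "income"]), start=1):
    print(q, means)
```

The averages are unweighted, so a small ZIP counts as much as a large one.  That matches the "(unweighted) average" described below.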


Source:  Analysis of ZIP-level COVID-19 infection counts from the Virginia Department of Health.  ZIP-level population and area from  ZIP-level income per tax return from the IRS statistics of income (2017, I think).

Everything on the chart above is normed so that the (unweighted) average for all NoVA ZIPs is 1.0.  That’s why the scale at the left has no units on it.  So, for every measure shown, the average ZIP code in NoVA has a value of 1.0.
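As a minimal sketch, norming just divides each quintile's value by the plain average across all ZIPs.  The function name and numbers below are hypothetical.  Because the quintiles are equal-sized, the five normed quintile values themselves average to 1.0:

```python
from statistics import mean

def norm_to_average(group_values, all_zip_values):
    """Express each group value relative to the unweighted mean over all ZIPs,
    so 1.0 means 'same as the average NoVA ZIP'."""
    base = mean(all_zip_values)
    if base == 0:
        raise ValueError("average is zero; cannot norm")
    return [v / base for v in group_values]

# Hypothetical example: four ZIPs averaging 5.0; two group values to express.
print(norm_to_average([7.0, 3.0], [2.0, 4.0, 6.0, 8.0]))
```

Norming does not change ratios between quintiles.  For example, "Quintile 1 is more than seven times Quintile 5" reads the same before and after.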

Now let me work through the groups of bars.

The group of bars at the far left is the current two-week infection rate (annualized).  (That’s the quantity I’ve sorted and grouped the ZIP codes by.)  It shows that there is a huge spread of current incidence of disease (new infections) across the region.  So this isn’t at all like (say) flu, which tends to get everybody.  This one hits some areas hard and spares others.  The highest 20% of ZIP codes (Quintile 1) has more than seven times the current infection rate of the lowest 20% of ZIP codes (Quintile 5).

The next group of bars shows the total number of cases to date, per capita, in those ZIPs.  The point is, the ZIPs that are being hit hard now by brand-new infections (Quintile 1) tend to be the ones that have been faring badly all along.  And the ones that have few new infections happening right now (Quintile 5) are largely the ones that have been spared to date.  So there’s no “leveling-out” of the infection rates over time.  When viewed geographically like this, the overall disparity in total infections per capita is increasing over time.

The third group of bars looks at the relationship between current infection rate and income.  And, I guess it’s no surprise, but the ZIP codes with high current infection rates (Quintile 1) tend to be lower-income, and vice-versa.  The average income (IRS income per return) in the currently-hardest-hit areas (Quintile 1) is less than half that in the ZIPs with the lowest recent infection rates (Quintile 5).  That’s no surprise, because that’s pretty much how the cross-section looked when I looked at it more than a month ago (Post #686).

The final set of bars looks at current infection rate and population density, and they provide a surprise:  There is no relationship.  (That also bears out in multivariate regression).  That’s a significant change from the cross-sectional snapshot that I looked at a month ago, when it was clear (by eye, and by regression analysis) that the overall cumulative cases/capita at that point were in fact concentrated in urbanized areas (Post #687).  I now wonder if that was the effect of many nursing-home-related infections early on.  Such “outbreaks” have become increasingly rare as the pandemic has progressed in NoVA.
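The post does not say which regression specification was used.  As a hypothetical illustration only, here is a two-predictor ordinary least squares fit (rate on income and density), written in plain Python so it needs no extra libraries.  The function name, the choice of predictors, and the made-up data are all assumptions.  It returns point estimates only; judging "no relationship" properly would also need standard errors, which this sketch omits:

```python
from statistics import mean

def ols_two_predictors(y, x1, x2):
    """Ordinary least squares fit of y = a + b1*x1 + b2*x2.
    Returns (a, b1, b2), solving the centered normal equations by Cramer's rule."""
    n = len(y)
    if not (n == len(x1) == len(x2)) or n < 3:
        raise ValueError("need three equal-length series with n >= 3")
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2, so the fit recovers (1, 2, -3).
x1 = [1, 2, 3, 4, 5]   # stand-in for income
x2 = [2, 1, 4, 3, 6]   # stand-in for density
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
print(ols_two_predictors(y, x1, x2))
```

With real ZIP data, a density coefficient near zero (after controlling for income) would be consistent with the flat density bars.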

To see the change, here’s the cross-section of cumulative cases to date (per capita) about six weeks ago (Post #687) with RED dots, compared to the current two-week new infection rate per capita, annualized, with BLUE dots (the map above).  That cluster of big red dots, right up against the DC border, does not have a matching cluster of big blue dots.  (Note that you only compare big and small within one map, not across the maps.  They are measuring two different things, on two different scales.)

Then (in red, cross-section), and now (in blue, current new infection rate).

Also, the high growth out in Sterling (the big blue dots at the top of the map) apparently had not started yet, as of my last cut at this.  So, something has changed there.

For those who want to look at the individual ZIPs, click here to download the Excel (.xls) file:

COVID-19 NOVA ZIP two-week case growth annualized

ADDENDUM:  I finally found ethnicity data by ZIP, from the US Census.  The graph below is, in effect, a fifth set of columns for the graph above.  As expected, the current infection rate by ZIP code correlates strongly with Latino ethnicity (Post #719).  ZIPs with the highest current new infection rate (Quintile 1) have about 4 times the Latino population, per capita, of ZIPs with the lowest rate (Quintile 5).