Post #686: Initial look at ZIP-code data

Posted on May 10, 2020

The Commonwealth is now putting out a daily file with case counts by ZIP code.  Vienna (22180) added 8 new cases yesterday, going from 61 to 69 people who have tested positive for COVID-19.

That said, all three Vienna ZIP codes (22180, 22181, 22182) rank near the bottom of all Northern Virginia ZIP codes in cases per 10,000 residents, at 25, 15, and 20 cases per 10,000, respectively, as of 5/9/2020.  By contrast, the median for all NoVA ZIPs with a population of 5,000 or more was 34 cases per 10,000.

(I estimated a higher rate in yesterday’s post, not knowing that the 22180 population includes a lot of people not in the Town of Vienna.)
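
For anyone who wants to reproduce the rate arithmetic, here is a minimal sketch in Python.  The ZIP labels, case counts, and populations below are made up for illustration (they are not from the state's file), and the names cases_per_10k and MIN_POPULATION are my own.  The rate is cases divided by population, times 10,000, and the median is taken only over ZIPs at or above the population cutoff.

```python
from statistics import median


def cases_per_10k(cases, population):
    """Cumulative cases per 10,000 residents."""
    return 10_000 * cases / population


# Hypothetical rows: (zip_label, cases, population). Not the state's data.
rows = [
    ("A", 50, 20_000),
    ("B", 30, 20_000),
    ("C", 12, 3_000),    # below the population cutoff, so excluded
    ("D", 170, 50_000),
]

# Only ZIPs with at least this many residents enter the median.
MIN_POPULATION = 5_000

rates = {
    label: cases_per_10k(cases, pop)
    for label, cases, pop in rows
    if pop >= MIN_POPULATION
}
median_rate = median(rates.values())

print(rates)
print(median_rate)
```

The population cutoff matters because a handful of cases in a tiny ZIP code can produce a very large rate, and the denominator matters just as much: the same case count over a larger population gives a lower rate.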

I’ve just started to look at the data, but the first thing I wanted to see was the extent to which income is associated with risk of infection.  It’s clear that there’s a lot of seemingly random variation in infection rates at the ZIP code level.  That’s due both to generally small numbers and (probably) to single-point events such as a cluster of cases in a nursing home or within a large household.

That said, even taking a crude income measure (from the IRS Statistics of Income data for 2017), it’s pretty clear that, on average, ZIP-level income and infection rate are negatively correlated (R = -0.47).
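
The R quoted above is an ordinary Pearson correlation coefficient.  As a rough sketch of how that is computed, assuming one income figure and one infection rate per ZIP code: the two lists below are invented for illustration and are not the IRS or state data, and pearson_r is a name I made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares, all about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustration: income in $1,000s and cases per 10,000, one pair per ZIP.
income = [55, 70, 85, 100, 120, 150]
rate = [48, 30, 41, 22, 27, 15]

r = pearson_r(income, rate)
print(round(r, 2))  # negative: higher income goes with a lower rate
```

Squaring R gives the share of variation accounted for, so an R of -0.47 means income alone lines up with roughly a fifth (about 0.22) of the ZIP-level variation in infection rates.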

The two high outliers in the graph above are ZIP codes in Alexandria (22305) and Springfield (22150).  That got me to thinking that, even within densely-populated Northern Virginia, you might see more spread in the more urbanized locations.  And, to a degree, that’s true, as the second graph shows.

If I put those two factors into a simple linear regression, each has a statistically significant independent effect, and together they explain about a third of the ZIP-level variation in infection rates (adjusted R-squared 0.34).
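
For readers who want to see what a two-factor regression like this involves, here is a bare-bones ordinary least squares sketch in plain Python.  This is not the software or the data I used: the numbers are invented, and fit_ols and solve are hypothetical helper names.  It fits an intercept plus one coefficient per predictor, then reports R-squared and adjusted R-squared, where the adjustment penalizes each added predictor.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, y):
    """Least-squares fit with an intercept. Returns (coefficients, r2, adjusted_r2)."""
    X = [[1.0, *row] for row in predictors]  # leading 1.0 is the intercept column
    n, k = len(X), len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return beta, r2, adjusted_r2


# Made-up illustration: (income in $1,000s, density in 1,000s of people per sq. mile)
predictors = [(60, 8), (80, 3), (100, 5), (120, 2), (150, 1), (90, 6)]
rate = [57, 38, 43, 28, 24, 41]  # invented cases per 10,000

beta, r2, adjusted_r2 = fit_ols(predictors, rate)
print(beta, r2, adjusted_r2)
```

With real data a statistics package would also report the standard errors and p-values behind the "statistically significant" statement, which this sketch does not attempt.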

My only point here is that the infection rates are not some total mystery, appearing at random.  Even within a small area like Northern Virginia, household income and population density are systematically correlated with the infection rate.  Unsurprisingly, ZIP codes where people can afford to stay home (or, as likely, have the sort of job that allows working from home), and can get around without having a lot of contact with others, tend to have systematically lower infection rates.