Post #1231: Setting up a test of school opening impact using Virginia data

Posted on September 3, 2021

This is just a brief announcement of something I intend to do.

In Virginia, school districts choose their own opening dates, subject to some restrictions imposed by the state.  As a consequence, the opening days for Virginia public schools traditionally span more than a month, anything from early August through mid-September.  These opening dates also tend to be quite “sticky” from year to year.  In other words, school districts that traditionally open earlier or later tend to do so year after year.

This seems like a reasonably good “natural experiment” for looking for the impact of school openings on the spread of the Delta variant.  It’s not a true experiment, because individual students’ start dates aren’t randomized.  Start dates for entire school districts aren’t randomized, so I don’t think it qualifies as a “quasi-experiment” either.

It’s a pretty good natural experiment, because those start dates were essentially determined years in advance.  They can certainly be correlated with other factors, if, say, school traditions in rural and urban areas differ.  Or if adjacent counties try to stay in sync with each other.

But, presumably, any COVID-19 related conditions in a county, up to the start of schools, are what they are.  And what I’ll be looking for is a change in trend, for the school-age population, starting two weeks after the school opening.
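To make "a change in trend, starting two weeks after the school opening" concrete, here is a minimal sketch. It assumes a daily series of the school-age share of cases keyed by date, and it compares a simple least-squares slope before and after the lagged cutoff. The function names and the slope comparison are illustrative assumptions, not necessarily the test I will end up running.

```python
from datetime import date, timedelta


def ols_slope(ys):
    """Least-squares slope of ys against day index 0, 1, 2, ... (needs >= 2 points)."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def trend_change(daily_share, opening, lag_days=14):
    """Slope after (opening + lag) minus slope before it.

    daily_share: dict mapping date -> share of cases in the school-age group
                 (hypothetical input layout).
    lag_days:    the two-week lag described in the text.
    """
    cutoff = opening + timedelta(days=lag_days)
    days = sorted(daily_share)
    pre = [daily_share[d] for d in days if d < cutoff]
    post = [daily_share[d] for d in days if d >= cutoff]
    return ols_slope(post) - ols_slope(pre)
```

A positive value means the share is rising faster after the lagged cutoff than before it.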

There’s a further complication, in that Virginia releases the age x day x region COVID-19 case counts only at the level of Virginia health districts.  So I have to crosswalk school districts to Virginia health districts, and aggregate the start dates by health district.
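A minimal sketch of that crosswalk step, assuming a simple lookup from school district to health district and an unweighted average of opening dates within each health district. The district names and dates below are made-up placeholders, not real Virginia data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical crosswalk: school district -> health district (placeholder names).
CROSSWALK = {
    "School District A": "Health District 1",
    "School District B": "Health District 1",
    "School District C": "Health District 2",
}

# Hypothetical opening dates (placeholders, not actual calendars).
OPENING = {
    "School District A": date(2020, 8, 10),
    "School District B": date(2020, 8, 24),
    "School District C": date(2020, 9, 8),
}


def mean_opening_by_health_district(crosswalk, opening):
    """Unweighted mean opening date per health district (each school district counts once)."""
    ordinals = defaultdict(list)
    for school_district, health_district in crosswalk.items():
        ordinals[health_district].append(opening[school_district].toordinal())
    # Average the day numbers, then round back to a calendar date.
    return {hd: date.fromordinal(round(mean(v))) for hd, v in ordinals.items()}
```

With the placeholder data, Health District 1 gets the midpoint of its two school districts' dates, and Health District 2 simply inherits its single district's date.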

Finally, the consolidated school calendars of all the Virginia school districts have not yet been published by the Virginia Department of Education for the 2021-2022 school year.  So I’m working from the dates on last year’s calendars.  That said, all I’m doing is assigning Virginia health districts to early, middle, and late typical school start dates.  The early and late date districts start schools more than two weeks apart, on average.
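One way to do that early/middle/late assignment is sketched below. It assumes the groups are rough thirds of the health districts ranked by typical start date. That cut rule is an assumption made for illustration, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean


def assign_groups(typical_start):
    """Label each health district early / middle / late by rank of typical start date.

    typical_start: dict mapping health district -> typical opening date.
    Splitting into rough thirds is an assumed rule, used here for illustration.
    """
    ordered = sorted(typical_start, key=lambda hd: (typical_start[hd], hd))
    n = len(ordered)
    labels = {}
    for rank, hd in enumerate(ordered):
        if rank < n / 3:
            labels[hd] = "early"
        elif rank < 2 * n / 3:
            labels[hd] = "middle"
        else:
            labels[hd] = "late"
    return labels


def early_late_gap_days(typical_start, labels):
    """Average start date of the late group minus that of the early group, in days."""
    early = [typical_start[h].toordinal() for h, g in labels.items() if g == "early"]
    late = [typical_start[h].toordinal() for h, g in labels.items() if g == "late"]
    return mean(late) - mean(early)
```

The second function is a quick check that the early and late groups really are separated by more than the two-week lag the analysis relies on.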

The statistical power of an analysis of this sort depends on how the data behave in a time-series.  If the share of COVID-19 cases in school-age children is rock-solid-steady prior to the opening of school, even a small impact of school openings will be readily apparent.  By contrast, if it’s jumping all over the place prior to the start of school, even a large impact will get lost in all that random time-series “noise”.
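The point about steadiness can be illustrated with a rough signal-to-noise calculation: express the post-opening shift in the school-age share as a multiple of the day-to-day standard deviation before opening. This is only a sketch under assumed inputs (parallel lists of daily counts), not a formal power calculation, and the function names are hypothetical.

```python
from statistics import mean, stdev


def under20_share(cases_under20, cases_total):
    """Daily share of cases in the under-20 group, skipping days with no cases at all."""
    return [u / t for u, t in zip(cases_under20, cases_total) if t > 0]


def shift_in_noise_units(pre_share, post_share):
    """Mean post-period shift divided by the pre-period day-to-day standard deviation."""
    noise = stdev(pre_share)  # needs at least two pre-period days
    return (mean(post_share) - mean(pre_share)) / noise
```

The same five-percentage-point shift would stand out as several standard deviations against a steady baseline, but would be well under one standard deviation against a baseline that swings widely from day to day.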

(I should note that most statisticians do their analysis of natural experiments incorrectly, using cross-sectional variation as the basis for their statistical tests.  That’s not right.  I don’t much care if schoolchildren’s share of cases varies from county to county.  What I care about is how it varies from day to day.  If it’s really steady, and then jumps up at the right time, I’m pretty sure I’ve captured the impact of school opening, regardless of how that share varies across regions on any given day.)

This analysis is something of a rough cut, in that my aggregation of counties into health districts is unweighted (each county counts the same).  But in most cases, it looks like it will assign the health districts to the right group of early-middle-late school openings.  For one thing, really populous areas form their own health districts.  So for the bulk of Virginia’s cases, that issue of sloppy aggregation of individual school districts does not even arise.

Separately, I can also check the “outbreak” statistics captured by Virginia.  Any time you get multiple cases, at the same time, within a school, that should be separately flagged in Virginia’s data.  But “outbreaks” are a lot “grainier” than the case counts — they come in big, discrete lumps.  So I think I’ll have better statistical power if I stick with case counts.

Anyway, owing to the generally late opening of Virginia schools, this is just the warm-up exercise.  I’m setting up the programming and checking the stability of the numbers, so I can revisit this in a couple of weeks.

Here’s the preliminary result.

Source:  Calculated from Virginia Department of Health COVID-19 case counts by age, date, and health district, and 2020-2021 (sic) school district schedules from the Virginia Department of Education.

I’m not sure what to make of that.  I had hoped to see three parallel lines.  Instead, the health districts containing early-opening school systems already show an increase in the fraction of cases among individuals under age 20.

But, in fact, almost half the school districts within the health districts where the typical school opening date is early should have opened early enough for school-spread cases to have worked their way into the system by 9/2/2021.  That is, it’s vaguely plausible that the rise of the blue line is an actual impact of school reopening.

So I’m not going to dismiss this analysis quite yet.  I’ll come back in a couple of weeks to see if the other two curves pull upward, in sequence, as the school year openings progress across the state.