Post #1894A: A minor technical follow-up on the NY Times/Siena poll results

Posted on November 6, 2023

I’m still looking for loopholes.  Hence, three remaining questions:

  • How was the sample selected, and in particular, did it require a successful match from voter record to cell phone record?
  • What was the overall response rate?
  • How well does this poll benchmark against the actual 2020 results?

L2 file?

After reading the end-notes on the detailed tabulations of the NY Times/Siena College poll, my main remaining question is:  What is the L2 file?

Survey respondents were chosen (in a sophisticated-but-neutral way) from persons on the L2 file.  That file is the “universe of observations” for the survey.

Based on the U. Penn description, the L2 file contains public information on about 200 million persons who recently voted, along with about 95 million cell phone numbers.

The file itself was developed by L2.com.  Having dealt with mailing-list vendors before, I recognized much of the subsidiary information that they merged onto the publicly-available voter records.

But if that’s an accurate description (95 million cell phones, 200 million voters), then a bit less than half the L2 file (95/200, or roughly 47.5 percent) had a phone number attached to the voter data.

Did this survey draw from persons on the L2 file who had a phone number listed?  Or did it draw from all persons on that file?  The documentation simply says:

The survey is a response rate-adjusted stratified sample of registered voters on the L2 voter file.

I’m pretty sure they meant response-rate-adjusted; that is, they adjusted each record’s likelihood of being sampled based on some prior estimate of the likely non-response rate.
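
To make that concrete, here’s a minimal sketch (in Python) of what I take “response rate-adjusted” to mean.  The strata and response rates below are invented for illustration; nothing here comes from the actual NYT/Siena design.  The idea is simply to inflate each stratum’s draw by the inverse of its expected response rate, so that the completed interviews land in roughly the right proportions:

    # Hypothetical strata and expected response rates -- not the
    # actual NYT/Siena design, just an illustration of the idea.
    target_completes  = {"18-29": 150, "30-64": 550, "65+": 300}
    expected_response = {"18-29": 0.005, "30-64": 0.010, "65+": 0.020}

    # Draw enough records in each stratum that, at the expected
    # response rate, the completes come out in the target proportions.
    draws = {s: round(target_completes[s] / expected_response[s])
             for s in target_completes}

    print(draws)  # {'18-29': 30000, '30-64': 55000, '65+': 15000}

The stratum with the lowest expected response rate gets oversampled the hardest; that’s the whole trick.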

In any case, if the U. Penn description is correct, then this is a valid question to ask, along with the obvious follow-up:  If the sample was limited to persons with a listed cell phone, could the matching process (the process that attached the cell phone number to the voter record) have induced a bias?
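
Here’s a toy illustration (again Python, with invented numbers) of how a match requirement could skew the frame before anyone even picks up the phone.  Suppose the file splits evenly between two groups, but one group matches to a cell phone more readily:

    # Invented numbers: a 200M-record file, split 50/50 between two
    # groups, where group A matches to a phone more often than group B.
    voters     = {"A": 100_000_000, "B": 100_000_000}
    match_rate = {"A": 0.60, "B": 0.35}

    matched = {g: voters[g] * match_rate[g] for g in voters}
    total   = sum(matched.values())   # 95M total, i.e. the ~47.5% overall match

    for g in voters:
        print(f"group {g}: {voters[g] / 200_000_000:.0%} of the file, "
              f"{matched[g] / total:.0%} of the phone-matched frame")
    # group A: 50% of the file, 63% of the phone-matched frame
    # group B: 50% of the file, 37% of the phone-matched frame

If the two groups vote differently, a frame restricted to matched records starts out tilted, and the weighting has to work that much harder to undo it.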

Response rate?

The other thing not stated was the response rate.  What they did say was that 94 percent of respondents were reached on a cell phone.  Like this:

 Overall, 94 percent of respondents were reached on a cellular telephone.

But you’re left guessing as to what the actual response rate was, at least as far as I could tell from the documentation cited above.  (The “reached” figure speaks more to the validity of the added phone data than to the response rate.  You can reach me, and I can say “no thanks”.)
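
To see why the “reached” figure and the response rate are different animals, here’s a back-of-the-envelope sketch with made-up counts (the documentation reports none of these numbers):

    # Made-up counts -- the documentation doesn't report these.
    sampled   = 100_000   # voter records drawn and dialed
    completed = 1_000     # interviews actually finished
    by_cell   = 940       # of those, reached on a cell phone

    share_cell    = by_cell / completed    # the 94% figure they report
    response_rate = completed / sampled    # the figure they don't

    print(f"respondents reached by cell: {share_cell:.0%}")     # 94%
    print(f"response rate:               {response_rate:.1%}")  # 1.0%

The first number can be 94 percent while the second is 1 percent; they measure entirely different things.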

Don’t people lie (on average) about how they voted in past elections?

That said, the big advantage of this survey is that it shows a modest win for Biden in these states in 2020.  That is, it corresponds to the actual 2020 results.

Whatever their methodology, it accurately shows that Biden won the popular vote in these states, by a small margin, in 2020.  It’s hard to say that the 2024 projection is hugely biased in some fashion when you can see that no such bias exists for the actual 2020 results (as estimated from this poll).
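
In code terms, the benchmark I have in mind is nothing fancier than this (the margins below are placeholders, not the poll’s actual tabulated figures):

    # Placeholder margins -- substitute the poll's tabulated recalled-vote
    # figure and the certified 2020 result for these states.
    actual_2020_margin   = 0.02   # Biden's actual margin (placeholder)
    recalled_2020_margin = 0.02   # margin implied by recalled 2020 vote (placeholder)

    recall_bias = recalled_2020_margin - actual_2020_margin
    print(f"apparent recall bias: {recall_bias:+.1%}")  # +0.0%

If that difference is near zero, the poll passes the one external check we can actually run on it.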

Then I got to wondering:  Don’t people lie, after the fact, about having voted for the winner?

The problem is that if I Google anything near that topic, all I get is material about the 2020 election.  So any answer to whether this is material (whether people tend to say, after the fact, that they voted for the winner) will have to wait until I figure out a better way to search for it.