A Precedented Miss

The media has not been kind to the pollsters in the wake of the 2020 election. Headlines have referred to the industry’s performance as a “Black Eye” or a “Catastrophe.” But how bad were the polls, actually? At the time of writing, the AP reports that Joe Biden received 7 million more votes than President Trump, beating him by 4.5 percentage points in the national popular vote. On November 3rd, a weighted average of national polls showed Biden leading the popular vote by 8.4 points. That puts the national polling miss for the 2020 presidential election at 3.9 points, which turns out to be about average. Over the past two decades, the mean popular vote polling miss has ranged from 3.2 points in 2004 to 4.8 points in 2016, and polls have missed the previous five presidential elections by an average of 4.0 points.
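As a quick check on the arithmetic, here is a minimal sketch using only the figures cited above (the yearly misses behind the five-election average are not reproduced here):

```python
# Polling miss = |polled margin - actual margin|, in percentage points.
# Both figures below are the ones cited in this piece for 2020.
polled_margin = 8.4  # Biden's lead in the Nov. 3 weighted national average
actual_margin = 4.5  # Biden's popular vote margin as currently counted

miss_2020 = abs(polled_margin - actual_margin)
print(f"2020 national polling miss: {miss_2020:.1f} points")  # -> 3.9 points
```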

However, certain state-level polls missed final results by much larger margins. Critics are right to wonder what went wrong. This post will give a brief overview of theories for why certain polls may have missed and highlight the unique challenges that separate political polling from other types of survey research.

Predicting a Complex Behavior

The nature of voting makes political polling a more daunting task than other types of survey research. Not only do pollsters need to capture whom people intend to vote for, but they also have to correctly predict who within the population will actually vote. Researchers could get an accurate breakdown of how the population plans to support candidates and still miss the mark if their estimates of who will actually cast a vote are off. The two-step nature of predicting an intention for a future behavior (whether someone votes) and a preference for how that behavior is enacted (whom they vote for) provides more opportunity for error.

Getting a snapshot of candidate preference is difficult in itself. Layering a prediction of future behavior (actually turning out or mailing in a ballot) on top of those preferences makes the problem harder still. Even in the current polarized political environment, all kinds of internal and external processes could cause a person to change their candidate preference, or their intention or ability to cast a vote, by Election Day. If these causes systematically favor one candidate over another, the polls will be off.
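To make the two-step structure concrete, here is a minimal sketch with entirely hypothetical numbers: two demographic groups whose candidate preferences are measured perfectly, where the only error is in the turnout model.

```python
# Hypothetical electorate: each group has a share of the adult population,
# a modeled turnout probability, and a candidate preference (share for A).
groups = {
    # name: (population_share, turnout_prob, support_for_A)
    "group_1": (0.5, 0.70, 0.60),
    "group_2": (0.5, 0.50, 0.40),
}

def predicted_support(groups, turnout_adjust=None):
    """Turnout-weighted support for candidate A across groups."""
    turnout_adjust = turnout_adjust or {}
    votes_a = votes_total = 0.0
    for name, (pop, turnout, support) in groups.items():
        turnout += turnout_adjust.get(name, 0.0)  # apply any turnout error
        votes_total += pop * turnout
        votes_a += pop * turnout * support
    return votes_a / votes_total

print(f"{predicted_support(groups):.1%}")  # baseline estimate: ~51.7%
# Identical preference data, but group_2's turnout runs 10 points above
# the model; the estimate shifts by about a point with no change in opinion.
print(f"{predicted_support(groups, {'group_2': 0.10}):.1%}")  # ~50.8%
```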

The Observer Effect

A well-known example of the observer effect in physics is measuring a car tire’s air pressure: the mere act of using a tire gauge lets some air out of the tire, changing the very thing we had hoped to measure in the gas station parking lot on a cold day. While any kind of survey research is subject to observer effects of one kind or another depending on mode, political polling has a whole set of unique problems. First, one of the main audiences for polls is the campaigns themselves. Campaigns observe the results of preliminary polls and invest resources accordingly. In the days running up to a presidential election, hundreds of thousands of people backed by billions of dollars work to change the results they saw in the previous days’ polling. While polls can be surprisingly stable over time, we should expect these efforts to influence results in some cases.

Unlike in most survey projects, the general public often sees preliminary polling results. Reported big leads or close races have the potential to motivate or depress political participation. Some voters might see their preferred candidate out to a large lead and decide to stay home on election day. Others might be offended to see their party performing poorly and be inspired to cast their vote. In most cases, this merely adds noise and does not systematically affect the estimate for any given election. However, if campaigns can use preliminary polling in a narrative to motivate their own supporters or depress their opponents’ participation, polls have the potential to change the very outcome they are attempting to measure.

Selection and the Shy Trump Voter

One of the emerging narratives for the polling misses in 2016 and 2020 is the “Shy Trump Voter” hypothesis. Indeed, polls performed much better in the 2018 midterm elections than they did in either 2016 or 2020, when Trump was on the ballot. Some have suggested that, due to his polarizing nature, respondents are less likely to tell an interviewer that they plan to vote for Donald Trump, causing polls to systematically underestimate his vote share. Experiments priming social desirability have found little evidence for this version of the story. Additionally, respondents in more private collection modes, like self-administered online surveys, have expressed about the same support for Trump as those in phone and face-to-face interviews, suggesting respondents are not shy about telling others of their support for Trump.

Perhaps a more plausible theory lies in sample selection. One of the most critical elements of polling, and of survey research at large, is effective sampling of the population. If the way respondents are added to the survey sample systematically excludes certain types of people, estimates will be incorrect. Polls are commonly sponsored by media and academic institutions. If Trump supporters do not trust these institutions, they may be more likely to hang up the phone when pollsters call asking for their participation. Indeed, in 2016, half of Trump supporters said they never trust the media to do the right thing, compared to about a quarter of Clinton supporters. In 2019, 59% of Republicans said colleges and universities had a negative effect on the country, compared to 18% of Democrats. Polls could be underestimating support for Trump if his voters systematically decline to share their opinions when The New York Times or Quinnipiac University comes calling.
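A small simulation, with invented response rates purely to illustrate the mechanism, shows how differential nonresponse alone can bias a poll even when every respondent answers honestly:

```python
import random

random.seed(0)

TRUE_TRUMP_SHARE = 0.47                         # hypothetical true support
RESPONSE_RATE = {"trump": 0.05, "other": 0.06}  # invented: supporters answer less often

sample = []
for _ in range(200_000):                         # phone numbers dialed
    voter = "trump" if random.random() < TRUE_TRUMP_SHARE else "other"
    if random.random() < RESPONSE_RATE[voter]:   # does this person take the poll?
        sample.append(voter)

polled = sample.count("trump") / len(sample)
print(f"true support: {TRUE_TRUMP_SHARE:.1%}, polled support: {polled:.1%}")
# A one-point gap in response rates pushes the polled share several points
# below the true share, with no lying by respondents at all.
```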

What About the Whole Global Pandemic Thing?

Many pollsters use assumptions from past elections to help predict future turnout. One reason polls missed in 2016 was that many in the industry discounted the turnout potential of voters with lower educational attainment. Historically high turnout among this demographic led to Trump outperforming most polling estimates in that election. While many pollsters updated their assumptions about education in their 2020 estimates, this example raises a larger question: How might the COVID-19 pandemic shape the election?

In short, the pandemic likely had massive effects on the way people voted, many of which we are only starting to understand. We do know that more than 100 million votes were cast in the early voting period, more than twice as many as in 2016. Changes in the mode of voting, fear of catching the virus, increased unemployment, and greater family responsibilities are just some of the dynamics that may have influenced the 2020 election. The impacts of the pandemic make it much harder to assess and predict how people will behave. Pollsters had no data on how Americans participate in politics during this kind of crisis. In retrospect, it is surprising the polling misses were not much larger in this environment.

One theory to help explain the 2020 polling miss centers on remote work dynamics. The Bureau of Labor Statistics recently published an article highlighting how the ability to work from home varies systematically across sector, skills, type of labor, and position. When pollsters randomly dialed phone numbers in the run-up to the 2020 election, they may have systematically oversampled people who had the luxury of working from home. While weighting adjustments for education and income could mitigate this issue, 2020 samples may still have been biased toward those in management and professional positions and against those whose work involved physical labor or in-person service. If the latter were less likely to vote for Biden, this could help explain why Biden underperformed most polls.
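Here is a rough sketch of why weighting only partially helps, again with invented numbers. Post-stratification weights can restore the sample’s education mix, but if the non-college respondents a pollster actually reaches (say, those in remote-work-friendly jobs) lean differently than the non-college voters it misses, the weighted estimate stays biased:

```python
# Invented composition by education: the sample over-represents college grads.
population_share = {"college": 0.40, "non_college": 0.60}
sample_share     = {"college": 0.55, "non_college": 0.45}

# Invented Biden support by cell. Suppose the non-college respondents we
# actually reach (remote-work-friendly jobs) lean more Democratic (0.48)
# than non-college voters overall (0.42).
support_reached = {"college": 0.60, "non_college": 0.48}
support_true    = {"college": 0.60, "non_college": 0.42}

# Post-stratification: weight = population share / sample share, per cell.
weights = {c: population_share[c] / sample_share[c] for c in population_share}

def weighted_estimate(support):
    num = sum(sample_share[c] * weights[c] * support[c] for c in support)
    den = sum(sample_share[c] * weights[c] for c in support)
    return num / den

print(f"weighted poll estimate:  {weighted_estimate(support_reached):.1%}")  # ~52.8%
print(f"true population support: {weighted_estimate(support_true):.1%}")     # ~49.2%
# The education weights are exactly right, yet a 3.6-point bias remains,
# because the selection error lives *within* the weighting cells.
```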

Moving Forward

Should we trust the polls in the future? Sure. But we need to adjust our expectations for what polls can actually provide. Polls give a snapshot of our best guess at how an electorate will behave in the future. They are not destiny. Given the complexity and random noise present in human behavior, we should not be surprised when polls are off by a few points. If polls show a close race, we should not expect the leading candidate to win every time. We should also recognize that polling errors are not consistent from election to election. An underestimate of vote share among some demographics in 2020 need not imply the same outcome in 2022 or 2024. Pollsters may do more harm than good if they “fight the last war” by overfitting to past outcomes. Survey researchers should use the lessons from polling misses to think critically about sampling populations, selection issues, and weighting. Good estimates come from ensuring our samples represent the population, which is especially difficult when that population is a moving target, as it is in election polling.