Who will be president? Emory election forecasters call it, sort of

Nov. 1, 2012


Beverly Clark

Emory political scientist and statistician Drew Linzer has developed an election forecasting model that aggregates state-level polls into a forecast of the national outcome. Applied to the 2008 presidential race, it predicted the state vote outcomes to within 1.4 percentage points on average. He has been tracking the progress of his model via his website www.votamatic.org, which features daily snapshots of state polls and his analysis of them.

He is currently a 2012-13 visiting assistant professor at the Stanford University Center on Democracy, Development and the Rule of Law, and has a forthcoming publication in the Journal of the American Statistical Association that details his forecasting model. Below, Linzer explains his model, polling "noise" and the challenges of predicting the outcome of the election Nov. 6.

The presidential race appears to be in a dead heat, with the two candidates moving up and down in the polls depending on the day and who you ask. But based on your election forecast, there's been one clear, consistent winner for months. Do you feel confident at this point in calling it for President Obama?

My model has been forecasting an Obama victory, based on a combination of favorable long-term factors (economic growth, popularity, incumbency status) and his persistent lead in the swing state polls. However, the forecast is subject to a large amount of uncertainty. For example, it's possible that the polling data has been systematically in error, or that undecided voters will break disproportionately towards Romney, or even that some unforeseen event will arise. Since there's still a range of perfectly plausible (if unlikely) scenarios under which Romney could win, the answer is no, I wouldn't presume to call the election for President Obama at this point.

How did you develop this forecast model? What's the margin for error, or how accurate should it be based on the statistical modeling you've done with the 2008 election?

The model builds upon a diverse set of political science theories about public opinion, campaign dynamics and presidential election forecasting. It provides a systematic way for us to update our beliefs about the likely election outcome, using the large number of publicly available pre-election presidential polls. The model's forecasts gradually gain in accuracy over the course of the campaign, as more information becomes available. In 2008, the model was accurate to within 1.4 percent, on average, in estimating the outcome of the 50 state elections on Election Day. In the most highly competitive states, where polling is most frequent, the average error was just 0.4 percent.

How does your forecast work? How is it different from the others out there? What factors into your formula?

The forecast "works backwards" from a long-term projection about each state's likely election outcome, to the current polls, essentially forming a compromise between the two.

The model is dynamic, so that over time, the forecast relies less on historical factors and more on the polls. Since most states will not have polls conducted on most days – especially early in the campaign – the model combines information across states to track the evolution of public opinion in each state. This also enables me to detect common trends in state-level voter preferences and measure the potential impact of campaign events such as the conventions and debates.
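The "compromise" between a long-term projection and current polls can be illustrated with a simple precision-weighted average. This is only a sketch and not Linzer's actual model: the function name, the normal-distribution assumption and all the numbers below are hypothetical, chosen to show how the weight shifts toward the polls as they become more informative.

```python
def blend_forecast(prior_mean, prior_sd, poll_mean, poll_sd):
    """Combine a long-term projection with a polling estimate.

    Illustrative normal-normal update (an assumption, not the published
    model): each source is weighted by its precision, 1 / sd**2.
    Returns the blended mean and its standard deviation.
    """
    prior_precision = 1.0 / prior_sd ** 2
    poll_precision = 1.0 / poll_sd ** 2
    total_precision = prior_precision + poll_precision
    mean = (prior_precision * prior_mean + poll_precision * poll_mean) / total_precision
    sd = total_precision ** -0.5
    return mean, sd


# Hypothetical state: the long-term projection says 52 percent, polls say 50.
# Early in the campaign the polls are sparse (large poll_sd); later they are
# plentiful (small poll_sd), so the blend moves toward the polling figure.
early_mean, _ = blend_forecast(52.0, 3.0, 50.0, 6.0)
late_mean, _ = blend_forecast(52.0, 3.0, 50.0, 1.0)
print(f"early: {early_mean:.1f}, late: {late_mean:.1f}")
```

With these made-up inputs, the early blend stays close to the projection and the late blend sits close to the polls, which mirrors the shift from historical factors to polling data described above.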

More traditional forecasting methods will issue a static prediction, typically 2-3 months before the election, that doesn't adjust once new data become available.

How reliable are the state polls you're using? What do you mean by "noise" and how do you control for it?

The accuracy of the polls is the million-dollar question. All public opinion surveys – even if they're perfectly executed – are subject to a certain amount of error due to random sampling.

A survey only contacts a small group of respondents, and those people won't always be representative of the broader population. As a result, the polls are usually going to vary by up to 3-4 percent, in either direction, around the true population proportions. We sometimes call this random error "noise." One way to reduce the noise is to interview larger samples, but this often isn't feasible.
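The 3-4 percent figure is in line with the standard 95 percent margin of error for a simple random sample. The sketch below uses the textbook formula; the sample sizes are illustrative assumptions, not figures from any particular poll.

```python
import math


def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95 percent margin of error for a sample proportion.

    Assumes simple random sampling; share=0.5 is the worst case, and
    z=1.96 is the normal quantile for a 95 percent interval.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)


# Hypothetical sample sizes: quadrupling the sample only halves the error.
for n in (600, 1000, 2400):
    print(n, f"{100 * margin_of_error(n):.1f} points")
```

Samples of 600 and 1,000 respondents give margins of roughly 4 and 3 points, and it takes 2,400 respondents to get down to about 2 points, which is why simply interviewing more people is often not practical.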

The other option (which is what I do) is to just average the polls together – since the error is random, it should all cancel out to zero. The problem is if the polls are systematically more or less favorable to one of the candidates. We know that certain firms have methodological "house effects" which make their results lean Democratic or Republican. If there are a disproportionate number of Democratic- or Republican-leaning firms releasing polls, then that will affect the overall averages.

My model assumes that house effects will cancel out – just like the sampling error – but there's no way to know this until Election Day. Fortunately, in previous presidential elections, polling averages have proven to be accurate within a percent or two of the actual election outcomes.
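A small simulation can show why averaging removes random noise but not an imbalance in house effects. Everything here is hypothetical: the true vote share, the size of the house effects, the number of polls and the sample size are made-up values for illustration, and the code is not part of Linzer's model.

```python
import random


def simulate_poll(true_share, sample_size, house_effect, rng):
    """Simulate one poll; house_effect is a hypothetical constant lean."""
    p = true_share + house_effect
    supporters = sum(1 for _ in range(sample_size) if rng.random() < p)
    return supporters / sample_size


def average_polls(true_share, house_effects, sample_size=800, seed=0):
    """Average one simulated poll per entry in house_effects."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    polls = [simulate_poll(true_share, sample_size, h, rng) for h in house_effects]
    return sum(polls) / len(polls)


TRUE_SHARE = 0.52  # assumed true support for one candidate

# 200 polls with leans that balance out: 100 lean +2 points, 100 lean -2.
balanced = average_polls(TRUE_SHARE, [0.02, -0.02] * 100)

# 200 polls where firms leaning one way outnumber the others three to one.
skewed = average_polls(TRUE_SHARE, [0.02] * 150 + [-0.02] * 50)

print(f"balanced average: {balanced:.3f}")
print(f"skewed average:   {skewed:.3f}")
```

In the balanced case the average should land close to the assumed true share of 0.52. In the skewed case it should sit about a point higher, because the surplus of firms leaning one way does not average away no matter how many polls are added.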

Most of us live in states that are solidly red or solidly blue. Swing states like Florida, Virginia and Ohio are being watched very closely and look too close to call. Are there any swing states at this point that you can say, based on the data, will for sure go one way or the other?

If the polls are right, there appear to be nine states right now that could realistically go either way. Florida, Colorado and Virginia are neck-and-neck. North Carolina leans towards Romney; and Ohio, Iowa, New Hampshire, Nevada and Wisconsin all lean towards Obama. It would be surprising if any states other than these went against expectations.

Your colleague Alan Abramowitz' renowned "Time for Change" model has correctly predicted the outcome of the popular vote for the last six presidential elections. How did you incorporate "Time for Change" into your own model?

The Time-for-Change model is the starting point for all of my forecasts. It produces an estimate of Obama's national-level vote share, which I translate into a baseline guess about all 50 state-level election outcomes. I then use daily polling data to update the state forecasts in real time. Since the Time-for-Change model works at the national level, and my model works at the state level, they're not directly comparable. But my model does reduce some of the initial uncertainty in the Abramowitz model forecasts. And, by Election Day, my "forecasts" will essentially be equal to an average of the polls, which is the most accurate information anyone will have.