Those of you who followed these posts during the nomination campaign will remember my constant refrain that not all surveys are alike and my determination to make you look at their sampling techniques in order to evaluate their accuracy. We should have the same concerns when looking at polling numbers in the general election. For much of the summer, most polling organizations simply polled registered voters. This is appropriate for understanding the potential outcome of a presidential race. But in trying to forecast the likely outcome, many pollsters believe it makes little sense to rely on a random sample of all eligible voters, or even registered voters, since we know that only 6 or so out of every 10 such voters (or even fewer) will actually cast a presidential ballot (and yes, I’m not one of them). As a result, at this stage of the race, most of the national polling organizations switch gears and begin surveying likely voters. They want to randomly sample from that subset of voters who are actually going to vote, as opposed to simply being eligible to vote. As Garrett Saito suggested in an email to me, however, that raises an important question: how do pollsters determine who the likely voters are? Garrett, and others, have suggested that these polling organizations might be underestimating Obama’s support. Why might this be?
Some of you might recall that in June of this year Gallup ran two polls – one of likely voters, and one of registered voters. Obama led McCain among registered voters, 47% to 44%, but McCain led among likely voters, 49% to 44%. And, in fact, historically (at least as far back as we have polling data), Republicans have been more likely to be included in likely voter models despite usually lagging behind Democrats in terms of registered voters. In actual presidential elections, the Republican candidate has often won despite the fact that there are usually more registered Democrats, which appears to validate this sampling approach. So Gallup and other organizations typically oversample Republicans in their likely voter models, relative to their numbers among registered voters. This is usually appropriate, but the key question is whether that dynamic still holds in this election: is the traditional means by which pollsters identify likely voters still valid, or are there reasons to believe the prevailing methodology needs to be reassessed? Some observers suggest that because Obama is attracting such strong support among younger people and others who have never voted before, the likely voter samples might be overestimating McCain supporters and underestimating Obama voters. In a close election, the argument goes, these newly registered voters might swing the election to Obama, but the polls will miss this. As evidence, they point to party registration figures in a number of states which indicate that enthusiasm for the Democratic ticket is much higher than in past elections.
Is there any evidence to support this argument?
To answer that question, we need to understand how polling organizations determine likely voters. Unfortunately, each polling organization has its own method for determining likely voters, and not all of them reveal how they do so. (If you are interested in looking at this topic in more detail, here’s a link to a discussion from 2004 by polling expert Mark Blumenthal, which is the one I rely on most frequently. Keep in mind, however, that some organizations may have changed their methodology since then.)
In examining how polling organizations determine likely voters and whether they are underestimating Obama’s support, I look at the following issues.
1. Does the polling organization use previous voting as one of its indicators of a likely voter? This could potentially lead to underestimating Obama’s support if most of these newly registered voters are likely to vote Democratic. (In 2004, survey organizations using previous voting as part of their likely voter screen included ABC/Washington Post, AP-IPSOS, ARG, CBS/New York Times, Democracy Corps, FOX/Opinion Dynamics, Gallup, Harris, LA Times, Newsweek, Pew, Quinnipiac, Rasmussen and Time.)
2. Does the polling organization weight its sample of likely voters by party – that is, does it assume a priori that a certain percentage of voters will be Republican, Democratic, Independent, etc., and adjust its survey results accordingly? If so, and this weighting underestimates the percentage of Democrats who are likely to vote, then again the poll could underestimate Obama’s support, assuming most Democrats vote for Obama.
3. Does the polling organization attempt to calibrate its likely voter sample according to expected turnout? For example, Gallup historically chooses an expected cutoff figure for turnout – say 60% – and uses that to adjust its likely voter model. Again, if turnout is much higher due to an influx of Obama supporters, that could skew the likely voter model. In 2004, seven survey organizations – ABC/Washington Post, Gallup, LA Times, Newsweek, Pew, Quinnipiac and Time – used this method.
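Taken together, these three screens can be illustrated with a toy simulation. Everything below is invented for illustration – the party shares, past-vote rates, enthusiasm scores, 60% cutoff, and assumed party mix are my assumptions, not any pollster’s actual model – but it shows mechanically how a previous-vote screen plus a turnout cutoff can shift a likely voter sample toward Republicans even when Democrats lead among registered voters.

```python
import random

random.seed(0)

# Hypothetical registered-voter pool. Party shares, past-vote rates, and
# enthusiasm scores are all invented for illustration.
def make_respondent():
    party = random.choices(["D", "R", "I"], weights=[38, 33, 29])[0]
    past_vote_rate = {"D": 0.55, "R": 0.70, "I": 0.50}[party]
    return {
        "party": party,
        "voted_before": random.random() < past_vote_rate,
        "enthusiasm": random.randint(0, 10),  # self-reported interest, 0-10
    }

pool = [make_respondent() for _ in range(10_000)]

# Screen 1 (previous voting): past voters get a bonus in the likelihood score.
def likelihood_score(r):
    return r["enthusiasm"] + (5 if r["voted_before"] else 0)

# Screen 3 (turnout cutoff): keep the top 60% of scores, mimicking the idea
# of calibrating the sample to an expected turnout figure.
ranked = sorted(pool, key=likelihood_score, reverse=True)
likely = ranked[: int(len(ranked) * 0.60)]

def party_shares(sample):
    return {p: sum(r["party"] == p for r in sample) / len(sample) for p in "DRI"}

# Screen 2 (party weighting): rescale the likely voter sample to an assumed
# party mix (again invented) before tabulating candidate support.
assumed_mix = {"D": 0.35, "R": 0.35, "I": 0.30}
weights = {p: assumed_mix[p] / party_shares(likely)[p] for p in "DRI"}

print("registered:", party_shares(pool))
print("likely:    ", party_shares(likely))
print("weights:   ", weights)
```

Because Republicans in this toy pool are more likely to have voted before, they make up a larger share of the likely voter sample than of the registered pool – exactly the dynamic described above. If newly registered Democrats break the historical pattern and actually turn out, every one of these screens would undercount them.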
So, what does this mean for polling results in this election cycle? Is there reason to believe the polls are systematically underestimating Obama’s support? There is no single answer to this question, in part because each polling outfit uses slightly different methods for determining a likely voter. Certainly there is the potential for bias, and the polling organizations are well aware of this.
One way to counter potential bias is not to rely on any single methodology for determining likely voters. In previous posts I have cautioned against relying on the RCP average of the polls because it is a rolling average that includes polls from different time periods in its estimate, and so can be slow to react to trends. On the flip side, however, by averaging polls that use different methodologies, it is less likely to be biased toward any one method for determining the likely voter, and hence less likely to be biased against Obama. So the RCP average may be less biased in terms of likely voter models.
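A minimal sketch of that averaging logic – with invented poll margins and made-up pollster names, not real numbers or RCP’s actual procedure – shows both properties: each window blends pollsters with different likely voter methods, but old polls linger in the average, so it lags behind sudden shifts.

```python
from collections import deque

# Invented daily margins (Obama minus McCain, in points) from three
# hypothetical pollsters, each assumed to use a different likely voter model.
polls = [
    ("Pollster A", 2.0), ("Pollster B", -1.0), ("Pollster C", 4.0),
    ("Pollster A", 3.0), ("Pollster B", 0.0), ("Pollster C", 5.0),
    ("Pollster A", 1.0), ("Pollster B", -2.0), ("Pollster C", 3.0),
]

def rolling_average(series, window=5):
    """Average the most recent `window` polls, RCP-style."""
    buf = deque(maxlen=window)  # old polls drop out as new ones arrive
    averages = []
    for _pollster, margin in series:
        buf.append(margin)
        averages.append(sum(buf) / len(buf))
    return averages

avgs = rolling_average(polls)
print(avgs)  # each value blends the last five polls, not just the newest
```

Because every window mixes all three pollsters, no single likely voter methodology dominates the estimate; the cost, as noted above, is that a real shift in the race shows up only gradually as new polls push old ones out of the window.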
More generally, keep the following points in mind when considering the issue of bias in the polls. Most importantly, we know, based on past elections, that 80% of voters have likely already determined for whom they will vote in the 2008 election. From this perspective, the ups and downs in tracking polls can be understood as partly statistical noise reflecting errors in estimation, and partly a measure of which candidate, in voters’ minds, is getting the better of the argument or media coverage at the time. These variations aren’t necessarily capturing changes in how people are planning to vote. The exception is that subset of voters – particularly independents – who tend to make up their minds very late in the election cycle. Historically, likely voter models become more accurate as the campaign winds down, for the simple reason that as more people become interested in the election, it is easier to choose a likely voter sample.

Second, every four years we hear about how younger voters will make a difference in the election, and every four years they don’t. In 2004, voting among 18-29 year olds was up, but the increase was not as large as among older voters, and Bush benefited from the overall increase in turnout more than Kerry did, expanding his support from 2000. So, until voting patterns change, it is not unreasonable for pollsters to rely on models that have worked well in previous elections.

Third, while it is true that registrations are up disproportionately among Democrats in many states, we have to see whether the Sarah Palin factor begins to counteract this – it may be that likely voter models are underestimating her impact as well. In particular, if Palin’s support is drawn disproportionately from the bitter, religious-leaning, gun-toting blue-collar workers who supported Clinton, then likely voter models based on party could be skewed against the Republicans.
The bottom line is that this is a precedent-breaking election. This means that at least some likely voter models will be off this time around. However, until we have conclusive evidence that voting patterns are systematically different, it is difficult to say which models are wrong or why. At best, then, we need to read these poll results with caution and avoid relying on any single result. Pay attention to the fine print describing how each poll determines a likely voter. Taken collectively, however, these polls remain the best source of data we have regarding voter opinion during the campaign. And – at this point – I don’t see any evidence that they are systematically underestimating Obama’s support. But we can’t be sure of this. What we can do, however, is compare polls and understand why they differ – it almost always has to do with how they construct their samples.
Having spent all this time discussing polling data, I am now going to explain why political scientists don’t need any of it to predict who will win the 2008 presidential election. In fact, their predictions are already in!
A final thought: I have received some excellent comments from many of you in my email inbox. In particular, many of you have taken issue with some of my observations, or provided your own election analysis. Everyone would benefit from these comments, and so I urge you to post them on my blog (you need not attach your name). We all can benefit from a broader discussion of the issues that I am raising. So join in!