Monthly Archives: September 2008

Color changes: Great for leaf peeping, not always for election maps

One benefit that Al Gore bestowed on all of us when he invented the internet is the ease with which we can access a plethora of election-related information at the click of a mouse. This includes color-coded maps of the United States that try to estimate the Electoral College results based on current polling data. There are several of these maps floating around the internet (I’ve referenced one on my blog sidebar.)  Although each of them employs a slightly different formula for averaging polling data, the underlying premise is the same:  some states are colored as solidly in either McCain’s or Obama’s column, some are shaded to indicate they are leaning to one candidate or the other, and some are colored as “toss up” states where it is too close to call.  Avid consumers of election news now eagerly await the “flipping” of a state from one color to the next, and when their candidate gains a state, the blogosphere is filled with tidings of great joy.

Not surprisingly, I want to add a word of caution about overreacting to changes in the color of particular states. The colors are based on polling averages, and as you know by now, each poll has a margin of error which simply indicates how precise the poll is.  Most reputable polls have a margin of error of 2-3% at the 95% confidence level.  All that means is that the pollster is telling you that she estimates that the real population figure (say, the percent of all voters supporting Obama) is likely to be within 2-3% of the reported figure in the poll 95 out of 100 times.
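For readers who want to see where a figure like 2-3% comes from, here is a rough sketch in Python. It assumes a simple random sample and the textbook formula for the margin of error of a proportion. Real pollsters also adjust for weighting and design effects, so treat this as an illustration rather than any pollster's actual method. The function name and the sample sizes are assumptions made for the example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    z = 1.96 corresponds to the 95% confidence level; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes (assumptions, not taken from any particular poll).
for n in (600, 1000, 2400):
    print(f"n = {n}: +/- {100 * margin_of_error(n):.1f} points")
```

Under these assumptions, a sample of about 1,000 respondents gives a margin of roughly 3 points, and it takes well over 2,000 respondents to get near 2.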

Now suppose you see a state on a map “flip” colors from being solidly McCain to leaning McCain. Has something substantial happened? Not necessarily. If the state was “solid” McCain by the barest of margins (most of these maps require you to be ahead in the polls by more than the margin of error to be solidly in one camp or the other), and one poll comes in that has the race tied, that might be enough – depending on how the color coding is calculated – to move the state into the leaning category. But because that poll has a margin of error of 3%, or even more, we can’t actually be sure anything fundamental has changed even though the color has changed.
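A toy example shows how a single poll can flip a color. The sketch below is not how any particular map is computed: the thresholds, the simple unweighted average, and the poll numbers are all invented for illustration.

```python
from statistics import mean

def classify(lead, moe=3.0, toss_up=1.0):
    """Label a state by the size of the polling-average lead, in points.

    The thresholds are hypothetical: 'solid' means the lead exceeds the
    margin of error, 'toss-up' means it is under one point.
    """
    lead = abs(lead)
    if lead > moe:
        return "solid"
    if lead >= toss_up:
        return "leaning"
    return "toss-up"

# Invented McCain leads (in points) from three polls, then a fourth poll showing a tie.
before = [4.0, 4.0, 3.5]
after = before + [0.0]

print(classify(mean(before)))  # average lead of about 3.8 points
print(classify(mean(after)))   # average lead of about 2.9 points
```

The average lead moves by less than one point, which is well inside the margin of error of any single poll, yet the label changes from solid to leaning.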

Bottom Line: Changes in color are wonderful for leaf peeping, but beware of reading too much into changes on an electoral map.  Read the fine print underlying the change before concluding that something fundamental is happening. In fact, as I’ll discuss in a later post, what is remarkable about this race so far is just how little change there has been in the support for the two candidates since July.

We get comments, lots and lots of comments (cue Letterman music …)

In this post I want to try to answer several of the excellent questions posted in response to some of my earlier blogs.

First, Conor Shaw points out that although my rule of thumb that McCain does better in surveys of likely voters, while Obama does better in polls of registered voters, may hold at the national level, the reverse seems to be the case in the key battleground state of Virginia. Recent polls there show Obama doing better in surveys of likely voters than of registered voters. It is difficult to explain why Virginia is polling differently because it is one of the more than 20 states that do not keep party registration figures. But my guess is that the answer lies in the proportion of Democrats, especially African-Americans, who are being sampled in the likely voter surveys compared to surveys of registered voters. For example, SurveyUSA included 33% Democrats in their latest poll of likely voters compared to 38% Republicans and 22% Independents, and had Obama beating McCain by 6%, 51-45, in Virginia. That may be a higher proportion of Democrats than is included in the polls of registered voters. Similarly, a PPP poll that had Obama up by 4% included 21% African-Americans, but a Newport poll in Virginia that had McCain up by 9% sampled only 10% African-Americans. According to exit polls (which may not be completely accurate), African-Americans composed about 22% of the Virginia voters in the 2004 election. Again, this just reinforces my earlier point regarding the need to check the pollsters’ internal weighting for clues to explain different survey outcomes. With registration running high this election cycle, it is often a guessing game to determine just how to weight the different demographic groups. Keep in mind that Bush won Virginia by 262,217 votes in 2004. But as of September 8th of this year, more than 285,000 people have been added to the registration rolls since that election. How many are Democrats? We can’t tell, but the Obama campaign believes that the demographics of the newly registered voters, particularly the high number of registrants under age 40, favor him. However, Brian Schaffner suggests that based on where the registration is taking place, it’s not clear how many of the newly registered voters are in fact likely to vote for Obama, since the majority of the new voters registered in areas that went for Bush in 2004. Keep in mind as well that in a state where more than 3 million people cast votes in 2004, Obama can’t hope to win based on newly registered voters alone; he will need to peel off some of the Bush coalition as well.

Both Bhima and Polemarchus are interested in the track record of the forecast models in previous elections. I will try to present some summary statistics for those models that have been in use for several elections. The questions highlight an interesting problem with some of these models, however: some forecasters tweak their models after the fact to make them appear to “fit” with past elections. In effect, then, rather than test assumptions about what makes voters behave as they do, these forecasters instead adjust the numbers to make it appear retrospectively that the models predicted past elections better than they did at the time. These types of adjustments don’t necessarily reflect an understanding of the forces that account for election outcomes, so we need to be careful when judging a forecast model based on how well it predicts all past elections.

Polemarchus also raises a point cited by several others: if we can predict elections before the general election campaigns begin, why even bother campaigning?  This is not a facetious question.  Let’s be clear here: the models don’t assume that campaigns don’t matter.  In fact, campaigns do matter in at least two important respects.  First, most forecast models assume that both major party campaigns will be effective at framing the voting context in ways that their likely voter coalition will find most appealing. As long as the modelers can measure “reality”, then, they should be able to account for these campaign framing effects.  Second, campaigns help mobilize voters.  Indeed, one of the big questions that may throw the forecast models off this year is turnout.  A huge turnout, particularly among one party’s voting coalition, can undermine the assumptions built into the forecast model.  So we need to keep an eye on turnout levels.  Here’s the turnout among eligible and registered voters for presidential elections dating back to the 1940’s, pasted from Michael McDonald’s valuable election site:

Note that after an almost uniform decline since 1964, turnout of eligible voters (and of the voting age public) has gone up during the last two elections, topping 60% in 2004.  My expectation is that this election will continue that trend, with turnout above the 2004 level.  How will that affect the forecast models?  I’ll try to address that in a future post.

Keep those questions coming!


Assessing the forecast models: Some caveats

Yesterday I provided the results from several forecast models, all of which – with one exception – predict that Obama will win the two-party popular vote over McCain. Historically, of course, the popular vote winner is usually the Electoral College winner. (In a later post I’ll examine the likelihood that this won’t be the case this year.) On average, the models have McCain winning only about 48% of the popular vote. These forecasts, I suggested, have a strong track record. But they are not infallible. Although they all (with one exception) correctly picked Bush as the popular vote winner in 2004, most of them were wrong in 2000. “It’s not even going to be close,” Michael Lewis-Beck said then, predicting that Al Gore would win 56.2% of the 2000 two-party popular vote. James Campbell had Gore winning with 52.8%. One of the more conservative models, by Alan Abramowitz, had Gore winning 53% of the vote. Thomas Holbrook gave Gore 59.6% of the vote. Christopher Wlezien had Gore receiving 56.1% of the 2-party vote. (These names should be familiar to you since I’ve included their 2008 forecasts in my last post.) As you know, Gore did win the popular vote, but came nowhere close to the percentages most models predicted; he actually received 48.4% of the popular vote with Bush coming in a close second with 47.9%. (In case you are wondering, I also predicted Gore to win the 2000 race, but with 49.5% of the popular vote. Of course, I never envisioned that he would lose the Electoral College.) In retrospect, the forecasters who missed the 2000 election had a number of explanations for why their models were wrong, but a big chunk of the blame, they argued, lay with Gore, who simply failed to capitalize on the positive fundamentals – a budget surplus and relatively strong economy (although an economic slowdown was just beginning) – that should have kept the incumbent party in power. Instead, he ran a populist-based campaign that emphasized change while trying to separate himself from Clinton’s legacy. Now is not the time to debate the Gore campaign strategy from eight years back. The more important point is that these forecast models sometimes miss the mark. This is a reminder that these models are predicated on some basic assumptions that are worth reviewing:

First, they presume a two-candidate race.  In 1992, Ross Perot jumped into the race as a third-party candidate, winning almost 19% of the popular vote and perhaps costing the incumbent George H. W. Bush reelection. Bill Clinton was elected instead, with 43% of the vote, while Bush won only 37.4%.   However, I don’t see a strong third-party candidate running in 2008, although in a close election it is possible that a Bob Barr (the Libertarian candidate) or perennial candidate Ralph Nader might influence the outcome of a particular state, as Nader did in Florida in 2000.

Second, the models don’t predict the Electoral College vote.  In later posts I will begin examining the likely outcomes in key battleground states that will determine who wins the Electoral College. Forecast models are predicated on the assumption that whoever wins the popular vote will win the Electoral College vote as well. This is usually a safe assumption – but not always!

Third, and perhaps most importantly, the models assume that each candidate runs an equally effective campaign. That is, the two candidates understand how best to frame reality in ways that redound to their comparative advantage, and do so effectively. We don’t expect McCain to run on his youth, or Obama to tout his war record – that would be at odds with what the public knows about these candidates. Instead, they try to emphasize their strengths and highlight their opponent’s weaknesses.

Finally, the models expect that voters will behave in this election much as they have in past elections: they combine a retrospective evaluation of the party in power with a prospective assessment of what is likely to happen if a new party is put in the Oval Office, and they vote according to which scenario is best for the nation, broadly speaking.

Now, is there anything unique about this election that might cast doubt on some of these assumptions? At first glance, I can think of several aspects of this election that might throw these models off. They are, in no particular order: race, gender, the lack of an incumbent in the race, two senators heading the major parties’ tickets, the impact of the internet/blogosphere on election coverage, and the proliferation of polls. Any one of these factors could, potentially, disrupt the “normal” vote in a way that might cast doubt on the assumptions built into these forecast models. But will they? In many respects this has been an unprecedented election. In other ways, however, it actually has reinforced political scientists’ basic understanding of the forces that drive most presidential elections. In my next several posts I will examine some of these issues to see if we can estimate what impact, if any, they will have on the general election. Because of the debate scheduled for Friday, I want to begin by discussing the role past debates play in influencing election outcomes.

But what do you think?  Which of these factors, if any, might affect how voters cast their ballot in 2008?  Are there other factors that might come into play in this election?

It’s Obama! (Or is it?): Forecasting the 2008 election

Who will win the 2008 presidential election? For those political scientists who specialize in modeling elections with the goal of predicting outcomes, the verdict is already in. Barring some unforeseen or atypical event(s), they have already predicted the winner. Who is it? Before providing the answer, let me first explain the process by which political scientists construct their forecast models.

For years I have told my students that if there is any aspect of political science that can truly be described as a science, it is forecasting presidential elections. Presidency scholars can, with surprising accuracy, predict the winner of the two-party popular vote, within 2-3%, in a presidential election several months before the election takes place. Of course, they aren’t always correct. But they predict the winner far more often than not. In 2004, for example, 7 of the 8 forecast models that I consulted prior to Labor Day predicted George Bush’s reelection that year with an average popular vote of about 53%. He actually won with about 51%.

How do they do it? How can political scientists predict elections before any of the events that the media emphasizes – campaign advertising, debates, and the other myriad daily incidents on the campaign trail – have occurred? They begin with a basic premise: that the voting public, in the aggregate, is rational. That is, voters can differentiate the two candidates, broadly speaking, along an ideological continuum, and generally vote for the candidate who falls closest to their own political views. Moreover, voters are aware of how things are going in the world, based on tangible measures – gas prices, food costs – and by paying at least limited attention to the news and through conversations with family, friends, and coworkers. As political scientists, then, if we can create measures for “reality”, and for voters’ preexisting political preferences, we can construct a reliable forecast model. That’s the basic approach adopted by all these forecasters, although their models differ in the particulars.

There are a variety of forecast models, then, but they all are based on some variation of the following approach. First, establish some measure for the state of the economy, such as the unemployment rate, or the quarterly change in the Gross National Product, since most voters pay attention to the economic health of the nation and often vote the nation’s “pocketbook”. Second, measure the partisan distribution of voters in the country – how many people call themselves Democrats? Republicans? This is crucial to forecasting, because party affiliation is the single biggest predictor of the vote in a presidential election. Most voters tend to view politics through a partisan prism that colors their evaluation of candidates and issues, and they typically support the candidate who shares their party preference. Democrats may not know the details of Obama’s health plan, but they know he’s a Democrat, and that tells them they are likely to favor his plan over McCain’s. So if forecasters know the partisan breakdown of the likely voters, it helps establish a baseline for predicting the presidential vote.

All forecast models are predicated on an expected level of turnout among voters. Some also include additional variables that measure how long one party has occupied the presidency, or the popular approval of the current president, in order to assess the pressure for “change” in Washington.

In sum, the typical forecast model predicts the distribution of the two-party vote as a function of economic conditions, the partisan breakdown of voters, approval ratings of the party or president in the White House, projected turnout, and sometimes, length of time the incumbent party has held office.

Note what these models seem to ignore: campaign advertising, debates, the daily tactical skirmishing between candidates that forms the gist of campaign reporting – indeed, almost the entire general election campaign! And yet they have accurately, within a specified margin of error, predicted the victor in almost every presidential election dating back to 1980. (I will deal with the “almost” in a separate post.) Forecasters ignore the daily minutiae of campaigns because, for the most part, they believe that fundamentals drive the vote, not campaign tactics. Both candidates will try to “frame” the campaign in a way that is most advantageous to them. For Obama, that means characterizing this election as a choice between change and four more years of the “failed” Bush-Cheney policies. For McCain, it is also about change, but change driven by a maverick who has the experience to do what is necessary to guide the nation in perilous times. For forecasters, these efforts largely negate one another; few voters are persuaded by the campaign advertising and debates to abandon their long-standing predisposition to vote for the candidate of their preferred party. Indeed, polls show that already 80% of the electorate has made up their mind regarding which candidate they will support this year.

In short, the models work because voters are rational. They don’t make a blind leap of faith in the voting booth. Instead, they base their vote on a retrospective evaluation of how things are going under the current party in power and a prospective estimate of how things are likely to go if they change parties or not. Each candidate, meanwhile, will do his best to frame the campaign using the raw material the political environment provides, but they can’t simply create a reality that is at odds with what the voters see. As a result, campaigns may be more or less effective in mobilizing voters to come out on Election Day, but they tend to have very little persuasive effect regarding how voters cast their ballot.

But what about the undecided voters – those 15-20% of the electorate, most of whom are independents, that don’t decide for whom to vote until the last two weeks? Forecasters assume that their vote will break down roughly according to the distribution of the more partisan voters, so that these late voters typically do not change the outcome of the race.
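The arithmetic behind that assumption is simple enough to sketch. The example below uses invented poll numbers and a hypothetical helper function; it only illustrates what "breaking proportionally" means, not how any particular forecaster handles undecided voters.

```python
def allocate_undecided(dem, rep, undecided):
    """Split undecided voters in proportion to the decided two-party vote."""
    decided = dem + rep
    return (dem + undecided * dem / decided,
            rep + undecided * rep / decided)

# Invented poll: 42% Democrat, 40% Republican, 18% undecided.
dem, rep = allocate_undecided(42.0, 40.0, 18.0)

print(f"Two-party Democratic share before: {100 * 42.0 / (42.0 + 40.0):.1f}")
print(f"Two-party Democratic share after:  {100 * dem / (dem + rep):.1f}")
```

If the undecided voters split the same way as everyone else, the two-party share is the same before and after they decide, which is why the forecasters can leave them out.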

This is a very simplified and abbreviated explanation for how presidency scholars construct their forecast models. You will undoubtedly question many of these assumptions – as you should. Rather than try to anticipate all your objections, let me instead tell you what the forecasters are predicting for 2008. Then I will try to respond to the inevitable barrage of criticisms. Please post your comments on my website if possible, since I anticipate a rather vibrant discussion on this topic. And now, without further ado, here are the forecasts for this election. I list the date of the forecast (notice most are made prior to the party conventions), the name of the forecaster, and the percentage of the two-party vote that they predict the Republican candidate will receive. Drum roll, please:

09/08/2008 Jim Campbell 52.7

09/03/2008 Andreas Graefe and Scott Armstrong    48.0

08/28/2008 Michael Lewis-Beck and Charles Tien 49.9

09/05/2008 Tom Holbrook 44.3

06/30/2008 Brad Lockerbie    41.8

08/27/2008 Alan Abramowitz 45.7

08/02/2008 Charles Bundrick and Alfred Cuzan   48.0

08/28/2008 Robert S. Erikson and Christopher Wlezien 47.0

07/28/2008 Carl Klarner    47.0

06/07/2008 Doug Hibbs, Jr.   48.2

07/31/2008 Ray Fair    48.5

01/15/2008 Helmut Norpoth 49.9

09/08/2008 Average Forecast of Republican Share of the Popular Vote: 47.6
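The average is easy to check from the twelve figures listed above. The short Python sketch below uses only those numbers; the variable names are just for illustration.

```python
from statistics import mean

# Predicted Republican share of the two-party vote, in the order listed above.
forecasts = [52.7, 48.0, 49.9, 44.3, 41.8, 45.7, 48.0, 47.0, 47.0, 48.2, 48.5, 49.9]

average = mean(forecasts)
predict_obama = sum(1 for share in forecasts if share < 50.0)

print(f"Average Republican share: {average:.1f}")
print(f"Models with McCain under 50%: {predict_obama} of {len(forecasts)}")
```

This reproduces the 47.6 average and shows that 11 of the 12 forecasts put the Republican share below 50%.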

Note that these models differ in their particulars. I can discuss the details of the models if you’d like. Not all are equally reliable, based on past performance. Nonetheless, they all – with one exception (the Campbell forecast) – project that Barack Obama will win the 2008 popular vote. The average predicted share for McCain is just under 48%, so the election will be close. But given the margin of error of most of these forecast models, we can safely assume that Barack Obama will be our 44th president.

Or can we? Before assessing potential pitfalls in these forecasts, I’m eager to hear your reactions. Do you trust these projections? Why or why not?


How many Democrats? How many Republicans?

I want to follow up on my last post regarding how variations in poll results are often due to differences in how pollsters construct their samples. The previous post talked primarily about whether pollsters were sampling likely or registered voters. Obama, I suggested, polled better among registered voters.  Today I want to look at another decision pollsters must make: whether to weight their sample by party identification and, if so, what weights to use. We know that whether one considers oneself a Democrat or a Republican is the biggest single determinant of how someone will vote. Not surprisingly, people tend to vote for the candidate who shares their party identification. So a poll that includes 40% Democrats in its sample is likely to have more favorable results for Obama than one that includes 35% Democrats, all other things being equal. Ditto for McCain and variations in the number of Republicans sampled.

To see how this makes a difference, consider two respected national polls that came out yesterday. CBS/NY Times came out with their monthly national poll that has Obama up 49-44, with 6% undecided.

Rasmussen, meanwhile, has the race tied, 48-48% in its latest tracking poll.

There is a 5% difference in their results. Both polls illustrate the importance of how samples are defined. Most pollsters will weight their sample so that it matches the overall U.S. population of registered voters (or likely voters, as the case may be) along major demographic variables: gender, race, income.  For example, if a pollster’s initial sample of 1300 people included 57% women – which is higher than the number of women eligible to vote according to the U.S. Census – then the pollster would typically reduce the number of women actually counted in the poll to bring it closer in line with the population figures.  That is called weighting the final sample.
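To make the weighting step concrete, here is a minimal sketch of the calculation for the example above. The 52% target share for women is a placeholder, not the actual Census figure, and the function name is hypothetical. Real pollsters weight on several variables at once, which this sketch does not attempt.

```python
def poststratification_weights(sample_counts, target_shares):
    """Return a weight per group so that weighted group shares match the targets."""
    total = sum(sample_counts.values())
    return {
        group: target_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# 57% women in a sample of 1300, as in the example above.
sample = {"women": 741, "men": 559}
# Hypothetical population targets; the real figures would come from the Census.
targets = {"women": 0.52, "men": 0.48}

weights = poststratification_weights(sample, targets)
for group, count in sample.items():
    print(f"{group}: weight {weights[group]:.3f}, effective count {count * weights[group]:.0f}")
```

Each woman in the sample counts for a little less than one respondent and each man for a little more, so the weighted sample matches the target shares while the total stays at 1300.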

However, not all pollsters weight their sample by party. That is, if 38% of registered voters in the U.S. are Democrats, many pollsters will not try to weight their sample to get the same proportion of Democrats. Instead, they believe that by weighting by other demographics, the party percentages should come out pretty close to the actual totals in the population as a whole. And they worry that if they try to fix the party weight at a particular percentage, they may skew results, particularly if party support seems to be very volatile. In other words, when it comes to partisan identification, some pollsters let the sample speak for itself, rather than impose their own weight to ensure a particular percentage of party members. Historically, CBS has done this; in the last CBS/NYTimes poll taken a month ago, CBS did NOT weight by party. That poll had Obama up 45-42%, with 6% undecided (and 7% “other”).

However, the latest CBS poll DID weight by party. They averaged the number of Democrats who were polled in the three previous CBS/NYTimes polls, and made sure that today’s poll included that same average number of Democrats (and Republicans and independents). Just to give you an idea of what this means, let me provide the “raw” and weighted figures for both the initial sample and the smaller sample of registered voters.

I can’t paste the actual table on this blog (I can send the actual table by email if you are interested), but looking at all 1133 respondents – the “raw” initial sample – we see that 28.8% of them are Republicans. This total is almost identical to their weighted sample of voters; when they “weight” the raw sample, they reduce the number of Republicans by only 4, to 28.4% of their sample.

Looking only at the 1004 registered voters, we see that the initial raw sample includes 30.4% Republicans, but this is increased to 31.6% Republicans in the weighted sample of registered voters. Similarly, looking only at registered voters, the percent of Democrats in the raw sample versus final weighted sample doesn’t change much at all – 40% in the unweighted sample versus 40.6% in the weighted sample. The biggest difference is a reduction in independents among the registered voters, from 29.6% in the “raw” sample versus 27.8% in the weighted sample of registered voters.

I show you these numbers to give you an idea of what it means to weight by party.  But why does it matter? Compare the CBS weighting to what Rasmussen calculates when they weight by party.

Note that Rasmussen’s tracking poll has always weighted by party, using a “dynamic” system in which they adjust the weight assigned to each party based on the trends revealed by previous surveys. In their latest national tracking poll, they weighted their poll to include 38.7% Democrats, 33.6% Republicans, and 27.7% unaffiliated. (That’s a change from the weights they used in their tracking polls for the first thirteen days of September, when the targets were 39.7% Democrat, 32.1% Republican, and 28.2% unaffiliated.)

We see, then, that Rasmussen’s weights include a higher proportion of Republicans than does the CBS/NYTimes poll of registered voters. The gap between Democrats and Republicans in the CBS weighted poll of registered voters is 9%. For Rasmussen it is only 5.1%. Given the difference, it is perhaps not surprising that Rasmussen has the race a dead heat, while CBS gives Obama a 5% lead. Which is more accurate? I have no idea. And, in all candor, neither do the pollsters. But both CBS and Rasmussen recognize that the partisan distribution of voters is a changing target, and to their credit they are trying to make sure their samples reflect these changes.
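A rough calculation shows how much the party mix alone can move the topline. The party weights below are the ones reported above. The within-party support rates are invented for illustration (neither pollster published them here), and the function is a hypothetical helper.

```python
def lead_for_obama(party_shares, support):
    """Obama's lead (as a fraction) implied by a party mix and within-party support."""
    obama = sum(party_shares[g] * support[g][0] for g in party_shares)
    mccain = sum(party_shares[g] * support[g][1] for g in party_shares)
    return obama - mccain

# Party weights reported above.
cbs = {"dem": 0.406, "rep": 0.316, "ind": 0.278}
rasmussen = {"dem": 0.387, "rep": 0.336, "ind": 0.277}

# Hypothetical (Obama, McCain) support within each group; invented for illustration.
support = {"dem": (0.88, 0.08), "rep": (0.08, 0.88), "ind": (0.45, 0.45)}

for name, mix in (("CBS weights", cbs), ("Rasmussen weights", rasmussen)):
    print(f"{name}: Obama lead of {100 * lead_for_obama(mix, support):.1f} points")
```

With identical preferences inside each party, the difference in party weights by itself shifts Obama's lead by about 3 points under these assumed support rates. That is most of the gap between the two polls, before any voter has changed his or her mind.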

The important point, however, is that the assumptions they make regarding the likely distribution of partisan identification among likely and registered voters have a big impact on the numbers they report. And it means we need to be careful not to impute too much importance to small changes in these polls, or differences across polls that may say more about the pollsters’ decisions on how to weight by party than they do about any changes in voters’ political preferences.

A final thought: in recent elections, trends in partisan affiliation among survey respondents have been a very good predictor as to who won the election; in 2004, the proportion of people calling themselves Republicans in the raw sample data went up in the latter half of the campaign, presaging the Bush victory.   That may be a more important number than any of the actually reported results.  I will keep an eye on this figure and report the trends later in the campaign.

I have been postponing a discussion of the political science voting forecast models results, but they are all in.  I’ll try to get to them this weekend.  But you might be interested to know that they all – with one exception – agree regarding who will win the popular vote this election.  As far as these political scientists are concerned, the race is over.