Whether he did so out of frustration or some other emotion, I want to thank Nate Silver for taking time from his busy schedule to respond (twice!) to my critique of poll-based forecasting models similar to his. This type of exchange is common for academics, and I always find it helpful in clarifying my understanding of others’ work. Based on the email and Twitter feedback, I think that’s been the case here – my readers (and I hope Nate’s too!) have benefited from Nate’s willingness to peel back the cover – at least a little! – on the box containing his forecast model, and I urge him to take up the recommendations from others to unveil the full model. That would go a long way toward answering some of the criticisms raised here and elsewhere.
Because of the interest in this topic, I want to take an additional minute here to respond to a few of the specific points Nate made in his comments to my previous post, as well as try to answer others’ comments. As I hope you’ll see, I think we are not actually too far apart in our understanding of what makes for a useful forecast model, at least in principle. The differences have more to do with the purposes for which, and the transparency with which, these forecast models are constructed. As you will see, much of what passes for disagreement here arises because political scientists are used to examining the details of others’ work and putting it to the test. That’s how the discipline advances.
To begin, Nate observes, “As a discipline, political science has done fairly poorly at prediction (see Tetlock for more, or the poor out-of-sample forecasting performance of the ‘fundamentals based’ presidential forecasting models.)” There is a degree of truth here, but as several of my professional colleagues have pointed out, Nate’s blanket indictment ignores the fact that some forecast models perform better than others. A few perform quite well, in fact. More importantly, however, the way to improve an underperforming forecast model is by making the theory better – not by jettisoning theory altogether.
And this brings me to Nate’s initial point in his last comment: “For instance, I find the whole distinction between theory/explanation and forecasting/prediction to be extremely problematic.” I’m not quite sure what he means by “problematic”, but this gets to the heart of what political scientists do: we are all about theory and explanation. Anyone can fit a regression based on a few variables to a series of past election results and call it a forecast model. (Indeed, this is the very critique Nate makes of some political science forecast models!) But for most political scientists, this is a very unsatisfying exercise, and not the purpose for constructing these forecast models in the first place. Yes, we want to predict the outcome of the election correctly (and most of the best political scientists’ models do that quite consistently, contrary to what Silver’s comment implies), but prediction is best seen as a means for testing how well we understand what caused a particular election outcome. And we often learn more when it turns out that our forecast model misses the mark, as most scholars’ models did in the 2000 presidential election, and again in the 2010 congressional midterms, when almost every political science forecast model of which I’m aware underestimated the Republican House seat gain (as did Nate’s model). Those misses make us go back to examine the assumptions built into our forecast models and ask, “What went wrong? What did we miss? Is this an idiosyncratic event, or does it suggest deeper flaws in the underlying model?”
The key point here is you have to have a theory with which to start. Now, if I’m following Nate correctly, he does start, at least implicitly, with a baseline structural forecast very similar to what political scientists use, so presumably he constructed that according to some notion of how elections work. However, so far as I know, Nate has never specified the parameters associated with that baseline, nor the basis on which it was constructed. (For instance, on what prior elections, if any, did he test the model?) It is one thing to acknowledge that the fundamentals matter. It is another to show how you think they matter, and to what degree. This lack of transparency (political scientists are big on transparency!) is problematic for a couple of reasons. First, it makes it difficult to assess the uncertainty associated with his weekly forecast updates. Let me be clear (since a couple of commenters raised this issue), I have no principled objection to updating forecast projections based on new information. (Drew Linzer does something similar in this paper, but in a more transparent and theoretically grounded manner.) But I’d like to be confident that these updates are meaningful, given a model’s level of precision. As of now, it’s hard to determine that looking at Nate’s model.
Second, and more problematic for me, is the point I raised in my previous post. If I understand Nate correctly, he updates his model by increasingly relying on polling data, until by Election Day his projection is based almost entirely on polls. If your goal is simply to call the election correctly, there’s nothing wrong with this. But I’m not sure how abandoning the initial structural model advances our theoretical understanding of election dynamics. One could, of course, go back and adjust the baseline structural model according to the latest election results, but if it is not grounded on some understanding of election dynamics, this seems rather ad hoc. Again, it may be that I’m not fair to Nate with this critique – but it’s hard to tell without seeing his model in full.
Lest I sound too critical of Nate’s approach, let me point out that his concluding statement in his last comment points, at least in principle, in the direction of common ground: “Essentially, the model uses these propositions as Bayesian priors. It starts out ‘believing’ in them, but concedes that the theory is probably wrong if, by the time we’ve gotten to Election Day, the polls and the theory are out of line.” In practice, however, it seems to me that by Election Day Nate has pretty much conceded that the theory is wrong, or at least not very useful. That’s fine for forecasting purposes, but not as good for what we as political scientists are trying to do, which is to understand why elections in general turn out as they do. Even Linzer’s Bayesian forecast model, which is updated based on the latest polling, retains its structural component up through election day, at least in those states with minimal polling data (if I’m reading Drew’s paper correctly). And, as I noted in my previous post, most one-shot structural models assume that as we approach Election Day, opinion polls will move closer to our model’s prediction. There will always be some error, of course, but that’s how we test the model.
(Drew’s work reminds me that one advantage scholars have today, and a reason why Bayesian-based forecast models can be so much more accurate than more traditional one-shot structural models, is the proliferation of state-based polling. Two decades ago I doubt political scientists could engage in the type of Bayesian updating typical of more recent models simply because there wasn’t a lot of polling data available. I’ll spend a lot of time during the next few months dissecting the various flaws in the polls, but, used properly, they are really very useful for predicting election outcomes.)
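To make the updating idea concrete, here is a minimal sketch (in Python) of precision-weighted Bayesian updating, in which a fundamentals-based “prior” is combined with a poll average and the weight on the polls grows as their effective error shrinks near Election Day. The normal-normal setup and all of the numbers are illustrative assumptions of mine, not Silver’s or Linzer’s actual models.

def combine(prior_mean, prior_sd, poll_mean, poll_sd):
    """Posterior mean/sd for a normal prior updated with a normal poll estimate."""
    w_prior = 1.0 / prior_sd ** 2   # precision of the structural (fundamentals) forecast
    w_polls = 1.0 / poll_sd ** 2    # precision of the poll average
    post_mean = (w_prior * prior_mean + w_polls * poll_mean) / (w_prior + w_polls)
    post_sd = (w_prior + w_polls) ** -0.5
    return post_mean, post_sd

# Hypothetical numbers: the fundamentals say the incumbent wins by 2 points (+/- 4);
# the poll average says he trails by 1.
print(combine(2.0, 4.0, -1.0, 5.0))  # June: noisy polls, posterior leans toward the prior
print(combine(2.0, 4.0, -1.0, 1.0))  # Election eve: tight polls dominate the posterior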
I have other quibbles. For example, I wish he would address Hibbs’ argument that adding attitudinal data to forecast models isn’t theoretically justified. And if you do add them, how are they incorporated into the initial baseline model – what are the underlying assumptions? And I could also make a defense of parsimony when it comes to constructing models. But, rather than repeat myself, I’ll leave it here for now and let others weigh in.
Matthew,
Just quickly … I must object to this sentence: “Anyone can fit a regression based on a few variables to a series of past election results and call it a forecast model.” Doing serious statistics is genuinely difficult, and most people are terrible at it. Even among extremely well-educated people, the most basic aspects of probability and statistics are not widely understood, and often sloppy reasoning and the abuse of statistical tools ruin otherwise good work. This happens in every area of science I know.
Clark,
My apologies for the loose use of the term “anybody” – I meant it in the context of reasonably trained social scientists who want to venture into the forecasting field. Ironically, it builds on a criticism that Nate has voiced regarding some political science forecast models. But I didn’t mean to suggest that any Tom, Dick or Mary off the street can run a simple regression!
Matthew,
A few random thoughts here.
We seem to be agreed that we want to know how the world works, and that prediction is a reality check that allows us to know whether we’re doing a good job of this task.
So it’s pretty central to my position that I think we are largely flunking this reality check. Prediction, in political science and in other social sciences (and also in some hard sciences), isn’t really going very well at all. (I have a book coming out in September about this topic.)
I think we should ask whether there are certain bad habits that are contributing to this problem. In my view, there are several of these:
a) Overconfidence in theory that has not survived empirical scrutiny. Theory (like prediction, to which it is intimately related) is a means toward understanding the objective world, not an end in itself.
b) A preference for parsimony over robustness, which sometimes resolves into ideology since dogmatic views about the world are very seductive.
c) Discomfort with uncertainty.
d) Tolerance for (or encouragement of) clumsy heuristics for weighing new information.
e) As a more technical matter, mistaking fitting a statistical model for prediction or explanation, e.g. overfitting. (You’re right that it’s easy to fit a regression model. It’s hard to do it well, in a way that makes a positive contribution to theory/prediction.)
In terms of voting behavior, our experiments with prediction suggest that (i) the theory isn’t all that good and (ii) it isn’t likely to get much better since we’re only getting one fresh data point every four years and since elections are extremely overdetermined things.
My view is that models based on “the fundamentals” are likely to top out at explaining about 50% of voting results if they’re constructed well. If they’re constructed badly, they’ll claim they can predict more but will actually predict less when tested out of sample. Some of the specifications that political scientists and economists have come up with to get around the problem are dubious, as their poor out-of-sample forecasting performance attests.
Meanwhile, some of the excuses made for the poor performance of these models (e.g. after the 2000 election) verge on denial, I think. The “right” answer is that forecasting presidential elections is actually pretty hard, and you can’t do it very well without debasing yourself by looking at horse-race data.
If you do make predictions based on the fundamentals, you ought to be thinking carefully about how to make your model more robust and how to describe the uncertainty in the forecast appropriately. We ought not be too presumptuous about claiming we “know” better than the voters do about how they’re going to behave.
In my view, the 538 model takes the theory about as far as the theory can go. If by the time we get to Election Day, the “fundamentals” say that Romney will win by 5 points, and the polls say that Obama will win by 5 points … we won’t really need to wait for Brian Williams to declare Obama the winner to know that the theory has gone off the rails.
I can see a case that there is some value in publishing the 538 fundamentals model right up through Election Day nevertheless. We actually do publish two different versions of the model every day — a “pure polls” version and a “polls + economy” version. So a “pure economy” version would in some ways be the logical complement to that. (Although, there is some ambiguity about how the “pure economy” version would make forecasts at the state level as opposed to for the national popular vote.)
Some smaller issues:
1) I don’t think it should be very comforting that some political science models have had reasonably good results and some have had bad ones. Some people beat Vegas at roulette on any given evening. Some investors beat the stock market in any given month/quarter/year, and yet there is (relatively) little evidence of persistent stock-picking skill, etc, etc.
Mind you, I DO think that some of the “fundamentals-based” models are better than others. But this is based on my (prior) beliefs about the “right” way to build a forecast model. From a frequentist point of view, their out-of-sample results are consistent with a null hypothesis of no forecasting skill (but a fair amount of random variance).
2) I find it a bit vexing that you had not reviewed much of the documentation that has been provided about the 538 model, and yet the lack of documentation has rapidly become a central point in your critique.
Nevertheless, you have a cogent point since the documentation is not really complete. The idea has been to produce it in installments. Here, for instance, is a post from this morning that details a relatively minor aspect of the forecast model:
http://fivethirtyeight.blogs.nytimes.com/2012/06/22/calculating-house-effects-of-polling-firms/
The rest of these posts should be coming out over the next couple weeks. Then maybe we’ll open it up to questions if people feel like there’s anything they’ve missed. It’s certainly the case that producing the documentation posts can sometimes feel like a chore — but 538 has always gotten terrific feedback from people who appreciate the detail and is much the better for it.
I’m definitely just a lay political junkie, but I wanted to mention two things about Hibbs’ model. Nate argued pretty convincingly that “Bread and Peace” is the best of its kind, but part of the 1952-1988 performance was luck and fit, and there’s nothing magical about it.
But Nate’s review of the 1992-2008 models shows that Hibbs’ bread-and-peace model, with revised data, has an r^2 = .4 and an error of 2.6 points. The horse-race models’ raw correlation was comparable or better. The “Time for Change” model had an error of around 1.7 points. Leading Economic Indicators is around 2 points.
Hibbs’ model would also have fared poorly in 1948. One can explain that away by citing erratic data, but if you look closely at Nate’s May payrolls analysis and the “Time for Change” model, Truman was unpopular but got a blast of strong economic growth that helped a lot. It seems like the weights applied to early versus later years could be changed to improve predictive power, but they helped with fit in the 1952-1988 years.
Professor,
A few questions on your recent posts: First, you observe that polling accuracy at this point in a race is hampered by low information, but you also note that by Labor Day almost all voters have made up their minds. I realize that there is no single turning point across all races, but in federal-level (presidential or congressional) contests, at what point in the calendar do polls become useful predictors? (And at that point, is it beyond a campaign’s ability to affect the election?) On a related note, at what point do campaign spending disparities (which you might argue are information disparities) start to affect electoral outcomes independently of the fundamentals? (Or is campaign spending driven by political activists who are donating based upon the fundamentals, even when the rest of the public has yet to catch on?)
The Maine Senate race (which is of particular interest to me…) seems to complicate an application of the fundamentals, given the presence of a non-trivial independent. Party ID obviously isn’t as helpful, particularly given the difficulty of separating the Tea Partiers and the leaners from the “independent” category. That said, King’s lack of party affiliation allows him to “run against” the entire DC establishment, although plenty of other MCs do that every election cycle without letting their party label get too much in the way.
If we abandon theoretical explanation (and perhaps make like Nate Silver…), King’s financial and name-recognition indicators are robust, but those gaps will surely narrow over the course of the summer. Helpfully for King, the DSCC seems likely to spend its money elsewhere, but there’s a sense that the dark clouds of Crossroads, et al. may be looming on the horizon. Things may soon be getting interesting.
It’s been great to see Nate Silver and you tease out your differences. Not sure how to bridge the impasse over your respective ideal visions for theory. He’s correct that the infrequency of elections limits our n with which to measure the effectiveness of various theories, but that’s still true even if we heap on loads more variables before we’re sure if the ones we’ve started with are accurate predictors of voting behavior. In any case, if Nate considers the fundamentals to be inaccurate, he might do well to hold off on making his predictions until the point in the race where polls lock in their accuracy, since his figures at this point would (by his own reasoning) be mostly speculative.
-Will
Nadia,
One reason we can assess the Hibbs’ model and see both its strengths and weaknesses, of course, is because he presents it in full, warts and all, for the rest of us to critique. It’s through this process of scrutiny, testing and feedback that academics improve their understanding of electoral dynamics. Your comments attest to this, and provide additional support for why transparency is so crucial in evaluating forecast models.
Hi Will,
It’s good to hear from you – sounds like you are prospering in the real world! Some great questions here. Let me focus first on the national-level campaign questions. I don’t think I would say that “almost everyone” has made up their mind by Labor Day – but most voters are locked in by then. As I recall, a recent Pew survey suggested that something like 70% of voters have already made up their minds, and (again drawing on memory) I think that figure climbs to 80% or higher by Labor Day. But that still leaves a significant chunk of undecided voters. Indeed, according to exit polls from 2008, about 60% of voters had made up their minds prior to September. Another 14% decided in September, and 15% more did in October. That still leaves a not insignificant chunk of voters – about 10% or so – who, if exit polls can be believed, decided in the last week or so of the race. This is why Nate’s model gradually switches over to an almost pure poll-based forecast as Election Day draws nigh.
As for campaign spending, my colleague Bert Johnson’s research suggests that it is more a function of candidate viability than a cause. So, if there’s a huge disparity in fundraising, it’s probably because the fundamentals favor one candidate over the other. This is a bit of a generalization, of course, but I think I agree with it. Now, that disparity can play out in different ways – through GOTV operations, media advertising, etc. So to a certain extent my answer depends on the assumption that I make more generally about campaign effects – that both sides are equally skilled at leveraging the resources at their disposal, given a particular campaign context.
The Maine Senate race looks fascinating, and I’m counting on you to keep us up to date! I don’t pretend to have any independent expertise on the race, but I think your instincts are correct; in a three-person race, some of the typical measures, such as party ID, are going to have to be adjusted in order to produce an adequate forecast. You may be right that we might have to rely more on polling data. In that vein, assuming a similar media dynamic, and given the fact that King is familiar to Maine voters, I wouldn’t be surprised if he already has a built-in vote that is similar to what he pulled the last time he ran for statewide office. In that sense, then, maybe the fundamentals do matter, at least a bit.
I wouldn’t say that Nate and I are at an impasse. I think the discussion turns more on what we want to accomplish when we construct forecast models. I wasn’t being cheeky when I suggested Nate is under tremendous pressure, given his visibility, to get the forecast right. It makes sense, then, for him to adjust his model as the election draws nigh by increasingly relying on polling data. But I hope you can see why that’s not particularly useful for political scientists. In the end, we want to know why the election turned out as it did. If Nate is right, and our fundamentals-based forecast model can account for maybe 50% of election outcomes, we’ve still learned something. That’s progress, and we should be satisfied that we took a theory, tested it, and found that it only accounts for a portion of our understanding regarding how people vote. Unlike Nate, we are not particularly invested in getting the result right except as a means of teaching us about why people vote as they do.
Hope this helps. Great questions, as always…
Nate,
Again, I think we agree much more than we disagree. You may be right that political scientists are “flunking the reality check” when it comes to forecasting presidential elections, and it may be for the reasons you cite. But the fact that you can point to specific flaws in particular models drives home my point about how crucial transparency is to advancing our collective understanding of election dynamics. It may very well be the case that your model pushes the theoretical limits of forecasting ability, as you claim. If so, we can all learn from it – but only if you show us the moving parts. Then you wouldn’t have knuckleheads (like me) criticizing you for relying too heavily on polls this early in the process. In my defense, it’s one thing to say you discount polls very early – it’s another to show by how much – and still another to justify having them in the model in the first place (which was the point of my original post!) Again, academics are very comfortable with the process of showing their work. It sounds like you are moving in this direction – I very much encourage you to do so.
Given your day job, of course, and the limits of structural-based forecasting models, I don’t blame you for moving toward a poll-based prediction model as the election draws near. Your readers (at least most of them, I gather) don’t come to you to understand the intricacies of forecast modeling – they want to know who is going to win the election and, preferably, by how much. Academics, in contrast, care less about getting the outcome right as the primary goal, and more about understanding why we got that outcome. So a purely poll-based model, although more accurate in the end, isn’t very useful to us. In other words, a theory “that goes off the rails” can still teach us something – as long as we know where and why it went off the rails. But you have to have a theory! Toward that end – yes, I think it would be useful to publish your fundamentals-only model. And yes, accounting for the play of economic factors at the state level isn’t easy, which is why most political science models focus on the popular vote and not the Electoral College. But if anyone can overcome that hurdle, I’m sure you can.
I may be guilty of not reviewing the “documentation that has been provided about the 538 model”, but, as you acknowledge, your descriptions in prior posts of what you are trying to do, while probably very helpful to your readers as a basic overview of your forecasting logic, are really not at the level of detail needed to assess the model.
We can quibble about issues such as how parsimonious our explanatory models should be, what constitutes forecasting success, etc., but it would just detract from my larger point that for much of what you say you are preaching to the choir. The major difference we have – and again, I think it reflects different professional vantage points – is that I’m perfectly happy if we can show that fundamentals-based forecast models “are likely to top out at explaining about 50% of voting results if they’re constructed well.” That raises all sorts of interesting research questions, beginning with how do we – can we? – explain the rest of the variance?
If we disagree, then, it may be on how we define forecasting “success”. You write, “The ‘right’ answer is that forecasting presidential elections is actually pretty hard, and you can’t do it very well without debasing yourself by looking at horse-race data.” But that’s only if our criterion of success is getting as close to the final result as possible. However, we can also measure “success” by how well we can specify, within a degree of confidence, how much of an election result is driven by specific fundamentals, and why.
As a profession, political scientists depend on criticism to advance their knowledge base. It’s not always easy to be on the receiving end, of course. But then, most of us don’t put our work on the line in a high-profile column; disagreements in our profession rarely make the pages of a national newspaper. Given your level of visibility, it speaks well of you that you are willing to put your own model under the same level of scrutiny that you have applied to the political science forecast models. Keep up the good work.
As an interested observer (and, admittedly, one only marginally informed about polling and election forecasting methodologies), I find it amazing that this discussion managed to avoid mentioning campaign financing until Will broached the topic (“dark clouds of Crossroads”) in his post. Equally mind-boggling is Professor Dickinson’s colleague Bert Johnson’s research, which apparently suggests that money flows disproportionately to the candidate blessed with the best election-landscape fundamentals.
On one level, I can’t understand why, in a world of largely unregulated and unlimited political contributions, the disparity in campaign funding isn’t itself one of the major “fundamentals” used in election forecasting. Perhaps it is, but, if so, that isn’t clear in any of the prior exchanges between Prof. Dickinson and Mr. Silver. Conventional (i.e. non-academic) wisdom certainly assumes that having more money improves a candidate’s odds, an assumption obviously shared by everyone from small-contribution donors to billionaires.
I’ll risk mixing apples and oranges to note an item in today’s news that an initiative to increase the tobacco tax seems to have failed in California (http://seattletimes.nwsource.com/html/health/2018500951_apuscaliforniatobaccotax.html). According to the AP article “Polls showed approval [to impose the tax] peaked at about two-thirds in March…” but then fell dramatically after the Philip Morris-backed opposition poured “millions of dollars into an advertising blitz that whittled away support.” I’m sure there are many distinctions between forecasting electoral campaigns and state initiatives, and perhaps the AP journalist is wrong in attributing the plummeting support for the initiative to the opposition’s money and advertising superiority. I find it difficult to believe, however, that the tobacco tax went down to defeat in California because the “fundamentals” favored the opposition, which then drew more financial support precisely because of those favorable fundamentals.
In the current presidential campaign, the poor economy is clearly a fundamental favoring Romney over Obama. I suppose one could use that fact to support Johnson’s thesis – i.e. that more money is flowing to the Romney side (particularly to largely unregulated and anonymous SuperPacs) because of the pro-Romney economic fundamental. Such an analysis would seem to be dangerously naive, however, in that it’s hard to imagine the pro-Romney pool of big donors significantly reducing their financial support if, say, the unemployment rate was 7.0% rather than above 8%. (Granted, there may well be other fundamentals beyond the economy – beyond their own self-interest, that is – that predispose big donors to favor Romney.)
Talking about the percentage of uninformed voters today or on Labor Day without acknowledging how many of them will become “informed” seems more than a little odd. A significant percentage of these voters will get much of their information from negative and misleading advertisements funded by the respective campaigns and by the outside groups supporting them. In a presidential race that is generally assumed – or, if you like, forecast – to be quite close, it seems to this layperson that unregulated campaign money may prove to be not just one of the fundamentals this election year, but THE fundamental.
Dwight,
This is a great question because it expresses a sentiment that I think a majority of the population holds – but for which, believe it or not, political scientists don’t find much empirical support. It may seem obvious to you, and many others, that fundraising disparities can, by themselves, significantly alter election outcomes, but proving this is actually more difficult than you might think. Again, the impact of money varies across types of elections; as you note, the dynamics of a ballot initiative may differ from a presidential election, so we have to be careful about generalizing here. I’m going to let my colleague Bert Johnson, who is finishing up a book on campaign finance, respond more fully to you, but for now, let me give you a thought experiment: do you think your current preference in the presidential race would change if you were exposed to enough negative advertising against your preferred candidate? My guess is no – and that holds for most people. In fact, our studies show that the impact of campaign advertising is quite short-lived, and that its persuasive impact is far more limited than you might think. Now, there are other ways that money comes into play, and we can discuss these, but the short-hand answer is that in high-information races, like a national presidential campaign, where most people know a bit about each candidate, and in which both candidates are likely to achieve a minimal threshold of financing, disparities in campaign financing are not likely to make much difference in the outcome. This is not necessarily true in other types of races, which we can discuss. But I am fairly certain that the outcome of the 2012 presidential race will not be a function of fundraising disparities.
Professor,
I too was working from memory on that, based on one of your previous tweets, so it’s comforting to me, as someone involved in a campaign, that my post-Labor Day efforts won’t be just treading water (although that’s certainly part of it). So would it be correct to say that the blocs of support that polls are currently registering might be fairly stable, but that we just need to be careful not to extrapolate too much without incorporating the undecided voters into our margin of error?
Right, I remember we brought up Prof. Johnson’s points in class. It seems like, once you reach the level of state-wide elections, that it’s a fair bet to assume that everyone’s campaign staff is about equally competent. In my recent experience, I think you could apply that logic to campaign manpower as well. No one wants to spend weeks or months volunteering for a losing campaign, and so politically-engaged people vote with their time in the same way political donors vote with their cash (since no one wants to drop hundreds or thousands of dollars on a losing campaign either).
Past vote share, in his case about 59% (in a three-way race like this one), is a nice fundamental for him, then. It might be interesting to see how fundamentals-based models fared in presidential elections when Perot or Nader were thrown into the mix.
I think that’s what I was aiming at, that social science is primarily about the “why” and that Nate may be less concerned with explanation in comparison to making accurate predictions. Although he seems to hope to have it both ways.
Thanks so much for answering my questions. Hopefully I’ll be able to keep working the Maine race into my responses. It’s always fun to apply theory to practice.
Thank you for your detailed response to my question about the role of campaign financing, Professor. I do find it quite interesting that there isn’t much empirical support backing the widespread assumption that an advantage in funding can tip the scales in a close election. To extrapolate, that political science “fact” suggests that much of the sound and fury that has occurred since the Citizens United ruling is misplaced (and also suggests that campaign donors are deluded in believing their contributions are particularly meaningful…).
I can’t argue with your thought experiment, at least as it pertains to me. You are quite correct to suggest that no amount of negative advertising will alter the voting preference that I currently hold. That fact doesn’t seem particularly relevant, however, in the case of currently uninformed or self-proclaimed “undecided” voters. If there really is such a thing as an undecided voter — a debatable point, I’m guessing — it would seem they might be more susceptible to negative advertising than those with preexisting firm preferences. Your response indicates that it will be the usual fundamentals that ultimately tilt these undecideds one way or another, rather than a disparity in campaign war chests. That may well be true and historically well-documented. Nonetheless, it seems to me that this cycle’s staggering amount of campaign money, and the potentially wide disparity in funding between the two sides, could make campaign financing a more impactful variable this time around than it has been in the past.
This has been a great discussion. But there’s one more point that I think has been neglected.
You mentioned in the previous post that Sam Wang and I had about the same forecast for the U.S. House in November 2010. But that isn’t really true. Sam’s forecast, IIRC, was that Republicans would pick up 53 seats, with a standard error of +/- 2. Ours was that they would pick up 55 seats, with a standard error of something like 12. Those are very different forecasts. For Sam, the G.O.P.’s actual 63-seat gain represented a 5-sigma (Pr ~= .0000003) error. For ours, it was within the sweet spot of the forecast range.
Our model was *much* less assertive about its forecast. On the other hand, it performed about as well as advertised. This is typical; our models often have considerably wider forecast ranges than those produced by our competitors.
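(For readers who want to check the arithmetic behind that comparison, here is a quick back-of-the-envelope calculation in Python. It treats each forecast as a normal distribution, which is an assumption made purely for illustration; neither model need be literally Gaussian in its tails.)

from scipy.stats import norm

actual = 63
for name, mean, sd in [("Wang (53 +/- 2)", 53, 2), ("538 (55 +/- 12)", 55, 12)]:
    z = (actual - mean) / sd
    tail = norm.sf(z)  # one-sided probability of a Republican gain at least this large
    print(f"{name}: z = {z:.1f}, Pr(gain >= 63) ~ {tail:.1e}")
# Wang: z = 5.0, Pr ~ 2.9e-07 (the ".0000003" figure above)
# 538:  z = 0.7, Pr ~ 0.25 -- well inside the forecast range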
Returning to presidential elections, what is damning about the “fundamentals based” models is not so much that they’ve made some bad calls, but that they’ve done this despite claiming to be *extremely* accurate. Some of the models in 2000, for instance, would have given anywhere from hundreds-to-one to billions-to-one odds against the election actually being as close as it was, given their reported standard errors. Models that claim to explain 80-90 percent of election results have come nowhere near that when tested out-of-sample.
The most important lessons from something like 2000, therefore, are not about whether we should use third quarter GDP instead of second quarter RDI or something like that. Instead, they suggest that there are some profound misconceptions in the way that people are thinking about modelling — and by extension their theory-development.
I guess what this really boils down to is that I’m not sure what it means for a model to be “a-theoretical”. I think it might be more useful to talk about hypotheses.
My hypothesis is not just that 50% of election results can be explained by economic factors; it is also that the other 50% CANNOT be explained by them.
That is a testable hypothesis, of course. My assertion is that there will be the occasional election where the results diverge quite a lot from what we might expect based on readily available measures of economic performance.
Likewise, a forecast that solely uses polls is implicitly asserting that there isn’t any connection at all between economic performance and elections. That’s an extremely bold hypothesis that might have lots of interesting policy and political implications! Although I suspect, of course, that it is quite wrong.
Where I think one can go wrong is by assuming that a model with a higher R^2 is intrinsically better than one with a lower R^2. What you should want is a model that performs as well as advertised when tested on out-of-sample data. Meanwhile, some processes are intrinsically more predictable than others.
Likewise, a theory that implies that a process is more predictable is not intrinsically better than one which implies it is less so. Both type I and type II error count as being wrong.
What I think of as an “atheoretical” model, I suppose, is instead one where the researcher blithely assumes that the output of a model fit to past data is tantamount to a prediction, without considering the myriad assumptions that are latent within the model. This is a huge problem in elections forecasting, where the number of plausible model specifications is vast, but the sample size is extremely small, the data is noisy, and many candidate variables are closely correlated with one another.
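A toy simulation illustrates the point. With a sample about the size of the post-war presidential elections and several correlated candidate predictors, in-sample fit can look impressive while leave-one-out (out-of-sample) predictions are far weaker; the data below are simulated for illustration, not real election results.

import numpy as np

rng = np.random.default_rng(0)
n, k = 16, 5                                        # 16 "elections", 5 candidate predictors
X = rng.normal(size=(n, k))
X[:, 1:] = 0.8 * X[:, [0]] + 0.2 * X[:, 1:]         # predictors correlated with one another
y = 2.0 * X[:, 0] + rng.normal(scale=3.0, size=n)   # one real signal plus a lot of noise

def ols_fit_predict(X_tr, y_tr, X_te):
    A = np.column_stack([np.ones(len(X_tr)), X_tr])       # OLS with an intercept
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta

in_sample = ols_fit_predict(X, y, X)
r2_in = 1 - np.sum((y - in_sample) ** 2) / np.sum((y - y.mean()) ** 2)

loo = np.array([ols_fit_predict(np.delete(X, i, 0), np.delete(y, i), X[[i]])[0]
                for i in range(n)])                        # leave-one-out predictions
r2_out = 1 - np.sum((y - loo) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"in-sample R^2:     {r2_in:.2f}")    # flattering
print(f"leave-one-out R^2: {r2_out:.2f}")   # usually much lower, sometimes negative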
Here is the latest wisdom on the value of forecasting (and NSF funding): http://www.nytimes.com/2012/06/24/opinion/sunday/political-scientists-are-lousy-forecasters.html?_r=1&pagewanted=all
Erik,
This post is getting a lot of play but, alas, it is very poorly argued and not very credible. If I get a chance I’ll try to do a blog post discussing just how thin it is. I suspect, however, that many others will beat me to it – it’s a very easy target to hit!
Nate,
Lots of good points here regarding how to evaluate forecasting “success” that build on your earlier critiques of political science forecast models based on the fundamentals. It deserves a more detailed response which I’ll try to put together in a separate post. One quick correction, however, to your comment here regarding Sam Wang’s 2010 forecast: I was actually referring to Sam Wang’s 2008 presidential forecast, which used a weighted average of state-based polling to just about nail the final Electoral College vote. I think it was (by a very narrow margin!) about as good a forecast as I saw. My point in mentioning it is that Sam had no theory or model driving his forecast at all, and there was no structural component based on fundamentals. Instead, it was based purely on polls. As such, it was incredibly accurate, but it didn’t tell us much about why voters voted as they did. Great prediction – not very useful for political science purposes.
Dwight,
Thanks for the follow up comment. Let me be clear – disparities in campaign funding can make a difference in some races; it’s pretty clear, for example, that in House races, challengers who do not achieve a minimum level of funding are at a distinct disadvantage in a race against a well-funded incumbent. But in presidential races, it is a different story; because both candidates generally are well funded, and because there are so many sources of information that allow voters to evaluate candidates outside of advertising campaigns, disparities in funding are not typically significant. Of course, if the presidential election comes down to a few votes, one can pretty much point to almost any factor as the deciding one. But when we try to apportion causality to the outcome of a presidential race, we generally put campaign funding down on the list, below other factors, such as the economic context.
You are right about both the impact of Citizens United and how the undecideds vote. At this point, spending in the presidential race – despite Citizens United – is down from a comparable date in the 2008 cycle. We think this is because the sluggish economy has discouraged people from contributing to campaigns. And we expect the undecideds to eventually decide how to vote based largely on the fundamentals, as opposed to factors such as advertising that are more directly influenced by campaign spending.
Hope this helps….
Again, Professor, thanks much for your helpful clarifications. I appreciate your efforts to shed some light on the murky world of forecasting and polling, even for those of us watching from the sidelines.
Thank you for the excellent questions. That’s what makes this blogging thing so interesting!
A measured rebuttal of the Jacqueline Stevens NYT op-ed: http://themonkeycage.org/blog/2012/06/24/why-the-stevens-op-ed-is-wrong
…and a mocking re-writing of it: http://themonkeycage.org/blog/2012/06/23/traditionalist-claims-that-modern-art-could-just-as-well-be-replaced-by-a-paint-throwing-chimp
I’m coming late to this, but I’ll add my two cents on the campaign finance aspect of the discussion. Since political scientist Gary Jacobson first studied this subject in 1978, there’s been a relatively consistent finding in political science research that campaign spending makes the biggest difference for underfunded challengers and the least difference for overfunded incumbents. There is, in other words, a diminishing return to campaign spending. Jacobson suggested that the reason for this is that challengers are more often unknown and need to spend money simply to make voters aware that they exist. Most people have already formed opinions of the incumbent, however, so spending has less of an effect on these judgments.
As Dwight suggests, my view is that the reaction to Citizens United has been out of proportion, at least inasmuch as that reaction presumes that money is buying election outcomes. If Citizens United has done anything, it may have been to increase the level of competition by providing a source of funds for underfunded challengers to tap. Indeed, in the 2012 presidential primaries, Gingrich and Santorum might not have had the money to continue as long as they did, had it not been for Super PACs. In the case of the Wisconsin recall election, Republican Governor Walker outspent Democrat Tom Barrett 89% to 11% in terms of total money spent WITHOUT including outside spending in the equation. When you add in the outside spending on both sides, Walker and Barrett were actually more even: Walker and his allies spent 72% of the total, while Barrett and his allies spent 28%.
So in the admittedly few cases we have so far, the outside spending from Super PACs and others seems to have made elections more competitive. But this effect of money on election outcomes is probably only because money helps underfunded candidates get their names and messages to the public. The public still has to accept or reject these messages. This is why, in presidential general election campaigns in which both sides are well funded, forecasting models do not include campaign finance figures — nor would they improve by doing so.
Thanks for this Will – I’m working on a longer post in rebuttal as well.
Thanks for your detailed explanation about the impact of campaign financing, Professor Johnson. I agree that outside funding by big donors allowed the Gingrich and Santorum campaigns to be more competitive, and lengthy, than they otherwise would have been. That said, I’m not as clear about the lessons to draw from the Wisconsin governor’s recall election. Adding the outside contributions to the totals may have made Walker’s and Barrett’s respective funding “more even” than it was without the outside funding component, but it’s hard to think of a 72% to 28% money advantage as having much of any relationship to the word “even!”
Again, much of the post-election reporting gave at least some credit for Walker’s victory to his sizable money advantage. In this case, of course, a widely held distaste among Wisconsin voters for the recall process itself was also reported to be one of the key factors working to Walker’s advantage. No shortage of variables in the mix, I guess! But, while I take your point that campaign financing doesn’t deserve a place at the forecasting fundamentals table, I’m not so certain that the Walker/Barrett race is the best example to prove that point.
Dear everyone, greetings. Prof. Dickinson invited me over here to see what’s going on. It’s so interesting. I hope to contribute some comments that are on-topic. Pardon me if I repeat some points already made.
I want to address what I believe to be an inadvertent mixing-up of concepts that has popped up in this discussion. Basically, I think the term “prediction” is used too loosely. I’ll make an analogy to weather forecasting.
Weather forecasting begins from immediate measures such as temperature, wind speed, and other current conditions. Factoring in likely future changes, whether on a short time scale (prevailing winds) or a long time scale (climate modeling), can add useful information about what lies ahead.
When you’re watching a hurricane, what you want is the person with current conditions and short-term trends. However, to learn about the next hurricane, different expertise comes in.
(1) What a poll analyst like Nate Silver or myself starts from is a snapshot of current polling conditions, sometimes with uncertainties added into the mix. In the analogy, we are weathermen.
(2) Political scientists like Ray Fair and his successors provide a prediction of a future event. In the analogy, they are the climatologists.
These approaches often overlap, though it is not acknowledged explicitly. Combining them produces a third category (3), in which current conditions (the snapshot) and likely future changes can both inform a true prediction for a current race.
For example, Silver makes efforts to draw upon (2) a bit, and calls his hybrid calculation a prediction. Calling it a prediction satisfies his readers’ desire for one. He also provides very interesting ongoing commentary. All of this is legitimate, but I don’t think it’s appropriate to be too negative about approach (2). The weatherman should not castigate the climatologist.
To my own taste, such a hybrid approach does not add all that much information to the Presidential race. It tends to conceal what (1) and (2) above can each tell us separately — and approach (2) is open to debate, as seen in this thread. Thus at the Princeton Election Consortium I have in the past provided (1), a pure snapshot of polls.
However, a hybrid approach (3) is very useful for assessing individual races where less data are available, such as House and Senate races. This is an important practical application. If done cleanly, one could imagine using a variety of variables, including past trends and campaign spending, to inform a true prediction.
Now, a longer response, which arose from some recent correspondence I had with Carl Bialik at the Wall Street Journal.
All the best,
Sam Wang
>>>>>>>>>>>>>>>>>>>>
Nearly all online commentary on polls is not really a forecast, but instead a snapshot of current polls. Sometimes people add uncertainty to the snapshot, either inadvertently (suboptimal statistical analysis) or on purpose (assuming future trends), then call the product a forecast. But let’s unpack that a bit.
Most readers here probably agree that a traditional forecast is what Ray Fair and others do with the bread-and-peace model. (Aside: Pardon me if I fail to cite later efforts, which are interesting but don’t change the essential thrust of my argument). For the 2012 Obama v. Romney race, models like that suggest a very close race. This is not much fun, but they could be useful for evaluating where a current snapshot of polls might go in the future. Bread-and-peace models could be used to add random drift in national opinion (in the form of a single parameter that drives polls in one direction, together) that generally nudges things toward the bread-and-peace prediction. In some sense, this is basically a Bayesian prediction.
It is possible to embed such drift into a model in a more hidden manner, but that makes it hard to see the current snapshot. Since predictive models have only moderate power, I would recommend that they be added on separately from the polling snapshot.
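As a minimal sketch of that idea (an illustration of the approach, not production code), one could start from today’s polling snapshot and add day-to-day drift that is gently pulled toward a fundamentals-style prediction, then read a win probability off the simulated Election Day margins. All numbers below are hypothetical.

import numpy as np

rng = np.random.default_rng(1)

snapshot = 2.5        # today's polling snapshot: incumbent margin in points (hypothetical)
fundamentals = 0.5    # bread-and-peace style prediction of the final margin (hypothetical)
days_left = 140
pull = 0.01           # daily strength of the nudge toward the fundamentals prediction
daily_sd = 0.15       # day-to-day random movement in national opinion, in points

sims = np.full(20000, snapshot)
for _ in range(days_left):
    sims += pull * (fundamentals - sims) + rng.normal(scale=daily_sd, size=sims.size)

print(f"mean simulated Election Day margin: {sims.mean():.2f}")
print(f"P(incumbent wins the popular vote): {(sims > 0).mean():.2f}")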
>>>>>>>
I want to bring up a separate topic I am thinking about. In fact, I’d be interested in having a collaborator to work on it. Readers here are welcome to contact me at sswang@princeton.edu. The core question is: how can one use polling data to generate an optimal snapshot of conditions today?
I am thinking about a common analytical error in producing a poll snapshot. Here are some observations:
(1) Nearly all pollsters are professional enough in their methods to provide statistically informative information. By this I mean that their results can usually be understood as being a sample of true current opinion, plus/minus some consistent, ongoing bias due to their individual stratification and sampling methods. This view is well supported by multiple sources, including Charles Franklin at the University of Wisconsin, as well as David Shor from the Stochastic Democracy blog, who visited me a few years ago.
For example, Rasmussen and Gallup are both useful sources of information, despite the fact that they often appear to say different things. It’s like having a bunch of clocks, some of which are set fast and some of which are set slow, but all of them run at the correct rate.
(2) Therefore finding that correction offset first, then putting polls together in a meta-analysis, is an approach that approximates the true uncertainty. (A minimal sketch of this idea appears after these observations.)
Early approaches to a correction were not quite right. In 2008, Silver’s Pollster-Introduced Error turned the offset into a multiplicative factor. However, this leaves the original offset error in place, which might explain a tendency toward underconfidence in his expressed probabilities. Even in 2010, knowledgeable professional statisticians calculated an average before smoothing, which again baked the offsets into the final estimate. Both of these approaches tended to overestimate the true uncertainty. They were also susceptible to having too many polls from one source such as Rasmussen. See some interesting related commentary by Andrew Gelman here: http://andrewgelman.com/2010/11/some_thoughts_o_8/
Based on past Presidential races, it’s possible to have a day-by-day barometer that is effectively accurate to 0.1-0.2% swings in public opinion. That is the equivalent of my error on Election Eve 2008 of 1 electoral vote: http://election.princeton.edu/2008/11/11/post-election-evaluation-part-2/. I note that I did not calculate pollster-specific offsets, which made my apparent uncertainty rather high. This worked because pollsters are, when averaged, not very biased.
(3) A high-quality estimator would help identify truly game-shifting events (i.e. adding Sarah Palin to the GOP ticket), as distinct from individual events that mostly don’t have immediate effects on the Presidential race.
However, economic and competitive forces work against a commercial poll aggregator taking on such an approach. The resulting snapshot would be very stable (see my 2004 and 2008 graphs at http://election.princeton.edu/history-of-the-2004-race/), and would therefore be less interesting to watch. This is not to the advantage of aggregators who depend on pageviews: RealClearPolitics, FiveThirtyEight, TalkingPointsMemo, and so on.
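Returning to observation (2), here is a rough illustration of the “offset first, then aggregate” idea: model each pollster as truth plus a fixed house offset plus sampling noise, estimate each house’s offset against a house-balanced consensus, and only then average. The pollster names, offsets, and poll counts below are invented for the example; this is a sketch of the approach, not my actual code.

import numpy as np

rng = np.random.default_rng(2)
true_margin, noise_sd = 1.0, 1.0
houses = [("Pollster A", +2.0, 20),   # a prolific house with a large lean (hypothetical)
          ("Pollster B", -1.5, 5),
          ("Pollster C", +0.3, 5)]    # (name, house offset, number of polls)

polls = [(name, true_margin + off + rng.normal(scale=noise_sd))
         for name, off, n in houses for _ in range(n)]

margins = np.array([m for _, m in polls])
print(f"naive mean of all polls: {margins.mean():.2f} (pulled toward the prolific house)")
print(f"naive spread:            {margins.std():.2f} (house offsets baked in)")

# Offset first: estimate each house's lean relative to a house-balanced consensus
# (each pollster counted once), subtract it poll by poll, then aggregate.
house_mean = {name: np.mean([m for h, m in polls if h == name]) for name, _, _ in houses}
consensus = float(np.mean(list(house_mean.values())))
corrected = np.array([m - (house_mean[h] - consensus) for h, m in polls])
print(f"offset-corrected mean:   {corrected.mean():.2f} (true margin = {true_margin})")
print(f"residual spread:         {corrected.std():.2f} (closer to pure sampling noise)")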
In my opinion, where commercial and amateur aggregators add a lot of value is by analyzing individual House and Senate races. Potentially, they can do very well at filling in missing information.
All the best,
Sam Wang
Princeton Election Consortium, http://election.princeton.edu
Associate Professor, Molecular Biology and Neuroscience
Princeton University
I’m just an interested observer. God bless those of you who think you will be able to accurately predict elections in the future. Good luck with that.
For me, it’s satisfying to have responsible poll aggregation. End of story.
It serves a useful purpose in sorting out most, if not all, of the poll cherry-picking that happens almost hourly on every internet site, every newspaper, and every TV news show. I get calls every day from some breathless friend telling me this or that about some new poll.
In the face of increasingly high-quality and accurate poll aggregation, I find it amazing that there continue to be so many badly written, cherry-picked poll stories, and that the state of the horse race itself comes to dominate the campaign to the extent that it does.
Ultimately the debate over who is going to win simply re-litigates the underlying policy debate, but at one level of abstraction removed from actual policy. If reliable poll aggregation can be used to debunk and quiet irresponsible chatter about the horse race, maybe the pols will re-focus on policy differences.
I think part of the problem is that responsible poll aggregators have been transparent about their partisan affiliations — Nate, Sam Wang, HuffPo, etc. Because of this, “mainstream media” types have gravitated toward Real Clear Politics, which somehow appears more neutral… and Nate, the popular guy from, gulp, the NY Times. Naturally the Right will beat up on Nate. Let’s not go there right now.
To summarize my points.
– Accurate poll aggregation is, in and of itself very valuable, particularly in helping us sort through the blitz of polling data presented every day.
– I personally don’t require election prediction.
– Poll cherry-picking is still the dominant way in which polls are reported in the mainstream media.
– Were poll aggregation to be legitimized and used in mainstream poll reports, it might put pressure on candidates to move poll numbers through policy rather than spinning the horse race or spawning biased pollsters.
– Poll aggregators have work to do before they are seen as credible sources by mainstream media types for sorting out incoming polling data.
I would like to see poll aggregation become credible and move into mainstream reporting.
Paul Collacchi