Who are you?
The team consists of David Shor, Harry Enten, and Rasmus Pianowski. David is a math student currently visiting Princeton University as a Visiting Graduate Student. Harry is an undergraduate at Dartmouth and an intern at Pollster.com. Rasmus is a freshman at the University of Hamburg who has done political consulting and media outreach work for Montana congressional candidate Tyler Gernant.
The site is closely affiliated with Professor Wang's Princeton Election Consortium.
How often do you update?
Roughly every three days. As the election campaign heats up, we'll scale up to daily updates.
Where do you get your polls?
Our polls are gathered by Harry Enten, a member of our team and an intern at Pollster.com. Our polls primarily come from Pollster.com, PollingReport, the National Journal's proprietary database, and a variety of local news sources. Our House, Senate, Governor, and Generic poll database can be accessed in Google-Doc format here.
Do you weigh polls by recency?
Yes, but in a very different way than other political forecasters. Others, most notably FiveThirtyEight, assign every poll a weight based on its age, and then take a weighted average of the polls.
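That weighted-average approach can be sketched in a few lines (the half-life and the polls below are made-up illustrative numbers, not FiveThirtyEight's actual parameters):

```python
def decay_weighted_average(polls, half_life=14.0):
    """Average poll margins, discounting older polls exponentially.

    `polls` is a list of (margin, age_in_days) pairs; `half_life` is the
    age at which a poll's weight falls to one half. Both are illustrative
    choices, not anyone's published parameters.
    """
    weights = [0.5 ** (age / half_life) for _, age in polls]
    total = sum(weights)
    return sum(w * m for w, (m, _) in zip(weights, polls)) / total

# Three polls of the same race: D+4 today, D+2 a week ago, D-1 two weeks ago.
polls = [(4.0, 0.0), (2.0, 7.0), (-1.0, 14.0)]
print(round(decay_weighted_average(polls), 2))  # the D-1 poll gets half weight
```
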
This is intuitive, but it's inconsistent and throws a lot of data away. To see the problem with discounting old polls by a fixed weight, consider two potential horse-races over the same week:
Not all weeks in politics are the same. If there is no scandal, then a poll taken in the beginning of the week is still pretty informative of public opinion today. Not much has changed, so it should have a high weight. But if there was a scandal, then a poll from the beginning of the week is nearly worthless. It should have a low weight. In order to appropriately weigh the poll, we have to consider every other poll at the same time.
To get around this problem, our model estimates public opinion for every single day of the campaign, not just today. A poll provides information only about opinion on the day it was taken. How much it affects estimates for today depends on the estimates for the preceding days. Incidentally, this has been the preferred approach in signal processing and engineering for 60 years.
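A minimal sketch of that state-space idea is a scalar random-walk Kalman filter, the classic engineering tool alluded to above. This is not our actual model, and the variances here are illustrative assumptions:

```python
def kalman_filter_days(observations, process_var=0.5, obs_var=9.0):
    """Estimate daily opinion from sparse polls via a random-walk model.

    `observations` maps day index -> polled margin (days with no entry had
    no poll). Opinion drifts day to day with variance `process_var`; each
    poll measures it with variance `obs_var`. Both numbers are
    illustrative assumptions, not fitted values.
    """
    n_days = max(observations) + 1
    mean, var = 0.0, 100.0           # diffuse prior on day 0
    means = []
    for day in range(n_days):
        var += process_var           # predict: opinion drifts overnight
        if day in observations:      # update: blend prediction with the poll
            gain = var / (var + obs_var)
            mean += gain * (observations[day] - mean)
            var *= (1.0 - gain)
        means.append(mean)
    return means

# Polls on days 0 and 6; on poll-free days the estimate carries forward
# unchanged while its uncertainty grows (a full smoother would also let
# later polls inform earlier days).
path = kalman_filter_days({0: 3.0, 6: 5.0})
print([round(m, 2) for m in path])
```

Note how the day-6 estimate depends on how much uncertainty accumulated over the quiet days in between: a volatile week (high `process_var`) makes the old poll count for less, which is exactly the scandal/no-scandal distinction described above.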
Do you exclude any polls?
Yes. We exclude Zogby Interactive internet polls, because there is quite a bit of evidence that their polls are garbage. We also exclude polls conducted by Research 2000, as it now seems likely that a good number of their polls were fabricated. Otherwise, we include all other polls.
Do you weigh polls by pollster accuracy (PIE)?
Estimated PIE by pollster, with error bars by number of polls. Zogby Interactive is excluded.
It is true that polls are generally more volatile** than would be expected from sampling error alone, and our model accounts for this, but we have not been able to find any evidence that "accuracy" varies significantly between pollsters.*
*- Except for Zogby, whose polls exhibit roughly 5 times as much variability as would be expected from sampling error.
**- see Design effects.
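To illustrate what a design effect measures, here is a toy calculation comparing the observed spread of repeated polls of a stable race to pure binomial sampling variance (all numbers below are hypothetical):

```python
from statistics import pvariance

def design_effect(shares, sample_size):
    """Ratio of observed variance across repeated polls of a stable race
    to the variance expected from pure binomial sampling.

    `shares` are candidate vote shares (0-1) from polls of the same race
    with the same underlying true value; `sample_size` is each poll's n.
    A ratio near 1 means sampling error alone explains the spread; a
    Zogby-style pollster would show a ratio far above 1.
    """
    p = sum(shares) / len(shares)
    expected_var = p * (1.0 - p) / sample_size   # binomial sampling variance
    observed_var = pvariance(shares)
    return observed_var / expected_var

polls = [0.52, 0.49, 0.55, 0.47, 0.53]           # hypothetical repeated polls
print(round(design_effect(polls, sample_size=600), 2))
```
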
Do you account for House Effects or Potential Pollster Bias?
Yes. This is addressed in more detail later in the FAQ.
How do your House Forecasts work?
Our House forecasts work by incorporating information from district-level House polls, Generic Ballot polls, and a simple regression model that looks at district characteristics.
Our Regression Model provides estimates for individual districts based on a model fitted to previous elections. More specifically, two regression models are fitted, one for races where the incumbent is running for re-election, and another for open races. A national swing derived from the Generic Ballot is then applied to account for the change in national conditions.
For open races, the regression takes into account Cook Ratings, PVI, and district income (see Nate Silver's post).
For races with incumbents, the regression takes into account the race's PVI, the Democrat's vote total in the previous House election in 2008, and Cook Ratings.
Together, these two models have an R^2 of .94*, that is, they account for 94% of the variation in the previous election.
This is similar to, and was inspired by, the methodology successfully used by Bafumi and Wlezien in 2006 and 2008.
*- The model is estimated with Bayesian regression, instead of the more traditional OLS. Bayesian regression allows variables to be entered as probability distributions instead of point values. This is done mainly to account for within-race estimate correlations and to better handle unopposed races.
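As a rough illustration of the mechanics, here is plain OLS fitted to made-up districts with a uniform national swing added afterward. This is not our actual Bayesian model, and every number below is hypothetical:

```python
import numpy as np

# Toy training data: one row per district from a past election.
# Columns: PVI, previous Democratic vote share; target: Dem vote share.
X_past = np.array([[ 5.0, 58.0],
                   [-3.0, 49.0],
                   [10.0, 63.0],
                   [-8.0, 44.0],
                   [ 0.0, 52.0]])
y_past = np.array([57.0, 48.0, 64.0, 43.0, 51.0])

# Fit ordinary least squares (intercept plus the two predictors).
A = np.column_stack([np.ones(len(X_past)), X_past])
coef, *_ = np.linalg.lstsq(A, y_past, rcond=None)

def forecast(pvi, prev_share, generic_swing):
    """Regression prediction plus a uniform national swing derived from
    the Generic Ballot (all inputs here are hypothetical)."""
    return coef[0] + coef[1] * pvi + coef[2] * prev_share + generic_swing

# A D+2 district whose Democrat got 53% last time, in a cycle where the
# Generic Ballot implies a 4-point anti-Democratic national swing.
print(round(forecast(pvi=2.0, prev_share=53.0, generic_swing=-4.0), 1))
```
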
Cook Ratings? Isn't that cheating?
In a sense, yes. But Cook has access to a lot of information that we don't: internal polling, challenger quality, fundraising numbers, staffing information, and inside-the-beltway gossip. And careful analysis of previous elections has found that, all else being equal, Cook Ratings provide additional information: on average, a one-point change in Cook's ratings is indicative of a roughly 3-point drop in vote share.
Why just Cook? What about Rothenberg, Sabato, etc.?
These ratings are all highly correlated, so including them all would introduce a multicollinearity problem. There are only 30-40 open races per cycle, so we do our best to keep the number of variables in our model low.
Why use Regressions at all?
Most districts, even many competitive ones, have not been polled at all. Some methodology is needed to impute missing values for them. Even the districts that have been polled usually have only one or two polls. Worse, these polls are often out-of-date, have extremely small sample sizes, and come from small, unknown pollsters.
Polls simply do not provide enough information to produce reliable House estimates. Regressions allow us to impose educated guesses as priors. The more polls a district gets, the less the model relies on the regressions.
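That blending of a regression prior with district polls can be sketched as a precision-weighted average. The variances below are illustrative assumptions, not our fitted values:

```python
def blend(prior_mean, prior_var, polls, poll_var=16.0):
    """Combine a regression prior with district polls by precision weighting.

    Each poll is a margin estimate with assumed variance `poll_var`.
    With zero polls the prior stands alone; each added poll pulls the
    estimate toward the polling average and shrinks the variance.
    """
    precision = 1.0 / prior_var
    weighted = prior_mean / prior_var
    for margin in polls:
        precision += 1.0 / poll_var
        weighted += margin / poll_var
    return weighted / precision, 1.0 / precision

# The regression says D+6 with a lot of uncertainty; one poll says D+1.
# The combined estimate lands between the two, closer to the poll.
mean, var = blend(prior_mean=6.0, prior_var=36.0, polls=[1.0])
print(round(mean, 2), round(var, 2))
```
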
What are House Effects?
House Effects are the tendency of certain pollsters' results to lean toward one party or another. This can happen because of differences in question wording, turnout assumptions, or sampling methods. Failure to take house effects into account makes horse-races seem much more volatile than they actually are. Taking house effects into account prevents any individual pollster from distorting the results.
One important caveat is that house effects can only tell us where pollsters' results lie relative to each other. They do not make any statements as to who is "right".
Our model estimates house effects in a fashion similar to what has been done by Simon Jackman of Stanford.
What about Likely-Voter polls vs Registered Voter Polls?
Likely and Registered Voter polls by a given pollster are assigned separate house effects. We "center" our model by assuming that, on average, the median LV poll is unbiased.
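A simplified one-pass version of the house-effect calculation, including the centering step, might look like the sketch below. The polls are hypothetical, and the real model estimates everything jointly rather than in a single pass:

```python
from statistics import median

def house_effects(polls):
    """Estimate pollster house effects from polls of many races.

    `polls` is a list of (pollster, race, margin) tuples. A pollster's raw
    house effect is its average deviation from the all-pollster average in
    each race it polled; effects are then centered so the median pollster
    is treated as unbiased (the real model centers on the median LV
    pollster specifically).
    """
    # Per-race consensus: mean margin across every pollster in that race.
    races = {}
    for pollster, race, margin in polls:
        races.setdefault(race, []).append(margin)
    consensus = {race: sum(ms) / len(ms) for race, ms in races.items()}

    # Raw effect: how far a pollster sits from consensus, on average.
    devs = {}
    for pollster, race, margin in polls:
        devs.setdefault(pollster, []).append(margin - consensus[race])
    raw = {p: sum(ds) / len(ds) for p, ds in devs.items()}

    # Center: the median pollster's effect is defined to be zero.
    center = median(raw.values())
    return {p: e - center for p, e in raw.items()}

# Hypothetical polls (pollster, race, Dem margin): pollster A runs 4
# points more Republican than B across both races, with C in between.
sample = [("A", "OH", 2.0), ("B", "OH", 6.0), ("C", "OH", 4.0),
          ("A", "PA", -1.0), ("B", "PA", 3.0), ("C", "PA", 1.0)]
print(house_effects(sample))
```

Note that the estimates only locate pollsters relative to one another; shifting every margin by the same amount leaves the effects unchanged, which is the caveat about nobody being provably "right".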
What about Rasmussen?
Rasmussen is, by far, the most prolific pollster this cycle. They make up more than half of the polls in our Senate database. Unfortunately, they have a moderately large pro-Republican house effect:
House effects, with error bars, of the six largest pollsters in our Senate database.
Due to Rasmussen's prolific polling, they would dominate our estimates if it were not for our house-effect adjustments. Luckily, our methodology should largely account for this.
How do you know Rasmussen isn't right?
I don't, and the model does allow some possibility that their vision of the electorate is correct. But considering that 33 out of 48 LV pollsters in our database have a more Democratic lean than Rasmussen does, it wouldn't be fair to treat theirs as the most likely outcome.