71% is not 100%. Statistics is not magic. Stop trying to burn Nate Silver at the stake.
In the wake of the 2016 U.S. Presidential election, journalists left, right, and center lamented what they described as the failed predictions of election forecasters like FiveThirtyEight and the New York Times’s The Upshot. Nate Silver, who built FiveThirtyEight’s reputation on his near-perfect forecast of the 2008 election and perfect forecast of the 2012 election, bore the brunt of these criticisms. Newsweek chastised him for defending his “erroneous prediction” of a Clinton win. Michael Braun, in a Medium article, claimed that Nate got the election “spectacularly wrong” and predicted a bleak future for FiveThirtyEight. Even The Economist joined the fray, albeit in its notoriously even-handed manner.
The biggest problem with these analyses is their most basic premise: that Nate Silver made a prediction. Any forecaster worth their salt is careful to refer to their output as forecasts, and Nate is no exception. Forecasts are inherently uncertain, and that uncertainty manifests itself in the probability distribution reported by each of these data journalists, including FiveThirtyEight. If you scroll down on the website’s final 2016 forecast, you find the following chart:
When laypeople—and even statisticians!—think about one-off events like elections, we are tempted to view them as deterministic: they happen once, and the result seems predictable in hindsight. The idea behind probability distributions like the two above is that, if the event (i.e., the election) were to happen an infinite number of times, then each outcome (in this case, the electoral vote count) would occur a certain percentage of the time, with each outcome’s percentage represented by the height of each bar on the distribution.
In real life, we only witness a single draw from this probability distribution. Critics who pan forecasters’ results as “mispredictions” implicitly operate under the assumption that the true distribution of outcomes is a single bar with a height of 100%. That isn’t an assumption that any forecaster makes or would ever choose to make. It’s an assumption that lies outside of the epistemology of statistics.
For events that are truly random, such as the flip of a coin, the distribution of outcomes is unchanging and can be exactly determined. In election modeling, the distribution is instead an inference of the probability of certain outcomes based on a combination of polling data and past election results. Under perfect methodological practices, the inferred distribution converges to the “true” distribution as the sample size increases. The exact and estimated distributions operate in essentially the same way. If I told you that you have a 70% chance of rolling less than an eight on a ten-sided die and you rolled a nine, you wouldn’t pen an article saying that I got the die roll spectacularly wrong. Nate Silver told us that there was a 71% chance of a Clinton Presidency, and we rolled a nine.
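To make the "single draw" idea concrete, here is a minimal sketch that replays a hypothetical election many times. The 71% figure comes from the forecast discussed above; the function name, the number of trials, and the seed are my own illustrative choices, and nothing here reproduces FiveThirtyEight's actual model.

```python
import random

def simulate_upset_rate(p_favorite=0.71, trials=100_000, seed=2016):
    """Replay a hypothetical election `trials` times; return how often the favorite loses."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so it lands at or above p_favorite
    # with probability 1 - p_favorite
    upsets = sum(1 for _ in range(trials) if rng.random() >= p_favorite)
    return upsets / trials

upset_rate = simulate_upset_rate()
print(f"The favorite loses in about {upset_rate:.1%} of simulated elections")

# The die analogy, computed exactly: faces 1..10, "less than an eight" means 1..7
p_under_eight = sum(1 for face in range(1, 11) if face < 8) / 10
print(p_under_eight)
```

The simulated upset rate should land close to 29%, which is roughly three elections in ten. We only ever get to see one of them.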
FiveThirtyEight Was Relatively Bearish on a Clinton Win
There is a case to be made that some election forecasters, such as The Upshot, called the election incorrectly, even within the bounds of statistical uncertainty. Many forecasters other than FiveThirtyEight had the probability of a Clinton win above 90%, which is where Nate Silver put Barack Obama’s odds of re-election in 2012, an outcome on which he was willing to bet $1,000.
Nate Silver and FiveThirtyEight were not confident about Hillary Clinton’s odds, even when their model had her favored to win all swing states plus Arizona. Throughout the race, Nate insisted that, though the probability of a Clinton win roughly mirrored Obama’s 2012 odds, the calculations behind the probability told two different stories: Obama held a small but stable lead over Romney, while Clinton held a large but volatile lead over Trump. In a race where a zero-percent-plus-one vote margin is all that you need to win every one of a state’s electoral votes, stability is more assuring than magnitude.
Nate was also one of the few members of the media who specifically and confidently singled out the Midwest as a potential opportunity for Donald Trump, and who criticized legacy media’s dismissal of Trump’s campaigns in Michigan and Wisconsin. As early as 2015, Nate penned an article titled “There Is No Blue Wall,” and on November 1st, he outlined all of Trump’s paths to victory under a popular vote margin of two points or less in an article titled “Yes, Donald Trump Has a Path to Victory.”
On the eve of the 2016 election, FiveThirtyEight released a final, pre-election podcast that discussed their model’s final results. When the host asked Nate about Pennsylvania and the model’s 75% odds of a Clinton win, he answered with what seems now to be an almost perfect call:
“My theory is that it’s more plausible for Trump to win Pennsylvania and some other rust belt states like Michigan than people infer, and [they make] these kind of sophomoric arguments like, ‘Oh, well, she’s been ahead in all the polls.’ But that doesn’t really matter much if you happen to be one point ahead or two points ahead and it’s close. And you understand the scenario by which she loses a state like Michigan or Pennsylvania is, you know, those white working class voters, those union workers go further to Trump, and she doesn’t turn out her African-American base. If both of those things happen — so you’re flipping a coin twice, 25 percent chance — then I think she could be in jeopardy in one of those states. People are a little smug about saying, ‘how dare Trump campaign in Michigan, that shows how desperate he is.’”
Given the final outcome of the election, the model’s call and Nate’s explanation of it were quite accurate. There was a swing toward Trump in the last week of the election, following the Comey letter, and nearly all voting in Pennsylvania occurs on the day of the election. Nate insisted that a 75% probability isn’t enough to withstand two adverse events, and it wasn’t. Would you be surprised if you flipped a coin twice and got heads each time?
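The coin-flip arithmetic is easy to check. The sketch below takes Nate's shorthand at face value and assumes the two adverse events are independent 50/50 flips, which is a simplification for illustration and not a claim about how the model itself works.

```python
from itertools import product

# Assumption: two independent, fair coin flips stand in for the two adverse events
flips = list(product("HT", repeat=2))  # all four equally likely outcomes
p_two_heads = flips.count(("H", "H")) / len(flips)

# Trump's implied chance in Pennsylvania, given the model's 75% for Clinton
p_trump_pa = 1 - 0.75

print(p_two_heads, p_trump_pa)
```

Both numbers come out to 0.25: one time in four, which is hardly a shock when it happens.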
Blame the Pollsters—But Not Too Much
I am clearly partial to FiveThirtyEight’s coverage. However, there are reasonable criticisms to be leveled. Nevada, Colorado, and New Hampshire were all forecast to swing for Trump before Michigan and Wisconsin, yet they went for Clinton (in Colorado, by five points!). Meanwhile, Wisconsin was forecast to swing for Clinton by five points.
This is less of an indictment of FiveThirtyEight’s methodology than of the distribution and methodology of polling companies. Contrary to the belief held by certain unnamed individuals, FiveThirtyEight does not conduct any of their own polls; they simply aggregate the polls taken by other organizations. Forecasters use models that are constrained by the frequency and quality of polls that independent companies choose to conduct.
The quality of polls overall has declined significantly over the past twenty years. According to a study by Pew, response rates in telephone surveys declined from 36% in 1997 to just 9% in 2016. Pollsters remedy the limited and less random samples they now have by demographically weighting their respondents according to a model of the electorate. However, demographics are merely estimated in non-census years (which is most of them), and the demographic composition of voter turnout is highly fluid. Many pollsters get their models wrong or systematically overestimate support for one party. FiveThirtyEight remedies the variability in sample size and poll quality by weighting historically reliable pollsters more heavily than unreliable ones and adjusting outliers to concurrent trends, but reliable pollsters are scarce and subject to the same constraints as unreliable ones.
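Here is a rough sketch of what demographic weighting does to a topline number. Every figure in it is invented for illustration: the two groups, the respondent counts, the support levels, and both electorate models are hypothetical, and real pollsters weight on many more variables than this.

```python
# Hypothetical raw sample: group -> (number of respondents, share backing candidate A)
sample = {
    "college": (600, 0.55),
    "non_college": (400, 0.42),
}

def unweighted_support(sample):
    """Topline if every respondent counts equally."""
    total = sum(n for n, _ in sample.values())
    return sum(n * share for n, share in sample.values()) / total

def weighted_support(sample, electorate):
    """Topline after reweighting each group to its assumed share of the electorate."""
    return sum(electorate[group] * share for group, (_, share) in sample.items())

# Two hypothetical models of who actually turns out
model_one = {"college": 0.40, "non_college": 0.60}
model_two = {"college": 0.45, "non_college": 0.55}

raw = unweighted_support(sample)
est_one = weighted_support(sample, model_one)
est_two = weighted_support(sample, model_two)
print(f"raw {raw:.3f}, model one {est_one:.3f}, model two {est_two:.3f}")
```

The same 1,000 interviews produce three different toplines depending on the assumed electorate. That is the sense in which a pollster can run a clean survey and still miss because the turnout model was off.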
Polling issues faced by forecasters are quantitative as well as qualitative: even a poll conducted with a 100% response rate using a perfectly random sample will have larger errors if its sample is small. A sample of 300 respondents has a wider margin of error than one with 600 or 1,200 respondents. Poll aggregation essentially pools the samples of each poll into one large sample, so states with fewer polls have weaker forecasts. Colorado and Ohio were polled more than Michigan and Wisconsin; unsurprisingly, the latter two had larger errors.
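The relationship between sample size and error follows from the textbook margin-of-error formula for a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 race; treating several polls as one pooled sample is an idealization, since real aggregators also weight and adjust polls.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 600, 1200):
    print(f"n = {n:>4}: +/- {100 * margin_of_error(n):.1f} points")

# Idealized pooling: three independent polls of 300 treated as one sample of 900
pooled = margin_of_error(3 * 300)
print(f"pooled  : +/- {100 * pooled:.1f} points")
```

Under these assumptions the margins come out to about 5.7, 4.0, and 2.8 points. Quadrupling the sample only halves the error, which is why sparsely polled states stay noisy.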
Flawed but Critical
This may seem like a frivolous hill to fight and die on. I am not a data journalist myself, and it should not matter to me whether other people take the same things seriously that I do. But the anti-intellectual backlash to polling has costs. Since the election, Trump administration officials and surrogates have repeatedly dismissed negative approval ratings, citing election poll inaccuracy (even though national polls were fairly accurate). During the U.K.’s 2017 election, media outlets cited polling errors in the U.S. and U.K. as a justification for cutting their polling budgets and “talking directly to voters.”
It would be a step in the direction of good journalism to move resources away from horse-race polling and into high-quality field efforts. It is not good journalism to take resources away from horse-race polling and devote them to horse-race election coverage. A trend in national election coverage is to interview prototypical voters in certain areas who confirm conventional narratives of that election. Take, for example, the New York Times’s articles from March 12th and June 2nd, in which they interview the same Ohio small business owner about Donald Trump and conclude that the President has not lost his core constituency.
This sort of journalism is not an alternative to polling. It is a crummy version of polling, which uses a minuscule sample and an array of narrowed-down survey questions. It does not serve to inform readers about the diversity of views in a broad set of regions, but perpetuates pre-existing narratives. On the subject of coal miners, reporters travel to towns in Kentucky and West Virginia, and they ask leading questions about jobs, often failing to provide the big picture to their readers.
Polling is imperfect and necessarily fails to capture the nuance that longform reporting provides, but it informs the public about the prevalence of certain views. Without polling, we would not know that a majority of Trump voters cited voting against Clinton as their primary motivator, or that anti-immigrant beliefs are a global phenomenon. We would not know that the Congressional Republicans’ current health care proposal is unpopular with almost every demographic group.
This polling data provides journalists and policy-makers with an idea of where to direct their attention and resources. With a political and media machine that is focused in an isolated region that contains a small proportion of the population, far from the two most populous states in the country, that polling data is necessary. It is not gospel, and it should not stand alone, but without it, would we really gain more than we lose?
The efficacy of a forecaster’s model does not depend on her binary predictive record. A poll aggregator who correctly favors the winner every single time, but only with a 50.1% or slightly greater chance of winning, has a worse model than somebody who misses one or two races but forecasts strong winners with greater precision, as Nate did in 2008 and 2012. That would be true even if polling methodology were perfect. Data journalism is in its infancy and has a bright future. The state of conventional journalism, on the other hand, is up for debate.
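One common way to put a number on this comparison is the Brier score, the average squared gap between the stated probability and what actually happened (lower is better). The choice of metric is mine, not something any of the forecasters above are known to grade themselves on here, and both forecasters below are hypothetical.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes (1 = happened, 0 = did not)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical forecaster A: gives every eventual winner a 50.1% chance and never "misses"
cautious = [(0.501, 1)] * 10

# Hypothetical forecaster B: gives each favorite a 90% chance, and one of ten favorites loses
confident = [(0.9, 1)] * 9 + [(0.9, 0)]

score_cautious = brier_score(cautious)
score_confident = brier_score(confident)
print(f"A: {score_cautious:.3f}  B: {score_confident:.3f}")
```

Forecaster A scores about 0.249 and forecaster B about 0.090, so B comes out well ahead despite the miss. A forecast that hedges everything at a coin flip conveys almost no information, even when it is never "wrong."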
 I edited out a paragraph about coal economics and politics in the United States. If you are interested in this subject, please contact me. I am considering it for a future post.