In his article on last week’s forecast, John Rentoul wrote:
“Probability is hard enough to understand anyway, of course. Look at Nate Silver, the guru of American election predictions. He said Brazil had a 65 per cent chance of winning against Germany in the World Cup semi-final. Well, you could say that their 7-1 defeat fell in the other 35 per cent but – after the event – we can be pretty confident that the 65 per cent figure meant little useful.”
There are various issues here in the context of the overall article. Did the 65% figure really mean "little useful"? How should we judge probabilistic forecasts after the event they were trying to predict? And even if John Rentoul's interpretation is right, should the poor performance of one football-match prediction undermine the credibility of a forecaster's predictions of very different kinds of events, or even the credibility of all forecasters?
Without knowing anything in particular about predicting football results, I would say the 65% figure remains meaningful if it was derived from a forecasting method that is generally meaningful. Nate Silver argues that the result was statistically the most shocking in World Cup history. Even if it wasn't the most extreme event, it was certainly extreme. We should not reject out of hand a forecasting model that fails to predict an extreme event: such events are just too hard to predict.
I will return to the issue of extreme events below, but there is also a more general point about the win/lose probabilities. The industry standard in science is that hypotheses are rejected if there is less than a 5% chance of the data occurring as they did if the hypothesis were true. By analogy, we can only really reject a forecast if it said there was a less than 5% chance of some event that actually occurred. On this basis, a win by a team predicted to have a 35% chance is not sufficient on its own to reject the prediction model, only to query it.
This implies that the model would have to predict a 95% chance of something for its failure to happen to be statistically significant. Since it is unlikely that my model will ever give any party such a strong chance, by this standard I’ll never be wrong.
How convenient! But I'm not saying that forecasters who do not make such strong predictions are always right; I'm saying that any single event typically says very little about the quality of a probabilistic forecast.
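To make that arithmetic concrete, here is a minimal sketch in Python. The numbers are illustrative only, not anyone's actual forecasts; it simply shows how many successive upsets by 65% favourites the 5% standard would tolerate before rejecting the forecasts.

```python
# A minimal sketch of the 5% standard applied to upsets by 65% favourites.
# Illustrative numbers only, not anyone's actual forecasts.

upset_prob = 0.35  # probability the forecast gave to the side that actually won

for k in range(1, 5):
    p_value = upset_prob ** k  # chance of k independent upsets in a row if the forecasts were right
    verdict = "reject the forecasts" if p_value < 0.05 else "cannot reject"
    print(f"{k} upset(s): p = {p_value:.3f} -> {verdict}")

# A single upset gives p = 0.35, far above 0.05; only after roughly
# three successive upsets would the 5% standard reject the forecasts.
```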
Forecasting methods can and should be judged by their ability to produce well-calibrated probabilities for lots of events. By well-calibrated I mean that if a forecaster made 100 predictions each giving one side a 65% chance of winning, then roughly 65 times out of 100 that side would win.
I don’t know whether football forecasters meet this criterion or not. There aren’t enough British general elections to judge forecasters of British elections in this way. Again: very convenient! (But I can happily note that the probabilistic method of seat prediction I’ve borrowed from the GB exit poll methodology has been shown to produce pretty well-calibrated probabilities of individual seat outcomes.)
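For readers who want to see what such a calibration check might look like in practice, here is a short, hypothetical Python sketch. The forecasts and outcomes are simulated, and the simple binning approach is my own illustration rather than any particular forecaster's method.

```python
import random

random.seed(1)

# Made-up forecasts and outcomes purely for illustration; in a real
# calibration check the outcomes would be actual results.
forecasts = [random.uniform(0.05, 0.95) for _ in range(1000)]
outcomes = [1 if random.random() < p else 0 for p in forecasts]  # well-calibrated by construction

# Group forecasts into 10-point bins and compare the average forecast
# probability in each bin with the observed frequency of wins.
bins = {}
for p, won in zip(forecasts, outcomes):
    b = min(int(p * 10), 9)  # e.g. forecasts of 60-70% go in bin 6
    bins.setdefault(b, []).append((p, won))

for b in sorted(bins):
    pairs = bins[b]
    mean_forecast = sum(p for p, _ in pairs) / len(pairs)
    observed = sum(won for _, won in pairs) / len(pairs)
    print(f"{b*10:2d}-{(b+1)*10:3d}%: mean forecast {mean_forecast:.2f}, observed {observed:.2f}, n = {len(pairs)}")
```

A well-calibrated forecaster's bins should show observed frequencies close to the mean forecast probabilities, as they do here by construction.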
Even if you think that the 7-1 German victory over Brazil was so extreme it does make a mockery of Nate Silver’s forecast, it does not follow that his or anyone else’s probabilistic election forecasts are dubious or problematic. They should be judged on their own merits.
Nonetheless there is an important reminder here for election forecasters and their consumers. Elections, like sport, have their bounds but are still susceptible to extraordinarily-low-probability extreme events. Nassim Taleb famously called these 'black swan' events: because they have not happened for an extremely long time, or ever, people come to assume they are impossible.
The paragraph that follows the one quoted above reads as follows.
“And that is before you get to the politics. The most important things that will happen between now and election day are not which statistician can best tweak their model, but the politics, politics, and politics.”
Politics and events can change things, and so they should. But part of the point of my forecasting method, and others, is to estimate the uncertainty around the central forecasts, and that uncertainty is reflected in the prediction intervals and then in the probabilities.
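Purely as an illustration of that logic, and not a description of my actual model, the following Python sketch shows how a central seat forecast plus an assumed error distribution can be turned into a prediction interval and an event probability by simulation. The central forecast of 300 seats, the standard deviation of 35, and the normal-error assumption are all invented for the example; only the figure of 326 seats for an overall Commons majority is a real constant.

```python
import random
import statistics

random.seed(2)

# Illustrative assumptions only (not the actual forecast model):
# a central forecast of 300 seats for the largest party, with a normally
# distributed error reflecting historical forecasting uncertainty.
# 326 seats is an overall Commons majority.
central_seats = 300
error_sd = 35
majority = 326

sims = [random.gauss(central_seats, error_sd) for _ in range(100_000)]

cuts = statistics.quantiles(sims, n=20)  # 19 cut points at 5%, 10%, ..., 95%
lower, upper = cuts[0], cuts[-1]         # roughly a 90% prediction interval
prob_majority = sum(s >= majority for s in sims) / len(sims)

print(f"90% prediction interval: roughly {lower:.0f} to {upper:.0f} seats")
print(f"Probability of an overall majority: {prob_majority:.0%}")
```

The point is simply that the probabilities are a by-product of the estimated uncertainty, not a separate guess layered on top of the central forecast.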
The uncertainty estimates are based only on polls and elections since 1950, and mostly on the data since 1974. That is pretty narrow in the big scheme of things. But still there are plenty of people who seem to think that the prediction intervals are too large, and I don't think anyone has told me they are too small.
Following Nassim Taleb’s argument, the prediction intervals are more likely to be too small than too big because of the narrow range of historical data.
The 2008 financial crisis was in part due to poor estimation of the risk associated with financial instruments because of limited historical price data. It is not so hard to imagine a variety of extreme political events that might lead to election results outside the range estimable from the 1950-2010 experience.
There are easily identifiable features of the current electoral cycle that are unprecedented and so make the consequences for the 2015 election difficult to predict. The extent of Liberal Democrat to Labour switching in response to coalition government and the rise of UKIP are the obvious and most important ones. But since we already know something about these phenomena, they are not the kind of extreme, unprecedented and unanticipated events that we need to be mindful of.
We should not give up on probabilistic forecasting because of this problem though.
Giving up on probability typically leads people to one of two positions: feigned ignorance or artificial certainty. It is unreasonable to suggest we know absolutely nothing about the relative chances of different outcomes of the next election, but even that position is preferable to claiming to be sure that some particular outcome will occur. Similarly, this far from the election we should not rule out, or even describe as remote possibilities, outcomes that are relatively common (like either a Tory or a Labour majority).
Rather than giving up on probabilistic forecasting, we should instead take any forecast probabilities as indicative and be mindful of the limitations of forecasting methods.
This implies that forecasting probabilities that are relatively evenly split over different events, while less exciting, are a priori more plausible than more extreme probabilities. But then I would say that, wouldn’t I?
Acknowledgements
I should say that, despite this being the fourth and final lengthy post taking issue with John Rentoul's article, I do actually think it was an excellent article. I wasn't just being polite, and I very much appreciate him engaging seriously with the forecast.
Stephen, in your model do you use the final election result or the final polls before the election?
The reason this is important is that contemporary polls take more account of the over-estimation of Labour support (and vice versa for Tory support) in the raw figures.
Polling organisations started to make this change in their methodology after the marked discrepancy noted in 1992.
This would introduce a significant skew if comparing to previous polls, i.e. any 'swing back' would be much lower.