This post is not about politics, but about the failure of statistics applied to politics. Our technology, the science behind big data, Bayesian statistics, and ultimately their incarnation in Nate Silver’s model, all crumbled last night. Sure, the FiveThirtyEight blog keeps saying that there was a high level of uncertainty associated with the polls, mostly due to a large fraction of non-voters. But hey, the NYTimes estimated Hillary Clinton’s chances of winning the election at more than 80%. It just did not happen.
Let’s try to say this more clearly. At the core of these statistical analysis methods is Bayes’ theorem, which states that the probability of a model being true given the observed data is proportional to the likelihood (the probability of observing the data if the model is true) times our prior belief in the validity of the model (see the nice example borrowed from XKCD).
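Written out as a formula, the statement above reads as follows. The symbols are my own shorthand, not anything taken from a specific forecasting model: $M$ stands for the model (or hypothesis) and $D$ for the observed data.

```latex
P(M \mid D) \;=\; \frac{P(D \mid M)\, P(M)}{P(D)}
\;\propto\;
\underbrace{P(D \mid M)}_{\text{likelihood}} \times \underbrace{P(M)}_{\text{prior}}
```

The denominator $P(D)$ does not depend on the model, which is why the posterior is usually quoted as merely "proportional to" likelihood times prior.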
How does this translate to what happened yesterday in the US? There are three components we have to look at:
- The statistical model (also known as the likelihood), which should be agnostic about the candidates. It should not be agnostic, though, about the correlations among different counties and how they voted in the past, their ethnic composition, household income, education level, etc. These facts are givens of the problem, and should be treated as objective.
- The data, e.g. telephone surveys carried out before the actual voting begins. This is one of the most critical parts, since it’s well known that people lie, change their minds, or are too ashamed to declare their intention. The aforementioned model usually takes these effects into account, but they are quite difficult to estimate.
- The prior belief, which is the most obvious place where things may fail badly. For example, the a priori probabilities can be assigned based on previous elections. But a few years may have passed since then, and things might have changed dramatically in the meantime. Even worse, everybody has their own priors (as Sean Carroll said, “everyone is entitled to have their own prior, but not their own likelihoods”), which have to be spelled out. In some cases they may be so ingrained in the minds of the statisticians that it might not be clear what’s wrong and why. Usually, one can try using different priors, but this is not always possible or even meaningful.
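To make the role of the prior concrete, here is a toy sketch in Python. It is in no way how FiveThirtyEight or the NYTimes build their forecasts: it assumes a single made-up poll (520 of 1000 respondents favouring a candidate), a simple Beta-Binomial model, and three hypothetical priors whose names and numbers I invented for illustration. The same data and the same likelihood give noticeably different answers depending on the prior alone.

```python
import random


def posterior_params(prior_a, prior_b, yes, no):
    """Conjugate update: a Beta(prior_a, prior_b) prior combined with
    binomial poll counts gives a Beta(prior_a + yes, prior_b + no) posterior."""
    return prior_a + yes, prior_b + no


def prob_majority(a, b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(vote share > 0.5) under Beta(a, b)."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(a, b) > 0.5 for _ in range(draws))
    return wins / draws


# Hypothetical poll (made-up numbers): 520 of 1000 favour the candidate.
yes, no = 520, 480

# Hypothetical priors, expressed as Beta(a, b) pseudo-counts.
priors = {
    "flat": (1, 1),            # no opinion before seeing the poll
    "leans_for": (60, 40),     # e.g. past elections favoured the candidate
    "leans_against": (40, 60), # e.g. past elections went the other way
}

results = {}
for name, (a0, b0) in priors.items():
    a, b = posterior_params(a0, b0, yes, no)
    mean_share = a / (a + b)
    results[name] = (mean_share, prob_majority(a, b))
    print(f"{name:>14}: mean share = {mean_share:.3f}, "
          f"P(majority) ~ {results[name][1]:.2f}")
```

The poll is identical in all three runs; only the prior changes, and with it the estimated chance of a majority. A real forecast has many more moving parts (correlated states, turnout models, poll weighting), but the sensitivity to the prior is the same in spirit.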
That said, all of the above is implemented in extremely complicated computer programs that crunch numbers in incredibly powerful data centres. I was amazed to see the chances changing in real time last night as actual votes came in from each county. I do believe this is one of the pinnacles of human ingenuity. And this may be the main trouble.
So to me the scariest thing of all is not that the priors were not good enough, or that the data were not collected objectively (e.g. by asking questions in a certain way, etc.). No: what I’m worried about is that the method itself may be fundamentally flawed. What I’m referring to in particular is Michael Moore’s piece that appeared in the Huffington Post in July (5 reasons why Trump will win): Moore has no formal higher education in science, but thanks to his intuition and political common sense he managed to predict the result with a much higher degree of accuracy than Nate Silver and the New York Times. In his words:
I believe Trump is going to focus much of his attention on the four blue states in the rustbelt of the upper Great Lakes – Michigan, Ohio, Pennsylvania and Wisconsin. From Green Bay to Pittsburgh, this, my friends, is the middle of England – broken, depressed, struggling, the smokestacks strewn across the countryside with the carcass of what we use to call the Middle Class. Angry, embittered working (and nonworking) people who were lied to by the trickle-down of Reagan and abandoned by Democrats who still try to talk a good line but are really just looking forward to rub one out with a lobbyist from Goldman Sachs who’ll write them nice big check before leaving the room. What happened in the UK with Brexit is going to happen here.
In the end, what scared me last night is not that the President-elect of the richest country in the world does not believe in climate change and vaccines. It’s that our ability to use reason (call it logic, mathematics, statistics) to create a model of the environment we’re in and make predictions about it may never be completely satisfactory, no matter how much data we gather and no matter how many papers we publish. But that’s where our technology is leading us: just think of self-driving cars or medical diagnostic tools.