Gianluca Baio's blog: May 2017

Tuesday 30 May 2017

The swingers

Kaleb has left a comment on a previous post, asking which constituencies my model predicts will change hands relative to the 2015 election. This is not too difficult to do, given the wealth of results and quantities that can be computed once the posterior distributions are estimated.

Basically, what I have done is to compute, based on the "possible futures" simulated by the model, the probability that the parties win each of the 632 seats in England, Wales and Scotland. Many of them seem to be very safe seats $-$ I think this is consistent with current political knowledge, although in an election like this possibly more can change...
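To make the idea concrete, here is a minimal sketch of that calculation in Python. It is not the code behind the model: the seat names, the party labels and the toy simulations are all made up, and I'm assuming that each "possible future" simply records who wins a given seat.

```python
from collections import Counter

def win_probabilities(simulated_winners):
    """Share of simulations in which each party wins a single seat."""
    counts = Counter(simulated_winners)
    n_sims = len(simulated_winners)
    return {party: count / n_sims for party, count in counts.items()}

def seats_changing_hands(sims_by_seat, winner_2015):
    """Seats whose most likely winner differs from the 2015 winner.

    Returns {seat: (2015 winner, predicted winner, probability of that win)}.
    """
    changes = {}
    for seat, sims in sims_by_seat.items():
        probs = win_probabilities(sims)
        predicted = max(probs, key=probs.get)
        if predicted != winner_2015[seat]:
            changes[seat] = (winner_2015[seat], predicted, probs[predicted])
    return changes

# Hypothetical example: 10 simulated futures for two made-up seats.
sims_by_seat = {
    "Seat A": ["Con"] * 7 + ["Lab"] * 3,
    "Seat B": ["Lab"] * 9 + ["Con"] * 1,
}
winner_2015 = {"Seat A": "Lab", "Seat B": "Lab"}

changes = seats_changing_hands(sims_by_seat, winner_2015)
print(changes)
```

In this toy example only "Seat A" is flagged, because the party winning it most often in the simulations is not the one that held it in 2015.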

Anyway, using the very latest analysis (as of today, 30th May and based on all polls published so far, but discounting older ones), there are 39 seats that are predicted to change hands. The following graph shows the predicted distribution of the probability of winning each of those seats, together with an indication of who won in 2015.

Of course, Labour are the big losers (many of the 39 constituencies were Labour in 2015, but are predicted to swing to some other party in 9 days' time). Conversely, the Tories are the big winners and, most often, when they are predicted to win a seat, they do so with a very large probability. There aren't very many real 50:50s $-$ a couple, I'd say, where the results are predicted to be rather uncertain.

Incidentally, as of today, this is the distribution of seats predicted by the model.

Party           mean        sd 2.5% median 97.5%
Conservative 359.467 5.4492757  351    358   371
Labour       209.276 5.3613961  198    211   218
UKIP           0.000 0.0000000    0      0     0
Lib Dem       14.699 2.1621920   10     15    19
SNP           48.055 2.7271620   42     48    52
Green          0.000 0.0000000    0      0     0
PCY            0.503 0.8286602    0      0     3
Other          0.000 0.0000000    0      0     0
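
For what it's worth, each row of a table like this is just a summary of the simulated number of seats for one party. A minimal sketch in Python, assuming the simulations are stored as a plain list of seat counts (the numbers below are made up and are not the model's output):

```python
import statistics

def summarise(draws):
    """Mean, sd, median and 95% interval of simulated seat counts."""
    # With n=40 the cut points sit at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {
        "mean": statistics.mean(draws),
        "sd": statistics.stdev(draws),
        "2.5%": cuts[0],
        "median": statistics.median(draws),
        "97.5%": cuts[-1],
    }

# Hypothetical simulated seat counts for a single party.
toy_draws = [351, 355, 357, 358, 358, 360, 362, 364, 371]
summary = summarise(toy_draws)
print(summary)
```

With thousands of simulations instead of nine, the same function would give the kind of interval reported above.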

Labour are continuing to close the gap on the Tories, but are still a long way behind. I'm curious to see what last night's not-a-debate did to the polls...

Friday 26 May 2017

(Too) slowly but surely?

After the tragic events in Manchester and the suspension of the campaigns, things have started again and a couple of new polls have been released. Some of the media have also picked up the trend I was observing from my model and so I have re-updated the results.

The increasing trend for Labour does see another little surge, as does the decreasing trend for the Tories. In comparison to my last update, the Lib Dems are slightly picking up again. But all in all, the numbers still tell kind of the same story, I guess.

Party           mean        sd 2.5% median   97.5%
Conservative 369.251 5.1765622  357    370 378.000
Labour       197.886 5.2142298  190    197 211.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       15.085 2.3852598   11     15  19.025
SNP           49.263 2.3965756   44     49  53.000
Green          0.000 0.0000000    0      0   0.000
PCY            0.515 0.8499985    0      0   3.000
Other          0.000 0.0000000    0      0   0.000

These are the summary results as of today (again after discounting past polls). The Lib Dems move from a median number of expected seats of 14 to the current estimate of 15; Labour go from 191 to 197 and the Tories go from 376 to 370, still comfortably in the lead.

Monday 22 May 2017

Quick update

This is going to be a very short post. I've again been following the latest polls and have updated my election forecast model $-$ nothing has changed in the general structure, only new data coming in as the campaigns evolve.

The dynamic forecast (which considers for each day from 1 to 22 May only the polls available up to that point) shows an interesting progression for Labour, who seem to be picking up some more seats. They are still a long way from the Tories, who are slightly declining. The Lib Dems are also going down and the latest results seem to suggest a poor result for Plaid Cymru in Wales too (the model was forecasting up to 4 seats before, whereas now they are expected to get 0).

The detailed summary as of today is as follows.

Party           mean         sd    2.5% median 97.5%
Conservative 375.109 4.02010949 367.000    376   382
Labour       192.134 3.94862452 186.000    191   200
UKIP           0.000 0.00000000   0.000      0     0
Lib Dem       14.320 2.24781064  10.000     14    18
SNP           50.053 2.12713792  45.975     50    53
Green          0.007 0.08341438   0.000      0     0
PCY            0.377 0.77036645   0.000      0     3
Other          0.000 0.00000000   0.000      0     0

I think the trend seems genuine $-$ Labour go from a median number of predicted seats of 175 on 1st May to the current estimate of 191, the Tories go from 381 to 376 and the Lib Dems from 23 to 14. Probably not enough time to change things substantially (bar some spectacular faux pas from the Tories, I think), though...

I've also played around with the issue of coalitions $-$ there's still some speculation in the media that the "Progressives" (Labour, Lib Dems and Greens) could try and help each other by not fielding a candidate and supporting one of the other parties in selected constituencies, so as to maximise the chance of ousting the Conservatives. I've simply used the model prediction and (most likely unrealistically!) assumed 100% compliance from the voters, so that the coalition would get the sum of the votes originally predicted for each of the constituent parties. Here's the result.

The Progressives come much closer and the probability of an outright Tory majority is now much smaller, but still...
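
The pooling step under the 100% compliance assumption can be sketched in a few lines of Python. The vote counts below are made up for a single hypothetical constituency and a single simulation; the function names are mine and are only meant to illustrate the idea.

```python
# Parties assumed to stand aside for each other (as described in the post).
PROGRESSIVES = ("Labour", "Lib Dem", "Green")

def pool_coalition(votes, members=PROGRESSIVES, label="Progressives"):
    """Replace the coalition members with one bloc holding the sum of their votes."""
    pooled = {party: v for party, v in votes.items() if party not in members}
    pooled[label] = sum(votes.get(party, 0) for party in members)
    return pooled

def winner(votes):
    """Party with the largest number of votes in a constituency."""
    return max(votes, key=votes.get)

# Hypothetical predicted votes in one constituency (one simulation).
seat = {"Conservative": 21000, "Labour": 15000, "Lib Dem": 6000,
        "Green": 1500, "UKIP": 800}

print(winner(seat))                  # Conservative
print(winner(pool_coalition(seat)))  # Progressives
```

Repeating this for every constituency and every simulation, and then counting the seats, gives the distribution for the coalition scenario.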

Monday 15 May 2017

Through time & space

I've continued to fill in the data from the polls and re-run the model for the next UK general election. I think the dynamic element is interesting in principle, mainly because of how the data from the most recent polls could be weighted differently from those further in the past.

Roberto had done an amazing job, building on Linzer's work and using a rather complex model to account for the fact that the polls are temporally correlated and, as you get closer to election day, the historical data are much less informative. This time, I have done something much simpler and somewhat more arbitrary, simply based on discounting the polls depending on how distant they are from "today".

These are the results given by my model in the period from May 1st to May 12th $-$ for every day, I've only included the polls available at that time and discounted them using a 10% rate, assuming modern life really runs very fast (which it reasonably does...). Not much is really changing and the predictions in terms of the number of seats won by the parties in England, Wales and Scotland seem fairly stable $-$ Labour is probably gaining a couple of seats, but the story is basically unchanged.
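
As a rough sketch of the bookkeeping behind this, the Python below loops over the days, keeps only the polls already published on each day and down-weights them by their age. The polls and their counts are invented, and I'm assuming the age is measured in days and the weight is $1/(1+\delta)^t$, as in the discounting described in the earlier post; the model fit itself is not shown.

```python
from datetime import date, timedelta

DELTA = 0.10  # the 10% discount rate quoted in the post (assumed to apply per day)

# Hypothetical polls: dates and counts are made up for illustration only.
polls = [
    {"date": date(2017, 5, 1), "Conservative": 470, "Labour": 280},
    {"date": date(2017, 5, 6), "Conservative": 460, "Labour": 300},
    {"date": date(2017, 5, 11), "Conservative": 450, "Labour": 320},
]

def polls_available(polls, day):
    """Polls published on or before the given day."""
    return [p for p in polls if p["date"] <= day]

def discount_weight(poll_date, day, delta=DELTA):
    """Weight 1 / (1 + delta)^t, with t the age of the poll in days."""
    age_in_days = (day - poll_date).days
    return 1.0 / (1.0 + delta) ** age_in_days

def discounted_counts(polls, day, delta=DELTA):
    """Available polls with every count rescaled by its discount weight."""
    rescaled = []
    for p in polls_available(polls, day):
        w = discount_weight(p["date"], day, delta)
        rescaled.append({k: (v if k == "date" else v * w) for k, v in p.items()})
    return rescaled

first_day = date(2017, 5, 1)
for offset in range(12):  # 1st to 12th May
    day = first_day + timedelta(days=offset)
    data_for_day = discounted_counts(polls, day)
    # data_for_day is what would be passed to the model on that day
    print(day.isoformat(), len(data_for_day), "poll(s) available")
```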

The other interesting thing (which I had done here and here too) is to analyse the predicted geographical distribution of the votes/seats. Now, however, I'm taking full advantage of the probabilistic nature of the model: I'm not only plotting on the map the "most likely outcome" (assigning a colour to each constituency, depending on who's predicted to win it). In the graph below, I've also computed the probability that the party most likely to win a given seat actually does so (based on the simulations from the posterior distributions of the vote shares, as explained here) $-$ I've shaded the colours so that lighter constituencies are more uncertain (i.e. the win may be more marginal).

There aren't very many marginal seats (according to the model) and most of the time the chance of a party winning a constituency exceeds 0.6 (which is fairly high, as it would mean a swing of over 10% from the prediction to overturn this).

This is also the split across different regions $-$ again, not many open battlefields, I think. In London, Hornsey and Wood Green is predicted to go Labour but with a probability of only 54%, while Tooting is predicted to go Tory (with a chance of 58%).

Friday 5 May 2017

Flash forward sampling

Slowly but surely, I've managed to think a bit more about the elections model. Here, I've described how I included some prior information in my model to try and "discount" the evidence provided by the polls, to obtain estimates that may be more reasonable and less affected by the short-term shocks that may (over)influence people's opinions.

However, I wasn't entirely happy with the strategy I had used $-$ the informative priors I had set on the parameters $\alpha_p$ and $\beta_p$ did induce rather precise distributions. In addition, the analysis I had made wasn't making the most of the actual inferential machine I had constructed, because it was estimating the number of seats for the average vote share profile only. But in fact, I can do better than that and fully propagate the uncertainty in the vote shares, to obtain an entire posterior distribution of the seat configuration.

So, first off, I think I've refined my priors and I did so by running the model simply through "forward sampling" $-$ in other words, by not including any of the polls in my analysis, to better understand the implications of my choice of priors. By selecting the means and standard deviations for the vectors $\alpha$ and $\beta$, I effectively imply the following prior expectation in terms of the vote share.

The red dots represent the "historical" averages over the past 3 general elections, which I used as a reference point. You could fiddle a bit more with the parameters of the distributions for $\alpha_p$ and $\beta_p$, but I am reasonably happy with the implications of the current choice $-$ I'm expecting the Conservatives to do much better than the historical figure; Labour is expected to be around how they normally do, but there is a chance they'll do worse than "usual" and on average they're also doing worse than in the 2015 election. The Lib Dems are predicted with relatively large uncertainty and still under their historical average $-$ I think this is reasonable and many pundits are also aligned with this. Similarly, the prior effectively gives a very low weight to UKIP $-$ and this is in line with general consensus (I think) as well as the result of last night's local elections.

Interestingly, I can map these results and propagate the uncertainty to estimate the distribution of seats in Parliament (still with no data from the polls included), to produce the following graph.

Again, I think this picture is even more convincing than the analysis of the probabilities and I feel relatively confident with this. (But of course, one could replicate the whole analysis and try different specifications, which I have to some degree).

So it's now time to include the data that are pouring in from the polls. In particular, I now have information collected over the past two weeks or so and I think in a fast-moving election such as this, where opinions may be changed by a large number of "facts" and stories, it's useful to "discount" the older data. There are many ways of doing this, more or less formally $-$ I'm using a rather quick and dirty strategy, by applying a simple discount rate defined as a function of the time elapsed up to today.

Each observed poll gets rescaled as $$y^{j*}_{ip}= \frac{y^j_{ip}}{(1+\delta)^t}, $$
where $y^j_{ip}$ is the observed number of voting intentions for party $p$ in poll $i$ among voters of type $j$ (=1 for Leavers and =2 for Remainers); $y^{j*}_{ip}$ is its discounted counterpart; $t$ is the time elapsed between the poll and today; and $\delta$ is an arbitrarily defined discount rate. I've tested a few versions (ranging from 0.03 to 0.1) and the results do not vary dramatically $-$ the larger the discount rate, the more heavily older polls are discounted, which tends to reduce the number of seats associated with the Conservatives by a minimum of 1 and a maximum of 4. This is because in the very first few polls, the advantage associated with the Tories was bigger than in the most recent ones.
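
Just to give a feel for the numbers, here is a worked example with made-up figures, assuming $t$ is measured in days. Take a poll from a week ago ($t=7$) in which 400 respondents of a given type intend to vote for a given party. At the two ends of the range I've tested, the discounted count is
$$\frac{400}{(1+0.1)^{7}} = \frac{400}{1.9487\ldots} \approx 205.3 \qquad \mbox{and} \qquad \frac{400}{(1+0.03)^{7}} = \frac{400}{1.2299\ldots} \approx 325.2,$$
so with $\delta=0.1$ a week-old poll counts for roughly half of a poll published today, while with $\delta=0.03$ it still retains about 80% of its weight.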

With a discount rate $\delta=0.1$, the results estimated in terms of seats won are as in the following graph.

So, Conservatives with a median estimated number of seats of 379 (and a 95% interval estimate of 369-391, way above the line of 325 seats that are needed for a majority), Labour with 175 (163-185), Lib Dems with 25 (17-31), SNP with 49 (46-54), Green with 1 and Plaid Cymru with 3 (0-4).

I think this analysis is interesting because it is fairly easy to assess the uncertainty propagated through the model up to the actual quantity of interest (the seats won). Other pundits are being a lot less favourable to the Lib Dems, but I'm kind of happy with how my model works, especially after considering the prior analysis.

Plenty more fun to come $-$ well, depending on your definition of fun...
