Tuesday, 21 November 2017


Recently, I've been doing a lot of work on the beta version of BCEA (I was, after all, born in Agrigento $-$ in the picture to the left $-$ which was originally a Greek city, so a beta version sounds about right...). 

The new version is only available as a beta release from our GitHub repository; the usual way to install it is through the devtools package.

There aren't very many changes from the current CRAN version, although the one thing I did change is kind of big. In fact, I've embedded the web-app functionality within the package. So, it is now possible to launch the web-app from the current R session using the new function BCEAweb. This takes three inputs as arguments: a matrix e containing $S$ simulations for the measures of effectiveness computed for the $T$ interventions; a matrix c containing the simulations for the measures of costs; and a data frame or matrix containing simulations for the model parameters. 

However, none of the inputs is required: the user can launch an empty web-app and upload the inputs, say, from a spreadsheet (other formats are also available).
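As a rough illustration of what these inputs look like, and of the kind of summaries BCEA computes from them (such as the expected incremental benefit and the cost-effectiveness acceptability curve), here is a minimal sketch in Python rather than R. It is not BCEA's code: the simulated distributions, the willingness-to-pay value k and the function names simulate_inputs and incremental_summary are all hypothetical, and it assumes $T=2$ interventions with the first column as the reference.

```python
import random
from statistics import mean

def simulate_inputs(S, seed=1):
    """Hypothetical S x 2 matrices e (effects) and c (costs):
    rows are simulations, column 0 = reference, column 1 = new intervention."""
    rng = random.Random(seed)
    e = [[rng.gauss(0.60, 0.05), rng.gauss(0.65, 0.05)] for _ in range(S)]
    c = [[rng.gauss(1000, 100), rng.gauss(1500, 150)] for _ in range(S)]
    return e, c

def incremental_summary(e, c, k):
    """Per-simulation increments of the new intervention vs the reference,
    then ICER, expected incremental benefit EIB = E[k * delta_e - delta_c]
    and the probability that the incremental net benefit is positive at k."""
    delta_e = [row[1] - row[0] for row in e]
    delta_c = [row[1] - row[0] for row in c]
    inb = [k * de - dc for de, dc in zip(delta_e, delta_c)]
    return {
        "ICER": mean(delta_c) / mean(delta_e),
        "EIB": mean(inb),
        "CEAC": sum(b > 0 for b in inb) / len(inb),
    }

e, c = simulate_inputs(S=1000)
print(incremental_summary(e, c, k=25000))
```

Computing the CEAC over a grid of k values, rather than a single one, gives the familiar acceptability curve.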

I don't think the web-app facility is strictly necessary once you've gone to the trouble of installing the R package and are obviously using it from R. But it's helpful nonetheless, for example for producing some standard output (perhaps even more so than the actual package $-$ which I think is more flexible) and for reporting, with the cool facility based on pandoc.

This means there are a few more packages "suggested" on installation and potentially a longer compilation time for the package $-$ but nothing major. The new version is under testing but I may be able to release it on CRAN soon-ish... And there are other cool things we're playing around with (the links here give all the details!).

Monday, 20 November 2017

La lotteria dei rigori

Seems like my own country has kind of run out of luck... First we fail to qualify for the World Cup, then we lose the right to host the relocated headquarters of the European Medicines Agency, post Brexit. If I were a cynical ex-pat, I'd probably think that the former will be felt as the worse defeat across Italy. Maybe it will.

As I've mentioned here, I'd been talking to Politico about how the whole process resembled the Eurovision. I think the actual thing did have some of those elements $-$ earlier today, ahead of the vote, it looked as though Bratislava was the hot favourite. This reminded me of the days before the Eurovision final, when one of the acts is often touted as the sure thing, often over and above its musical quality. And I do believe that there's an element of "letting people know that we're up for hosting the next one" going on to pimp up the experts' opinions. Although sometimes, as it turns out, the favourites are not so keen in reality $-$ cue their poor performance come the actual thing...

In the event, Bratislava was eliminated in the first round. The contest went all the way to extra time, with Copenhagen dropping out at the semifinal stage and Amsterdam and Milan contesting the final head-to-head. As the two finalists got the same number of votes (with, I think, one abstention), the decision came down to luck $-$ basically to penalties, or as we say in Italian, la lotteria dei rigori (the "penalty lottery").

I guess there must have been some thinking behind the set-up of the voting system that, in case it came down to a tie at the final round, both remaining candidates would be "acceptable" (if not to everybody, at least to the main players) and so they'd be happy for this to go 50:50. And so Amsterdam it is!

Tuesday, 14 November 2017

Relocation, relocation, relocation

Earlier today, I was contacted by Politico $-$ they are covering the story about the European Union's process to reassign the two EU agencies currently located in London, the European Medicines Agency (EMA) and the European Banking Authority (EBA), post-Brexit.

I knew about this, but wasn't aware of the actual process, which is rather complex: 
"The vote for each agency will consist of successive voting rounds, with the votes cast by secret ballot. In the first round, each member state will have one vote consisting of six voting points, which should be allocated in order of preference to three offers: three points to the first preference, two to the second and one to the third. If one offer receives three voting points from at least 14 member states, this will be considered the selected offer. Otherwise, the three offers (or more in case of a tie) with the highest number of points will go to a second round of voting. In the second round, each member state will have one voting point, which should be allocated to its preferred option in that round. If one offer receives 14 or more votes, it will be considered the selected offer. Otherwise, a third round will follow among the two offers (or more in case of a tie) with the highest number of votes, again with one voting point per member state. In the event of a tie, the presidency will draw lots between the tied offers."
Cat Contiguglia contacted me to have a chat about this $-$ they had done a couple of pieces likening it to the Eurovision contest. As I told Cat, however, I think this is more like the way cities get assigned the right to host the Olympic Games, or even how the Palio di Siena works... I guess lots of discussion is already going on among the member states. 
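The voting procedure quoted above is essentially an algorithm, so here is a minimal Python sketch of it. It rests on an assumption of mine that is not part of the official rules: each member state has a fixed preference ranking and, in rounds two and three, votes for its highest-ranked offer still in play (a state whose ranking contains none of the remaining offers is treated as abstaining). The function names top_n, plurality and select_offer are illustrative only.

```python
import random
from collections import Counter

def top_n(scores, n):
    """Offers with the n highest scores, plus any offer tied with the n-th."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) <= n:
        return sorted(scores)
    cutoff = ranked[n - 1]
    return sorted(offer for offer, s in scores.items() if s >= cutoff)

def plurality(rankings, offers):
    """One point per member state for its highest-ranked offer still in play.
    A state whose ranking contains none of the offers abstains."""
    votes = Counter({offer: 0 for offer in offers})
    for ranking in rankings:
        for offer in ranking:
            if offer in offers:
                votes[offer] += 1
                break
    return votes

def select_offer(rankings, quorum=14, rng=random):
    """rankings: one preference list per member state, most preferred first."""
    # Round 1: 3, 2 and 1 points to each state's first three preferences
    points, firsts = Counter(), Counter()
    for ranking in rankings:
        for offer, p in zip(ranking[:3], (3, 2, 1)):
            points[offer] += p
        firsts[ranking[0]] += 1
    offer, n = firsts.most_common(1)[0]
    if n >= quorum:  # three points from at least 14 member states
        return offer, "round 1"
    # Round 2: top three offers by points (more if tied), one vote per state
    votes = plurality(rankings, top_n(points, 3))
    offer, n = votes.most_common(1)[0]
    if n >= quorum:
        return offer, "round 2"
    # Round 3: top two offers by votes (more if tied); a tie is settled by lots
    votes = plurality(rankings, top_n(votes, 2))
    best = max(votes.values())
    tied = sorted(o for o, v in votes.items() if v == best)
    if len(tied) == 1:
        return tied[0], "round 3"
    return rng.choice(tied), "lots"

# 27 hypothetical member states: 13 prefer A, 13 prefer B, one abstains at the end
tie = [["A", "B", "C"]] * 13 + [["B", "A", "C"]] * 13 + [["C"]]
print(select_offer(tie, rng=random.Random(2017)))
```

With 27 states split 13:13 and one abstention in the final round, the sketch can only end with the presidency drawing lots, which is the kind of scenario the rules are designed to settle.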

Apparently, Milan and Frankfurt are the favourites to host EMA and EBA, respectively. I think I once heard a story that, originally, EMA was supposed to be located in Rome. Unfortunately, the decision was to be made just as one of the many Italian political scandals was about to break, pointing to massive corruption in the Italian healthcare system, and so Rome was stripped of the title. Perhaps a win for Milan will help Italy get over the World Cup...

Friday, 10 November 2017

At the Oscars!

Well, these days being part of the glittering world of show-biz is not necessarily a good thing, but when your life is soooo glamorous that someone feels the unstoppable need to make a biopic of it... well, you really need to embrace your new status as a movie star and enjoy all the perks that life will now throw at you...

I know, I know... This is still about the Eurovision. But this time they made a short video to tell the story $-$ you may think the still above shows Marta and me, but these are actually two actors, playing us! 

I think they've done a very good job of rendering us $-$ particularly me, I think. If you believe the movies: 
  • We (particularly I) are younger than we really are;
  • We drink a lot (although "Marta"'s drink 25 seconds in looks like a cross between cranberry juice and the stuff they use to show vampires drinking human blood from hospital blood bags)...
  • We laugh a lot $-$ I think this is kind of true, though...
  • I like how 1 min 24 seconds in, "Marta" authoritatively demands a kiss on the cheek and "my" response to that is covered by floating webpages $-$ kind of rated R...
  • The storyline seems to suggest that, when we thought about doing this, we wondered whether we should use a Bayesian model $-$ of course that was never in question!...
Anyway, I think I need to thank the guys at Taylor & Francis (Clare Dodd, in particular), who've done an amazing job! 

Tuesday, 17 October 2017

The Alan Turing's project

The Alan Turing Institute (ATI) has just announced the next round of Doctoral Studentships.

Here's the original blurb with all the relevant information. The guys in the picture are not part of the supervisory teams (but I think I will be...).

We are seeking highly talented and motivated graduates to apply for our fully funded doctoral studentship scheme commencing October 2018 and welcome applications from home/EU and international students.

We are the national institute for data science, created in 2015 in response to a need for greater investment in data science research. Headquartered at the British Library in the heart of London’s vibrant Knowledge Quarter, the Institute was founded by the universities of Cambridge, Edinburgh, Oxford, University College London and Warwick – and the UK Engineering and Physical Sciences Research Council.

The Turing 2018 Doctoral Studentships are an exceptional opportunity for talented individuals looking to embark on a career in the rapidly emerging field of data science.

Turing students will have access to a wide range of benefits unique to the Institute:

  • Access to a range of events, seminars, reading groups and workshops delivered by leaders in research, government and industry
  • Opportunities to collaborate on real world projects for societal impact with our current and emerging industry partners
  • Expert support and guidance through all stages of the studentship delivered by supervisors who are Fellows of the Turing or substantively engaged with us
  • Access to brilliant minds researching a range of subjects with opportunities to collaborate and join or start interest groups
  • Networking opportunities through the Institute, university and strategic partners
  • Bespoke HQ designed for optimal study and interdisciplinary collaborations

Studentships include a tax-free stipend of £20,500 per annum (for up to 3.5 years), plus home/EU tuition fees and a travel allowance. A limited number of fully-funded overseas studentships are also available.

Additional studentships may be available through our Strategic Partners – HSBC, Intel, Lloyd’s Register Foundation and UK Government – Defence & Security with projects aligned to our strategic priorities.

In line with the Institute’s cross-disciplinary research community, we particularly welcome applications from graduates whose research spans multiple disciplines and applications.

Application deadline: 12:00 GMT Thursday 30 November 2017

Monday, 9 October 2017

Summer school in Leuven

Earlier this year, Emmanuel organised the first edition of the Summer School on Advanced Bayesian Methods, in the beautiful Belgian town of Leuven (which is also where we had our Bayes conference a couple of years ago).

For next year, they have planned the second edition, which will run from 24th to 28th September and I'm thrilled that they have invited me to do the second part on... you guessed it: Bayesian Methods in Health Economic Evaluation.

The programme is really interesting and Mike Daniels will do the first three days on Bayesian Parametric and Nonparametric Methods for Missing Data and Causal Inference. 

Wednesday, 27 September 2017

24. Nearly.

As the academic year is beginning (our courses will officially start next week), this week has seen the arrival of our new students, including those in our MSc Health Economics & Decision Science (I've talked about this here and here).

When we set out the planning, we were a bit nervous because, while everybody at UCL has been very encouraging and supportive, we were also given a rather hard target $-$ get at least 12 students, or else this is not viable. (I don't think we were actually told what would have happened if we had recruited fewer students. But I don't think we cared to ask $-$ the tone seemed scary enough)...

Well, as it happens, we've effectively doubled the target and we now have 22 students starting on the programme $-$ there may be a couple more additions, but even if they fail to turn up, I think Jolene, Marcos and I will count ourselves very happy! I spoke to some of the students yesterday and earlier today and they all seem very enthusiastic, which is obviously very good!

Related to this, we'll soon start our new seminar series, in which all the MSc students are "strongly encouraged" to participate. But I'll post about it more generally, in case the seminars are of interest to a wider audience...

Friday, 8 September 2017

Building the EVSI

Anna and I have just arXived a paper (which we've also submitted to Value in Health), in which we try to publicise, more widely and in a less technical way, the "Moment Matching" method to estimate the Expected Value of Sample Information (we sent the method itself to MDM, and it should be on track and possibly out soon...).

The main point of this paper is to showcase the method and highlight its usability $-$ we are also working on computational tools to simplify and generalise the analysis. It's an exciting project, I think, and luckily we've got our hands on data and designs for some real studies, so we can play around with them, which is also nice. I'll post more soon.

Anna suggested the title of the paper with Bob the Builder in mind (so "Can we do it? Yes we can"), although perhaps President Obama (simply "Yes we can") may have worked better. Either way, the picture to the left is just perfect for when we turn this into a presentation...

Thursday, 7 September 2017

Planes, trains and automobiles

For some reason, Kobi's favourite thing in the world is flying on an airplane, with making paper airplanes a very close second and playing airport, pretending to check in (real) suitcases and set off through security, a rather close third.

So it's not surprising that he was quite upset when I told him I would go on an airplane not once, not twice, but three times in the space of just a couple of weeks (in fact, I'll fly to Pisa, then Paris, come back on a train, ride a train again to Brussels and back and finally fly to Bologna and back, all to give talks at several places. From Bologna, I'll actually need to hire a car, because my talk is in nearby Parma). 

I think for a moment Kobi did consider no longer loving me. But luckily, I think the crisis has been averted and I got back on good terms with him when I told him it's not too long until he can fly again...

Yesterday I was in Glasgow to give a talk at the Conference of the Royal Statistical Society, the first leg of my September travels-for-talks. My talk was in a session on missing data in health economic evaluation, with Andrew Briggs and James Carpenter also speaking. I think the session was really interesting and we had a rather good audience, so I was pleased with that.

My talk was basically stealing from Andrea's PhD work $-$ we (this also includes Alexina and Rachael, who are co-supervising the project) have been doing some interesting stuff on modelling individual-level data on costs and benefits, accounting for the correlation between the outcomes; skewness in the distributions; and "structural" values (eg spikes at QALY values of 1, which cannot be modelled directly using a Beta distribution).
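To see why structural values are a problem, note that a Beta distribution lives on the open interval (0, 1) and so puts zero probability on a QALY of exactly 1. One common way round this is a hurdle (two-part mixture) model: with probability p an individual is in perfect health (QALY = 1), otherwise the QALY is drawn from a Beta. The Python sketch below illustrates that idea only; it is not necessarily how Andrea's models or missingHE handle it, and the parameter values and function names are made up.

```python
import random
from statistics import mean

def simulate_qalys(n, p_one, a, b, seed=42):
    """Hurdle model: QALY = 1 with probability p_one, else a Beta(a, b) draw."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < p_one else rng.betavariate(a, b) for _ in range(n)]

def hurdle_mean(p_one, a, b):
    """Analytic mean of the mixture: p * 1 + (1 - p) * a / (a + b)."""
    return p_one + (1 - p_one) * a / (a + b)

q = simulate_qalys(10_000, p_one=0.3, a=6, b=2)
print(sum(x == 1.0 for x in q) / len(q))  # share of structural ones, close to p_one
print(mean(q), hurdle_mean(0.3, 6, 2))    # sample mean vs analytic mean
```

In a full analysis p itself would typically get its own (possibly covariate-dependent) model, alongside the Beta part and the model for costs.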

Andrea has also done some very good work programming the relevant functions in BUGS/JAGS (and he's having a stab at Stan too) into a beta version of what will be our next package (we have called it missingHE) $-$ I'll say more on this when we have some more established material ready.

The next trip is to Paris on Monday, to give a talk at the Department of Biostatistics of the Institut Gustave Roussy, where I'll speak about (you guessed it...) Bayesian methods in health economics. I'll link to my presentation (that is, once I've finished tweaking it...).

Wednesday, 30 August 2017

A couple of things...

Just a couple of interesting things...

1. Petros sends me this advert for a post as Biostatistician at the Hospital for Sick Children in Toronto
The Child Health Evaluative Sciences Program at the Hospital for Sick Children in Toronto is recruiting a PhD Biostatistician to lead the execution of a CIHR funded clinical trial methodology project, and the planning of upcoming trials with a focus on:
  • improving and using methods of Bayesian Decision analysis and Value of Information in pediatric trial design and analysis;
  • using patient and caregiver preference elicitation methods (e.g. discrete choice experiments) in pediatrics;
  • developing the statistical plan and conducting the statistical analysis for pediatric clinical trials.
The Biostatistician will collaborate with the Principal Investigators (PIs) of four trials that are in the design stage, and with two senior biostatisticians and methodologists within the CHES program. The successful candidate will have protected time for independent methods development. A cross appointment with the Dalla Lana School of Public Health at the University of Toronto will be sought.

Here’s What You’ll Get To Do:
  • In collaboration with the trials’ Principal Investigators (PIs), develop the study protocols;
  • Contribute to the conceptualization and development of decision analytic models;
  • Contribute to conducting literature reviews and keep current with study literature;
  • Assist with design/development and implementation of value of information methods;
  • Contribute to preparation of reports, presentations, and manuscripts.

Here’s What You’ll Need:
  • Graduate degree in Statistics, Biostatistics, Health Economics or a related discipline;
  • Ability to function independently yet collaboratively within a team;
  • Excellent statistical programming skills predominantly using R software; 
  • Experience with report and manuscript writing;
  • Effective communication, interpersonal, facilitation and organizational skills;
  • Meticulous attention to detail.
Employment Type: 

Temporary, Full-Time (3 year contract with possibilities for renewal)

Contacts: Dr. Petros Pechlivanoglou and Dr. Martin Offringa 

2. And Manuel has an advert for a very interesting short course on Missing Data in health economic evaluations (I will do my bit on Bayesian methods for this, which is also very much related to the talk I'll give at the RSS conference in Glasgow later in September $-$ this is part of Andrea's PhD work). I'll post more on this later.
Two-day short course: Methods for addressing missing data in health economic evaluation

Dates: 21-22 September, 2017

Venue: University College London

Missing data are ubiquitous in health economic evaluation. The major concern that arises with missing data is that individuals with missing information tend to be systematically different from those with complete data. As a result, cost-effectiveness inferences based on complete cases are often misleading. These concerns apply both to health economic evaluations based on a single study and to those that synthesise data from several sources in decision models. While accessible, appropriate methods for addressing missing data are available in most software packages, their uptake in health economic evaluation has been limited.

Taught by leading experts in missing data methodology, this course offers an in-depth description of both introductory and advanced methods for addressing missing data in economic evaluation. These will include multiple imputation, hierarchical approaches, sensitivity analysis using pattern mixture models and Bayesian methods. The course will introduce the statistical concepts and underlying assumptions of each method, and provide extensive guidance on the application of the methods in practice. Participants will engage in practical sessions illustrating how to implement each technique with user-friendly software (Stata).

At the end of the course, the participants should be able to develop an entire strategy to address missing data in health economic studies, from describing the problem, to choosing an appropriate statistical approach, to conducting sensitivity analysis to standard missing data assumptions, to interpreting the cost-effectiveness results in light of those assumptions.

Who should apply?
The course is aimed at health economists, statisticians, policy advisors or other analysts with an interest in health economic evaluation, who would like to expand their toolbox. It is anticipated that participants will be interested in undertaking or interpreting cost-effectiveness analyses that use patient-level data, either from clinical trials or from observational studies.

Course fees: £600 (Commercial/Industry); £450 (Public sector); £200 (Students); payable by the 8th September 2017.

To register for the course or for further information, please see here
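As a toy illustration of the advert's point that analyses restricted to complete cases can mislead, the Python sketch below simulates individual costs and then makes expensive patients more likely to drop out, so that missingness depends on the unobserved cost itself (missing not at random). The gamma cost distribution, the dropout rule and the function name complete_case_bias are all invented for illustration.

```python
import random
from statistics import mean

def complete_case_bias(n=20_000, seed=7):
    """Return (true mean cost, complete-case mean cost, share observed)."""
    rng = random.Random(seed)
    # Hypothetical skewed costs: Gamma(shape=2, scale=500), true mean 1000
    costs = [rng.gammavariate(2.0, 500.0) for _ in range(n)]
    # Hypothetical dropout: the probability of being missing rises with cost
    observed = [c for c in costs if rng.random() > min(0.9, c / 3000)]
    return mean(costs), mean(observed), len(observed) / n

full_mean, cc_mean, share_observed = complete_case_bias()
print(round(full_mean), round(cc_mean), round(share_observed, 2))
```

Because the high-cost patients are the ones most likely to be missing, the complete-case mean comes out well below the true mean, no matter how many patients are simulated.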

Monday, 14 August 2017

When simple becomes complicated...

A while ago, Anna and I published an editorial in Global & Regional Health Technology Assessment. In the paper, we discuss one of my favourite topics $-$ how models for health technology assessment and cost-effectiveness analysis should increasingly move away from spreadsheets (basically, Excel) and towards proper statistical software.

The main arguments that have historically been used to support spreadsheet-based modelling are those of "simplicity and transparency" $-$ which really grinds my gears. In the paper we also argue that, maybe, as statisticians we should invest in designing our models with user interfaces, or GUIs $-$ the obvious example being web-apps. This would expand and extend work done, eg in SAVI, BCEAweb or bmetaweb, just to name a few (that I'm more familiar with...). 

Friday, 28 July 2017

Picky people (2)

I've complained here about the fonts for some parts of the computer code in our book. Eva (our publisher) has picked up on this and has been brilliant and very quick in trying to fix the issue. I think they will update the fonts so that at least in the ebook version all will look nice!

Friday, 7 July 2017

Conflict of interest

I am fully aware that this post is seriously affected by a conflict of interest, because what I'm about to discuss (in positive terms!) is work by Anthony, who's doing a very good job on his PhD (which I co-supervise).

But I thought I'd do like our former PM (BTW: see this; I really liked the series) and sort out conflicts of interest by effectively ignoring them (to be fair, this seems to be a popular strategy, so let's not be too harsh on Silvio...).

Anyway, Anthony has written an editorial, which has received some traction in the mainstream media (for example here, here or here). Not much that I disagree with in Anthony's piece, except that I am really sceptical of any bake & eat situation $-$ the only exception is when I actually make pizza from scratch...

Tuesday, 20 June 2017

Picky people

Our book on Bayesian cost-effectiveness analysis using BCEA is out (I think as of last week). This has been a long process (I've talked about this here, here and here). 

Today I came back to the office and opened the package with my copies. The book looks nice $-$ I am only a bit disappointed about a couple of formatting things, specifically the way in which the computer code got badly formatted in chapter 4. 
We had originally used a specific font, but for some reason in that chapter all the computer code is set in Times New Roman. I think we did check the proofs and I don't recall seeing this (which, to be fair, doesn't necessarily mean we didn't miss it while checking...).

Not a biggie. But it bothers me, a bit. Well, OK: a lot. But then again, I am a(n annoyingly) picky person...

Monday, 19 June 2017

Homecoming (of sorts...)

I spent last week in Florence for our Summer School. Of course, it was a homecoming for me and I really enjoyed being back in Florence $-$ although it was really hot. I would say I'm not used to that level of heat anymore, if it weren't for the fact that I caught my brother (who still lives there) huffing and complaining about it several times!...

I think it was a very good week $-$ we had capped the number of participants at 27; everybody showed up and I think had a good time. I think I can speak for myself as well as for Chris, Nicky, Mark and Anna and say that we certainly enjoyed being around people who were so committed and interested! We did joke at several points that we didn't even have to ask the questions $-$ they were starting the discussion almost without us prompting it...

The location was also very good and helped make sure everybody was enjoying it. The Centro Studi in Fiesole is an amazing place $-$ not so close to Florence that people always disappear after the lectures, but not too far either. So there was always somebody around, even for dinner and a chat in the beautiful garden, although some people would venture down the hill (notably, many did so on foot!). We also went to Florence a couple of times (the picture is one of my favourite spots in the city, where I obviously took everybody...).

Friday, 9 June 2017


So: for once I woke up this morning feeling quite tired after the late night, but also rather upbeat after an election. The final results of the general election are out and have produced quite a shock. 

Throughout yesterday, it looked as though the final polls were returning an improved majority for the Conservative party $-$ this would have been consistent with the "shy Tory" effect. Even Yougov had presented their latest poll suggesting a seven-point lead and an improved Tory majority. So I guess many people were unprepared for the exit polls, which suggested a very different picture...

First off, I think that the actual results have vindicated Yougov's model (rather than their poll), which is a hierarchical model informed by over 50,000 individual-level responses on voting intention as well as several other covariates. They weren't spot on, but quite close. 

Also, the exit polls (based on a sample of over 30,000) were remarkably good. To be fair, however, I think that exit polls are different from pre-election polls because, unlike them, they do not ask about "voting intentions", but about the actual vote that people have just cast.

And now, time for the post-mortem. My final prediction, using all the polls as of 8 June, was as follows:

                mean       sd 2.5% median 97.5%     OBSERVED
Conservative 346.827 3.411262  339    347   354          318
Labour       224.128 3.414861  218    224   233          261
UKIP           0.000 0.000000    0      0     0            0
Lib Dem       10.833 2.325622    7     11    15           12
SNP           49.085 1.842599   45     49    51           35
Green          0.000 0.000000    0      0     0            1
PCY            1.127 1.013853    0      2     3            4

Not all bad, but not quite spot on either and to be fair, less spot on than Yougov's (as I said, I was hoping they were closer to the truth than my model, so not too many complaints there!...).

I've thought a bit about the discrepancies and I think a couple of issues stand out:

  1. I (together with several other forecasters, and in fact even Yougov) overestimated the vote and, more importantly, the number of seats won by the SNP. In my case, I think the main issue had to do with the polls I used to build my model. As it happened, the battleground in Scotland was rather different from the rest of the country, but what was feeding into my model were the data from national polls. I had tried to bump up my prior for the SNP to counter this effect, but most likely this exaggerated the result, producing an estimate that was too optimistic.
  2. Interestingly, the error for the SNP is 14 seats; 12 of these, I think, have (rather surprisingly) gone to the Tories. So, basically, I got the Tory seat count wrong by (347-318+12)=41 seats $-$ which, if you actually allocate them to Labour, would have brought my prediction to 224+41=265. 
  3. Post-hoc adjustments aside, it is obvious that my model overestimated the result for the Tories, while underestimating Labour's performance. In this case, I think the problem was that the structure I had used was mainly based on the distinction between leave and remain areas at last year's referendum. And of course, these were highly related to the vote that had gone to UKIP in 2015. Now: like virtually everybody, I correctly predicted that UKIP would get "zip, nada, zilch" seats. In my case, this was done by combining the poor performance in the polls with a strongly informative prior (which, incidentally, was not strong enough: combined with the polls, it still led me to overestimate UKIP's vote share). However, I think that the aggregate data observed in the polls had consistently tended to indicate that the Tories would make massive gains in leave areas. What actually happened was that the former UKIP vote split nearly evenly between the two major parties. So, in strong leave areas, the Tories gained marginally more than Labour, but that was not enough to swing and win the marginal Labour seats. Conversely, in remain areas, Labour did really well (as the polls were suggesting) and this in many cases produced a change of colour in some Conservative marginal seats.
  4. I missed the Greens' success in Brighton. This was, I think, down to being a bit lazy and not bothering to tell the model that in Caroline Lucas' seat the Lib Dems had not fielded a candidate. This in turn meant that the model was predicting a big surge in the vote for the Lib Dems (because Brighton Pavilion is a strong remain area), which would eat into the Greens' majority. And so my model was predicting a change to Labour, which never happened (again, I'm quite pleased to have got it wrong here, because I really like Ms Lucas!).
  5. My model had correctly guessed that the Conservatives would regain Richmond Park, but that the Lib Dems would get back Twickenham and Labour would hold Copeland. In comparison to Electoralcalculus's prediction, I did very well in predicting the number of seats for the Lib Dems. I am not sure about the details of their model, but I am guessing that they had some strong prior to (over)discount the polls, which led to a substantial underestimation. In contrast, I think that my prior for the Lib Dems was spot on.
  6. Back to Yougov's model, I think that the main, huge difference was the fact that they could rely on a very large amount of individual-level data. The published polls would only provide aggregated information, which almost invariably would only cross-tabulate one variable at a time (ie voting intention in Leave vs Remain areas, or in London vs other areas, etc $-$ but not both). Being able to analyse the individual-level data (combined of course with a sound modelling structure!) allowed Yougov to get some of the true underlying trends right, which models based on the aggregated polls simply couldn't, I think.
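The back-of-the-envelope arithmetic in point 2 can be checked, together with the error of the median prediction for each party, in a few lines of Python; the numbers are copied from the table and the text above and nothing else is assumed.

```python
# Median predicted seats vs observed results, copied from the table above
predicted = {"Conservative": 347, "Labour": 224, "UKIP": 0, "Lib Dem": 11,
             "SNP": 49, "Green": 0, "PCY": 2}
observed = {"Conservative": 318, "Labour": 261, "UKIP": 0, "Lib Dem": 12,
            "SNP": 35, "Green": 1, "PCY": 4}

# Positive = overestimate, negative = underestimate
errors = {party: predicted[party] - observed[party] for party in predicted}
print(errors)

# Point 2: 12 of the SNP's losses went to the Tories, so outside those seats
# the Tory overestimate is (347 - 318) + 12
scottish_gains = 12
tory_overestimate = predicted["Conservative"] - observed["Conservative"] + scottish_gains
labour_if_reallocated = predicted["Labour"] + tory_overestimate
print(tory_overestimate, labour_if_reallocated)  # 41 265
```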
It's been a fun process $-$ and all in all, I'm enjoying the outcome...

Wednesday, 7 June 2017


Today I've taken a break from the general election modelling $-$ well, not really... Of course I've checked whether there were new polls available and have updated the model! 

But: nothing much changes, so for today, I'll actually concentrate on something else. I was invited to give a talk at the Imperial/King's College Researchers' Society Workshop $-$ I think this is something they organise routinely.

They asked me to talk about "Blogging and Science Communication" and I decided to have some fun with this. My talk is here. I've given examples of weird stuff associated with this blog $-$ not that I had to look very hard to find many of them...

And I did have fun giving the talk! Of course, the posts about the election did feature, so eventually I got to talk about them too...

Tuesday, 6 June 2017

The Inbetweeners

When it was first shown, I really liked "The Inbetweeners" $-$ it was at times quite rude and cheap, but it did make me laugh, despite the fact that, as often happens, all the main characters looked a bit older than the age they were trying to portray...

Anyway, as is increasingly often the case, this post has very little to do with its title and (surprise!) it's again about the model for the UK general election.

There has been lots of talk (including in Andrew Gelman's blog) in the past few days about Yougov's new model, which is based on Gelman's MRP (Multilevel Regression and Post-stratification). I think the model is quite cool and it obviously is very rigorous $-$ it considers a very big poll (with over 50,000 responses), assumes some form of exchangeability to pool information across different individual respondents' characteristics (including geographical area) and then reproportions the estimated vote shares (in a similar way to what my model does) to produce an overall prediction of the final outcome.
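The post-stratification step of MRP is the easiest part to illustrate: once the multilevel regression has produced an estimated vote share for each demographic cell, the estimate for a constituency is the average of the cell estimates, weighted by how many people in that constituency fall into each cell. The Python sketch below assumes the cell estimates are already available; the cells, counts and shares are invented, and this is not Yougov's code.

```python
def poststratify(cell_share, cell_count):
    """Population-weighted average of cell-level estimates:
    sum_j N_j * theta_j / sum_j N_j."""
    total = sum(cell_count.values())
    return sum(cell_count[cell] * cell_share[cell] for cell in cell_count) / total

# Hypothetical cells (age band x EU referendum vote) for one constituency
share_con = {("18-34", "Leave"): 0.45, ("18-34", "Remain"): 0.20,
             ("35+", "Leave"): 0.60, ("35+", "Remain"): 0.35}
count = {("18-34", "Leave"): 8_000, ("18-34", "Remain"): 12_000,
         ("35+", "Leave"): 30_000, ("35+", "Remain"): 20_000}

print(round(poststratify(share_con, count), 3))  # 31000 / 70000, about 0.443
```

The hard part, of course, is the multilevel model that produces credible estimates for thousands of small cells; the weighting itself is just this sum.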

Much of the hype (particularly in the British mainstream media), however, has been related to the fact that Yougov's model produces a result that is very different from most of the other poll analyses, ie a much worse performance for the Tories, who are estimated to win only 304 seats (with a 95% credible interval of 265-342). That's even fewer than they won at the last general election. Labour are estimated to get 266 (230-300) seats, and so there have been hints of a hung parliament, come Friday.

Electoralcalculus (EC) has a short article on its home page explaining the differences in their assessment, which (more in line with my model) still gives the Tories a majority, with 361 seats (to Labour's 216).

As for my model, the very latest estimate is the following:

                mean        sd 2.5% median   97.5%
Conservative 347.870 3.2338147  341    347 355.000
Labour       222.620 3.1742205  216    223 230.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       11.709 2.3103369    7     12  16.000
SNP           48.699 2.0781525   44     49  51.000
Green          0.000 0.0000000    0      0   0.000
PCY            1.102 0.9892293    0      1   2.025
Other          0.000 0.0000000    0      0   0.000

so somewhere in between Yougov and EC (very partisan comment: man how I wish Yougov got it right!).
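For readers wondering where tables like the one above come from: each row summarises a party's simulated seat totals across the posterior draws. Below is a rough Python sketch of such a summary, not the script behind the post; `summarise_seats`, `quantile` and the four toy draws are hypothetical, and I'm assuming the quantiles use R's default linear interpolation (type 7) and the sd uses the n - 1 denominator, as R's `sd()` does.

```python
import statistics


def quantile(xs, q):
    """Quantile by linear interpolation between order statistics (R's default, type 7)."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = int(h)                      # floor, since h >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def summarise_seats(seat_sims):
    """seat_sims: dict party -> list of simulated seat totals (one per posterior draw)."""
    table = {}
    for party, xs in seat_sims.items():
        table[party] = {
            "mean": statistics.mean(xs),
            "sd": statistics.stdev(xs),  # n - 1 denominator, like R's sd()
            "2.5%": quantile(xs, 0.025),
            "median": quantile(xs, 0.5),
            "97.5%": quantile(xs, 0.975),
        }
    return table


# Hypothetical draws; a real run would use thousands of simulations.
draws = {"Conservative": [350, 346, 348, 352], "Labour": [220, 224, 222, 218]}
print(f"{'':12} {'mean':>8} {'sd':>8} {'2.5%':>8} {'median':>8} {'97.5%':>8}")
for party, row in summarise_seats(draws).items():
    print(f"{party:12} " + " ".join(f"{v:8.3f}" for v in row.values()))
```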

One of the points that EC explicitly models (although I'm not sure exactly how $-$ the details of their model are not immediately evident, I think) is the poll bias against the Tories. They counter this by (I think) arbitrarily redistributing 1.1% of the vote share from Labour to the Tories. This probably explains why their model is a bit more favourable to the Conservatives, even though it is driven by the data in the polls, which seem to suggest Labour are catching up.
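If the adjustment works as described (again an assumption, since EC's exact method isn't spelled out), it is simple arithmetic: moving 1.1 points from Labour to the Tories widens the Tory lead by 2.2 points, because one side gains exactly what the other loses. The sketch below treats the 1.1% as percentage points of the national share; `shift_share` and the poll numbers are hypothetical.

```python
def shift_share(shares, amount=0.011, source="Labour", target="Conservative"):
    """Return a copy of `shares` with `amount` of the vote moved from `source` to `target`.

    Hypothetical adjustment: treats the 1.1% as percentage points of the vote share.
    """
    adjusted = dict(shares)
    moved = min(amount, adjusted[source])  # cannot move more than the source party has
    adjusted[source] -= moved
    adjusted[target] += moved
    return adjusted


polls = {"Conservative": 0.44, "Labour": 0.36, "Lib Dem": 0.08, "Other": 0.12}  # made-up shares
adj = shift_share(polls)
gap_before = polls["Conservative"] - polls["Labour"]
gap_after = adj["Conservative"] - adj["Labour"]
print(f"Tory lead: {gap_before:.3f} -> {gap_after:.3f}")
```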

I think Yougov's model is very extensive and may well get it right $-$ after all, speaking only for my own model, Brexit is one of the factors and can possibly act as a proxy for many others (age, education, etc). But surely there'll be more than that to make up people's minds? Only a few more days before we find out...

Friday, 2 June 2017

The code (and other stuff...)

I've received a couple of emails and comments on one of the General Election posts asking me to share the code I've used.

In general, I think the code is a bit dirty and lots could be done in a more efficient way $-$ effectively, I'm doing this out of my own curiosity and, while I think the model is sensible, the code is probably not "publication-standard" (in terms of annotation etc).

Anyway, I've created a (rather plain) GitHub repository, which contains the basic files (including R script, R functions, basic data and JAGS model). Given time (which I'm not given...), I'd like to put a lot more description and perhaps also write a Stan version of the model code. I could also write a more precise model description $-$ I'll try to update the material on the GitHub.

On another note, the previous posts have been syndicated in a couple of places (here and here), which was nice. And finally, here's a little update with the latest data. As of today, the model predicts the following seat distribution.

                mean        sd 2.5% median 97.5%
Conservative 352.124 3.8760350  345    352   359
Labour       216.615 3.8041091  211    217   224
UKIP           0.000 0.0000000    0      0     0
Lib Dem       12.084 1.8752228    8     12    16
SNP           49.844 1.8240041   45     51    52
Green          0.000 0.0000000    0      0     0
PCY            1.333 0.9513233    0      2     3
Other          0.000 0.0000000    0      0     0

Labour are still slowly but surely gaining some ground $-$ I'm not sure whether the effects of the debate earlier this week (which was deserted by the PM) are visible yet, as only a couple of the polls included were conducted after it.

Another interesting thing (following up on this post) is the analysis of the marginal seats that the model predicts to swing away from their 2015 winners. I've updated the plot, which now looks as below.

Now there are 30 constituencies that are predicted to change hands, many still towards the Tories. I am not a political scientist, so I don't really know all the ins and outs of these, but I think a couple of examples are quite interesting and I would venture some comment...

So, the model doesn't know about the recent by-elections in Copeland and Stoke-on-Trent South and so still labels these seats as "Labour" (as they were in 2015), although the Tories now actually control Copeland.

Given the polls and the impact of the EU referendum (both were strong Leave areas, with 60% and 70% of the vote, respectively), and given that the Tories did well there in 2015 (36% vs Labour's 42% in Copeland and 33% vs Labour's 39% in Stoke-on-Trent South), the model suggests that both are likely to switch to the Tories this time around.

In fact, we know that in the by-elections, while Copeland (where the contest was mostly Labour v Tories) did go blue, Stoke didn't. But there, the main battle was between the Labour and the UKIP candidates (UKIP had got 21% in 2015). And the by-elections were fought last February, when the Tories' lead was much more robust than it probably is now.

Another interesting area is Twickenham $-$ historically a constituency leaning to the Lib Dems, which was captured by the Conservatives in 2015. But since then, the Tories have lost a similar area in another by-election (Richmond Park, with a massive swing), and the model is suggesting that Twickenham could follow suit, come next Thursday.

Finally, Clacton was the only seat won by UKIP in 2015, but since then, the elected MP (a former Tory turned UKIP) has left the party and is not contesting the seat. This, combined with UKIP's poor standing in the polls, produces the unsurprising outcome that Clacton is predicted to go blue with basically no uncertainty...

These results look reasonable to me $-$ not sure how things will turn out, of course. As many commentators have noted, much may depend on turnout among younger voters. Or other factors. And there'll probably be another instance of the "Shy-Tory effect" (I'll think about this if I get some time before the final prediction). But the model does seem to make some sense...

Tuesday, 30 May 2017

The swingers

Kaleb has left a comment on a previous post, asking what constituencies my model predicted to change hands, with respect to the 2015 election. This is not too difficult to do, given the wealth of results and quantities that can be computed, once the posterior distributions are estimated.

Basically, what I have done is to compute, based on the "possible futures" simulated by the model, the probability that the parties win each of the 632 seats in England, Wales and Scotland. Many of them seem to be very safe seats $-$ I think this is consistent with current political knowledge, although in an election like this possibly more can change...
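The calculation itself is straightforward once the posterior draws of the vote shares are available. Here is a minimal Python sketch, assuming each draw gives a vote share per party per seat (this data layout and the helper names `win_probabilities` and `seats_changing_hands` are hypothetical, not taken from the model's code): in each draw, the party with the largest share wins the seat under first-past-the-post, and the win probability is the fraction of draws in which that happens.

```python
from collections import Counter


def win_probabilities(sims, parties):
    """Probability that each party wins each seat, estimated from simulations.

    sims[s][seat][k] is the simulated vote share of parties[k] in that seat,
    in posterior draw s. The party with the largest share wins the seat in
    that draw; the win probability is the fraction of draws in which it does.
    """
    n_sims = len(sims)
    probs = []
    for seat in range(len(sims[0])):
        wins = Counter()
        for draw in sims:
            shares = draw[seat]
            winner = max(range(len(parties)), key=lambda k: shares[k])
            wins[parties[winner]] += 1
        probs.append({p: wins[p] / n_sims for p in parties})
    return probs


def seats_changing_hands(probs, winners_2015):
    """Indices of seats whose most likely winner differs from the 2015 winner."""
    return [i for i, (p, old) in enumerate(zip(probs, winners_2015))
            if max(p, key=p.get) != old]


# Toy example: 4 draws, 2 seats, 3 parties (made-up numbers).
parties = ["Conservative", "Labour", "Lib Dem"]
toy_sims = [
    [[0.50, 0.30, 0.20], [0.20, 0.20, 0.60]],
    [[0.50, 0.30, 0.20], [0.10, 0.30, 0.60]],
    [[0.45, 0.35, 0.20], [0.30, 0.20, 0.50]],
    [[0.30, 0.50, 0.20], [0.20, 0.30, 0.50]],
]
probs = win_probabilities(toy_sims, parties)
print(probs)
print(seats_changing_hands(probs, ["Labour", "Lib Dem"]))
```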

Anyway, using the very latest analysis (as of today, 30th May, and based on all polls published so far, but discounting older ones), there are 39 seats that are predicted to change hands. The following graph shows the predicted distribution of the probability of winning each of those seats, together with an indication of who won in 2015.

Of course, Labour are the big losers (many of the 39 constituencies were Labour in 2015, but are predicted to swing to some other party in 9 days' time). Conversely, the Tories are the big winners and, most often, when they do take a seat, they are predicted to win it with a very large probability. There aren't very many real 50:50s $-$ a couple, I'd say, where the results are predicted to be rather uncertain.

Incidentally, as of today, this is the distribution of seats predicted by the model.

                mean        sd 2.5% median 97.5%
Conservative 359.467 5.4492757  351    358   371
Labour       209.276 5.3613961  198    211   218
UKIP           0.000 0.0000000    0      0     0
Lib Dem       14.699 2.1621920   10     15    19
SNP           48.055 2.7271620   42     48    52
Green          0.000 0.0000000    0      0     0
PCY            0.503 0.8286602    0      0     3
Other          0.000 0.0000000    0      0     0

Labour are continuing to close the gap on the Tories, but are still a long way off. I'm curious to see what last night's not-a-debate did to the polls...

Friday, 26 May 2017

(Too) slowly but surely?

After the tragic events in Manchester and the suspension of the campaigns, things have started again and a couple of new polls have been released. Some of the media have also picked up the trend I was observing from my model, and so I have updated the results again.

The increasing trend for Labour sees another little surge, while the Tories continue their decline. In comparison to my last update, the Lib Dems are picking up slightly again. But all in all, the numbers still tell kind of the same story, I guess.

                mean        sd 2.5% median   97.5%
Conservative 369.251 5.1765622  357    370 378.000
Labour       197.886 5.2142298  190    197 211.000
UKIP           0.000 0.0000000    0      0   0.000
Lib Dem       15.085 2.3852598   11     15  19.025
SNP           49.263 2.3965756   44     49  53.000
Green          0.000 0.0000000    0      0   0.000
PCY            0.515 0.8499985    0      0   3.000
Other          0.000 0.0000000    0      0   0.000

These are the summary results as of today (again after discounting past polls). The Lib Dems move from a median number of expected seats of 14 to the current estimate of 15; Labour go from 191 to 197 and the Tories go from 376 to 370, still comfortably in the lead.

Monday, 22 May 2017

Quick update

This is going to be a very short post. I've been following the latest polls again and have updated my election forecast model $-$ nothing has changed in the general structure, only new data coming in as the campaigns evolve.
The dynamic forecast (which considers, for each day from 1 to 22 May, only the polls available up to that point) shows an interesting progression for Labour, who seem to be picking up some more seats. They are still a long way from the Tories, who are slightly declining. The Lib Dems are also going down, and the latest results seem to suggest a poor result for Plaid Cymru in Wales too (the model was forecasting up to 4 seats before, whereas now they are expected to get 0).

The detailed summary as of today is as follows.
                mean         sd    2.5% median 97.5%
Conservative 375.109 4.02010949 367.000    376   382
Labour       192.134 3.94862452 186.000    191   200
UKIP           0.000 0.00000000   0.000      0     0
Lib Dem       14.320 2.24781064  10.000     14    18
SNP           50.053 2.12713792  45.975     50    53
Green          0.007 0.08341438   0.000      0     0
PCY            0.377 0.77036645   0.000      0     3
Other          0.000 0.00000000   0.000      0     0

I think the trend seems genuine $-$ Labour go from a median number of predicted seats of 175 at 1st May to the current estimate of 191, the Tories go from 381 to 376 and the Lib Dems from 23 to 14. Probably not enough time to change things substantially (bar some spectacular faux pas from the Tories, I think), though...

I've also played around with the issue of coalitions $-$ there's still some speculation in the media that the "Progressives" (Labour, Lib Dems and Greens) could try and help each other by not fielding a candidate and supporting one of the other parties in selected constituencies, so as to maximise the chance of ousting the Conservatives. I've simply used the model prediction and (most likely unrealistically!) assumed 100% compliance from the voters, so that the coalition would get the sum of the votes originally predicted for each of the constituent parties. Here's the result.
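The 100% compliance assumption amounts to replacing Labour, Lib Dems and Greens with a single party whose share in each seat is the sum of theirs, and then recounting the seats in every simulation. A sketch in Python, with hypothetical helper names (`merge_parties`, `prob_majority`) and toy data; the majority threshold of 326 is half of the 650 Commons seats plus one, and in the toy example it is lowered so the effect is visible.

```python
def merge_parties(draw, group, label="Progressive"):
    """Replace the parties in `group` by a single `label` party in every seat.

    draw: list of seats, each a dict party -> vote share (one posterior draw).
    Assumes 100% vote transfer: the merged party gets the sum of the shares.
    """
    merged = []
    for seat in draw:
        new_seat = {p: s for p, s in seat.items() if p not in group}
        new_seat[label] = sum(seat.get(p, 0.0) for p in group)
        merged.append(new_seat)
    return merged


def prob_majority(sims, party="Conservative", threshold=326):
    """Fraction of draws in which `party` wins at least `threshold` seats.

    326 is a majority of the 650-seat House of Commons.
    """
    hits = 0
    for draw in sims:
        seats_won = sum(1 for seat in draw if max(seat, key=seat.get) == party)
        if seats_won >= threshold:
            hits += 1
    return hits / len(sims)


# Toy example: 2 draws of 3 identical seats, with a low threshold so the effect shows.
seat = {"Conservative": 0.40, "Labour": 0.30, "Lib Dem": 0.20, "Green": 0.10}
toy_draws = [[dict(seat) for _ in range(3)] for _ in range(2)]
progressive = {"Labour", "Lib Dem", "Green"}
coalition_draws = [merge_parties(d, progressive) for d in toy_draws]

print("Before:", prob_majority(toy_draws, threshold=2))
print("After: ", prob_majority(coalition_draws, threshold=2))
```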

The Progressives come much closer and the probability of an outright Tory majority is now much smaller, but still...

Monday, 15 May 2017

Through time & space

I've continued to fill in the data from the polls and re-run the model for the next UK general election. I think the dynamic element is interesting in principle, mainly because the data from the most recent polls could be weighted differently from those further in the past.

Roberto had done an amazing job, building on Linzer's work and using a rather complex model to account for the fact that the polls are temporally correlated and, as you get closer to election day, the historical data are much less informative. This time, I have done something much simpler and somewhat more arbitrary, based on discounting the polls depending on how distant they are from "today".

These are the results given by my model in the period from May 1st to May 12th $-$ on each day, I've only included the polls available at that time and discounted them using a 10% rate, assuming modern life really runs very fast (which it reasonably does...). Not much is really changing and the predictions in terms of the number of seats won by the parties in England, Wales and Scotland seem fairly stable $-$ Labour is probably gaining a couple of seats, but the story is basically unchanged.
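One simple way to implement this kind of discounting is to weight each poll by how old it is on the day of the forecast, giving zero weight to polls not yet published. The sketch below assumes a per-day geometric discount, weight = (1 - 0.1)^age with age in days; that reading of the "10% rate" is my assumption, and the function names, dates and shares are hypothetical.

```python
from datetime import date


def poll_weights(poll_dates, today, rate=0.10):
    """Weight each poll by (1 - rate) ** (age in days).

    Polls dated after `today` get weight 0, so each day's forecast only uses
    the polls available up to that point. The per-day geometric form is an
    assumption about how a "10% rate" might be applied.
    """
    weights = []
    for d in poll_dates:
        age = (today - d).days
        weights.append((1 - rate) ** age if age >= 0 else 0.0)
    return weights


def weighted_share(shares, weights):
    """Discount-weighted average of a party's share across polls."""
    return sum(s * w for s, w in zip(shares, weights)) / sum(weights)


# Hypothetical polls for one party (dates and shares are made up).
poll_dates = [date(2017, 5, 1), date(2017, 5, 5), date(2017, 5, 10), date(2017, 5, 12)]
labour = [0.28, 0.29, 0.31, 0.32]

for day in (date(2017, 5, 5), date(2017, 5, 10), date(2017, 5, 12)):
    w = poll_weights(poll_dates, day)
    print(day, round(weighted_share(labour, w), 4))
```

Setting `rate=0` would weight all available polls equally; larger rates make the forecast react faster to the most recent polls.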

The other interesting thing (which I had done here and here too) is to analyse the predicted geographical distribution of the votes/seats. Now, however, I'm taking full advantage of the probabilistic nature of the model: rather than just plotting the "most likely outcome" on the map (assigning a colour to each constituency, depending on who's predicted to win it), in the graph below I've also computed the probability that the party most likely to win a given seat actually does so (based on the simulations from the posterior distributions of the vote shares, as explained here) $-$ I've shaded the colours so that lighter constituencies are more uncertain (i.e. the win may be more marginal).
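The shading can be obtained by blending each party's colour towards white in proportion to the probability that the most likely winner actually wins. A minimal Python sketch: the linear blend and the hypothetical `shade` helper are assumptions about how such shading might be done, the colours are illustrative, and the probabilities are the Hornsey and Wood Green (54%) and Tooting (58%) figures from this post.

```python
def shade(hex_colour, prob):
    """Blend a party colour towards white according to the win probability.

    prob = 1 returns the full colour; lower probabilities give lighter shades,
    so more uncertain constituencies stand out as paler on the map.
    """
    prob = min(max(prob, 0.0), 1.0)
    rgb = [int(hex_colour[i:i + 2], 16) for i in (1, 3, 5)]

    def towards_white(channel):
        return round(255 - (255 - channel) * prob)

    return "#" + "".join(f"{towards_white(c):02x}" for c in rgb)


# Illustrative colours (not the ones used for the map).
LABOUR_RED, TORY_BLUE = "#cc0000", "#0033cc"
print("Hornsey and Wood Green:", shade(LABOUR_RED, 0.54))
print("Tooting:", shade(TORY_BLUE, 0.58))
print("A safe seat:", shade(TORY_BLUE, 0.99))
```

A linear blend keeps the mapping easy to read; since the most likely winner always has a probability of at least 1/(number of parties), one could instead rescale so that near-50:50 seats come out almost white.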

There aren't very many marginal seats (according to the model) and most of the time, the chance of a party winning a constituency exceeds 0.6 (which is fairly high, as it would mean a swing of over 10% from the prediction to overturn this).

This is also the split across different regions $-$ again, not many open battlefields, I think. In London, Hornsey and Wood Green is predicted to go Labour but with a probability of only 54%, while Tooting is predicted to go Tory (with a chance of 58%).