Danish Delight - Using our panel to successfully avoid a false recall miss
Our experience of polling and forecasting the Danish General Election indicates that false recall is an active problem in forecasting
Ahead of the Danish General Election at the start of this month, YouGov put out its final forecast of the vote shares which would be obtained by each party on the ballot:
Socialdemokraterne (A) - 27.6%
Venstre (V) - 12.9%
Moderaterne (M) - 8.0%
Socialistisk Folkeparti (F) - 8.5%
Danmarksdemokraterne (Æ) - 10.1%
Liberal Alliance (I) - 6.9%
After the votes were counted and verified by Statistics Denmark, the eventual results (with our forecast error, forecast minus result, in brackets) were:
Socialdemokraterne (A) - 27.5% (+0.1)
Venstre (V) - 13.3% (-0.4)
Moderaterne (M) - 9.3% (-1.3)
Socialistisk Folkeparti (F) - 8.3% (+0.2)
Danmarksdemokraterne (Æ) - 8.1% (+2.0)
Liberal Alliance (I) - 7.9% (-1.0)
In all, a strong result for us. Indeed, it was a pretty good night for polling companies across the board, with a very low average error across all five firms polling the race (YouGov, Gallup, Voxmeter, Epinion, and Megafon). The ‘mean squared error’ of the industry’s forecast was just over 1, implying that, on average, the industry estimated each party’s share to within roughly a percentage point - comfortably inside the standard ‘margin of error’ of between two and three percentage points.
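For anyone who wants the metric spelled out, below is a minimal sketch of that calculation in Python, using our own figures from above as the worked example (the industry-wide number is the same calculation pooled across every firm and party):

```python
# Mean squared error of a vote share forecast, illustrated with the YouGov
# figures listed above. Errors are forecast minus result, in percentage points.
forecast = {"A": 27.6, "V": 12.9, "M": 8.0, "F": 8.5, "Æ": 10.1, "I": 6.9}
result = {"A": 27.5, "V": 13.3, "M": 9.3, "F": 8.3, "Æ": 8.1, "I": 7.9}

errors = [forecast[p] - result[p] for p in forecast]
mse = sum(e ** 2 for e in errors) / len(errors)

print([round(e, 1) for e in errors])
print(f"MSE: {mse:.2f}")          # ~1.15 for these six parties
print(f"RMSE: {mse ** 0.5:.2f}")  # ~1.07 points, back on the vote share scale
```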
One of the interesting outcomes of the night, however, was that while we at YouGov had an excellent handle on the winning party’s share and the winning margin, the rest of the industry significantly underestimated the Social Democrats (A.) and (to a lesser extent) overstated Venstre (V.).
[Chart: Pollster and exit polling error on the top two parties, according to Erik Gahner Larsen]
This was by no means a major polling miss, but collectively our four rival firms - and indeed the Exit Polls - significantly understated the eventual winning margin.
In fact, this was a theme throughout the election polling: YouGov polls placed A. significantly higher and V. significantly lower than the industry average, with a much clearer gap between the two as a result. We came in for flak over this at various points in the campaign, with commentators doubting our prospective accuracy.
This was perhaps not without justification, as YouGov’s 2019 Danish election polling had done exactly that - significantly over-estimated the A. lead over V.
However, this time we felt things were different.
Namely, we hypothesised and then tested the theory that the distance between us and the rest of the industry on the estimated A. lead over V. was down to a polling (and general survey research) problem named ‘false recall’. If you are very familiar with this concept, you will probably want to skip the next couple of sections.
False recall - a brief explainer
False recall is the effect whereby individuals answering surveys may misremember, forget about, or be intentionally untruthful about their previous behaviour or information about themselves.
For instance, say I stopped you in the street and asked you to recount to me the last meal you had. Not too difficult a proposition for the vast majority of people - depending on the time of day and on your schedule, it may even have only been an hour or so ago.
However, what if I asked you what you had for lunch three days ago? What about dinner eight days ago? What did you eat on the 15th September 2020? For many folk, answering these questions (particularly the last one) would be a struggle. We cannot (and should not) expect people to be able to tell us what they had for dinner on a specific date, more than two years ago. Most people, if they were to answer that question at all, would innocuously and accidentally give a false response in their attempts to recall the information. This is a basic, researcher-induced, example of false recall.
Now imagine that we throw an effect called ‘social desirability’ into the mix. Briefly, social desirability pressures occur when people - both in and out of social research situations - adapt and adjust their responses, words, or behaviour to fit whatever norm or expectation they believe exists in their social environment (either immediate or broader contextual).
Famous examples of social desirability affecting survey measurements include people with extreme political opinions perhaps tempering their responses when answering face-to-face surveys in order to avoid anticipated judgement from the interviewer, or ‘herding to the winner’ and reporting having voted for the winning party in an election when in fact they voted for someone else.
Back to the thought experiments: let’s say I am asking a group of four people a simple question: ‘What did you have for dinner last night?’
Imagine that three out of the four are vegan, and the fourth is an occasional meat eater. The first three (the vegans) recount various meat- and dairy-free dishes in response to my question.
Now, let’s say that the fourth person had a 12 ounce rib-eye the night before, at a fancy steak restaurant in town. They may well feel pretty embarrassed to answer this question truthfully in front of their three vegan friends. So, instead, they tell me the meal they had the day before yesterday - a lentil curry.
This would again be false recall, but this time purposeful and motivated - on account of social desirability. The last respondent felt some social pressure to answer in a particular way, even though it wasn’t entirely truthful.
Lastly, we might imagine that I ask another person what they ate on Christmas Day last year. For the sake of the thought experiment, let’s assume they are from a British family, and celebrated Christmas with a feast of typical British Christmas fare - turkey, stuffing, roast potatoes, you know the drill. They almost salivate as they recount the meal in exquisite detail.
But then let’s imagine I ask them what they ate for dinner on New Year’s Day this year. Just a few days later (and in fact closer to now). Now, some folk might remember the specific leftover party food they had on that bank holiday. Others might recall ordering a hangover-curing takeaway. But for most people, recalling that information would be something of a struggle - despite the task on the surface being quite similar to the question before (tell me what you ate on this significant and well-known day).
This gap - being able to remember one piece of information but not the other, despite them being on similar topics and/or from similar time frames - is what we would call ‘false recall by second order effects’. Namely, New Year’s Day is a second order (less important) event when compared to Christmas Day (first order).
In fact, in the case of someone who is asked the New Year’s Day question without first having been asked about their Christmas meal, they might actually mistakenly tell me what they ate for Christmas Day instead, as that was the prevailing memory of meals eaten in that time frame.
Unlikely, perhaps, but it could happen. This is, once again, false recall. This time, it is caused by a stronger memory (or a memory of a more consequential or important event) overriding another. This is another example of first and second order effects causing false recall.
False recall in election surveys
Back into the election polling world, a classic example of false recall would be someone telling a pollster that they voted in the last national election, when they in fact did not. Social desirability - wanting to appear like they have done their civic duty and exercised their right to vote - pressures them into falsely recounting their past behaviour.
This is actually a common problem in survey research, which is why you should always be extremely cautious when interpreting survey estimates or survey-based models of voter turnout. Surveys - and particularly polls - will always, always over-estimate turnout.
Another false recall problem can arise when respondents are asked to recount their party vote at previous elections. So, even if they correctly recall that they voted, they may not correctly remember which party they voted for.
This could be for two reasons. First, perhaps they genuinely forget or misremember who they voted for. This is especially common when we ask about elections far back in the past (the “what did you eat for dinner two years ago” problem), or about what we call ‘second order’ elections (where less power is at stake and less attention is paid), such as European Parliamentary or Local Authority elections. Memory of these can be hazy, or completely overwritten by a vote cast in a subsequent (or concurrent), more important election (the “tell me what you had for dinner on New Year’s Day” problem).
Second, there might be social desirability effects which cause people to knowingly falsely recall their past vote. For example, and as noted above, we might observe people ‘herding to the winner’, or feeling pressured into suggesting that they voted for a moderate, centre-right party, when in fact they voted for a far-right party.
False recall problems are particularly acute for polling companies that do not make use of a panel framework (or those that do use panels, but in countries where storing past vote behaviour is not allowed, such as Japan). Examples include random digit dialling telephone pollsters, river-sampling polls, and mail surveys. This is because pollsters using such methods have to ask every respondent afresh for their past voting behaviour in every survey.
Allow us to return to the meals problem as an example. Instead of asking you today what you had for dinner on New Year’s Day this year, what if I asked you on 2nd January? You’d be far more likely to remember - it was only the previous day. Suppose then I stored that information about you on a record, along with hundreds of thousands of other data points.
That way, if I need to be sure I have enough people who were still eating turkey on New Year’s Day in my sample, I can quickly and efficiently meet that requirement by going back to that stored information on all my respondents, and then recruit some more turkey-eaters from my panel if I need to.
This is what we call ‘active quota sampling’, and it’s a method we make use of at YouGov. Our panellists give us a whole wealth of information when they sign up to the panel, and then again as soon as significant events (such as elections or referenda) are over. We collect all that data, store it, and then never have to touch it again (the exception being occasional check-ins on things like location, marital and employment status, and so on).
We can also be much, much more sure that the information we hand over to clients or use for sampling and/or prediction purposes is reliable, because we took and stored that information as soon (and as close to the relevant event) as we possibly could. I can be much more certain about what you ate on New Year’s Day if I asked you and wrote it down on January 2nd than I can if I asked you and wrote it down now.
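To make the idea concrete, here is a minimal, hypothetical sketch of active quota sampling from stored panel data - the records, field names, and quota targets are all invented for illustration rather than being our actual panel schema or targets:

```python
import random

# A toy panel: each record stores what the panellist told us shortly after
# the 2019 election, so we never need to ask them to recall it again.
panel = [
    {"id": 1, "vote_2019": "A", "region": "Hovedstaden"},
    {"id": 2, "vote_2019": "V", "region": "Midtjylland"},
    {"id": 3, "vote_2019": "V", "region": "Syddanmark"},
    {"id": 4, "vote_2019": "did_not_vote", "region": "Sjælland"},
    # ...in reality, hundreds of thousands of records
]

# How many more respondents we need from each stored 2019 vote group.
quota_remaining = {"A": 1, "V": 2, "did_not_vote": 1}

def draw_sample(panel, quota_remaining):
    """Invite panellists, using their stored past vote to fill each quota."""
    sample = []
    for person in random.sample(panel, len(panel)):  # shuffled copy of the panel
        group = person["vote_2019"]
        if quota_remaining.get(group, 0) > 0:
            sample.append(person)
            quota_remaining[group] -= 1
    return sample

print(draw_sample(panel, quota_remaining))
```

The point is simply that the stored vote_2019 field does the work: nobody in that sample is being asked to dredge up a three-year-old memory at interview time.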
None of this eliminates false recall entirely; temporal effects will still arise if a panellist joins and is asked for their past vote some years (let’s say, three) after the election itself. Nor does it escape social desirability pressures for false recall. But it certainly helps, on both fronts.
False recall in the Danish Folketingsvalg
We turn now back to the case of the Danish General Election (in Danish: Folketingsvalg), and our puzzle as to why we had consistently shown significantly bigger A. leads over V. than the rest of the field.
There is no tradition or set of rules governing the release of methodology or data tables in Denmark, so we could not be 100% sure, but from the information we were able to gather on each of our rivals, we were fairly certain that each of them was relying on fresh vote recall - i.e., they did not have a panel structure as YouGov does.
We were also certain that, like us, our fellow pollsters would be using past vote recall to weight their samples, making sure they had enough voters from each respective party (and those who did not vote at all) in their data.
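For readers unfamiliar with past vote weighting, below is a stripped-down sketch of the simplest version (straight cell weighting). The target shares and sample are made up, and real schemes combine past vote with demographics - typically via raking rather than a single variable - but the mechanics are the same:

```python
from collections import Counter

# Made-up target shares of the 2019 electorate (including non-voters).
target = {"A": 0.26, "V": 0.23, "other": 0.36, "did_not_vote": 0.15}

# Recalled (or stored) 2019 behaviour of each respondent in a raw sample.
sample = ["A", "A", "A", "A", "V", "V", "other", "other", "other", "did_not_vote"]

counts = Counter(sample)
n = len(sample)

# Each group's weight is its target share divided by its share of the raw
# sample, so the weighted sample matches the 2019 electorate.
weights = {vote: target[vote] / (count / n) for vote, count in counts.items()}
print(weights)  # A voters down-weighted, non-voters up-weighted in this toy sample
```

The key point for what follows is that the recalled 2019 vote is the weighting variable - so if respondents systematically misreport it, the weights (and therefore the headline figures) move with them.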
Perhaps, then, we thought, false recall might explain this gap? What if there was some structural problem with the way Danes were self-reporting who they voted for in 2019 (the previous Danish General Election)?
This, we thought, could be particularly likely in the case of Venstre voters, with their party declining strongly, from 23% in the 2019 General Election to an average of around 14% in pre-2022 election polling. The rationale was that significant numbers of 2019 Venstre voters might now shy away from recalling having voted for them, given the party’s growing unpopularity.
To test this idea, we started asking past vote afresh in our surveys and trying out our weighting scheme using that information instead of our stored records. The effects were quite striking - again and again, they brought our figures closer to the median of the rest of the field and away from the story we were telling. Hunch confirmed, we thought.
On the day of our final poll, we ran the test again - weighting back to fresh, rather than stored, 2019 vote recall. It produced the following figures:
Socialdemokraterne (A) - 25.5%
Venstre (V) - 13.1%
Moderaterne (M) - 8.0%
Socialistisk Folkeparti (F) - 9.3%
Danmarksdemokraterne (Æ) - 9.6%
Liberal Alliance (I) - 7.1%
We were struck by how close the figures for the Social Democrats in particular were to the rest of the field: the change almost entirely ate up the difference between our position and the industry median.
According to our figures, around one in five (21%) of those who originally told us they voted for one party (or none at all) were now telling us they voted for a different party (or none at all). And the problem, when rolled into the weighting model, appeared to affect the Social Democrats’ vote share in particular.
And, in another hunch confirmation, we found 2019 V. voters to be displaying significantly higher false recall (around 25%) than 2019 A. voters (around 10%).
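The check itself is conceptually simple: line up each panellist’s stored answer against their fresh one and see how often they disagree, overall and by stored party. A hypothetical sketch (with invented records, not our actual data) might look like this:

```python
from collections import defaultdict

# Invented records: each panellist's stored 2019 vote (collected just after
# that election) alongside the answer they give when asked afresh in 2022.
respondents = [
    {"stored_2019": "V", "fresh_2019": "V"},
    {"stored_2019": "V", "fresh_2019": "A"},             # recalls differently
    {"stored_2019": "A", "fresh_2019": "A"},
    {"stored_2019": "A", "fresh_2019": "A"},
    {"stored_2019": "V", "fresh_2019": "did_not_vote"},  # recalls differently
    # ...
]

total = defaultdict(int)
switched = defaultdict(int)
for r in respondents:
    total[r["stored_2019"]] += 1
    switched[r["stored_2019"]] += r["stored_2019"] != r["fresh_2019"]

print(f"overall: {sum(switched.values()) / sum(total.values()):.0%} recall differently")
for party in total:
    print(f"2019 {party} voters: {switched[party] / total[party]:.0%} recall differently")
```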
Armed with this reassurance, we stayed out on our comparative limb and went with the story we had - a 14.7 point margin of victory for A. over V.
Of course, as it turned out, our assumptions and faith in our figures were well placed. Though the eventual lead would come in at 14.2 points, our estimate of that lead was by far the most accurate of any pollster - and even beat the Exit Polls.
Which is not to say that the rest of the industry weren’t accurate. Indeed, some of our competitors were more accurate than us further down the party list. Overall, everyone performed very well in polling the election. I’d recommend this excellent write-up by Erik Gahner Larsen on the overall performance of the Danish polling industry in the election.
Overall, our experience of the Danish General Election of 2022 was an extremely positive one, and a testament to the strengths of panel-based survey research - particularly when it comes to avoiding false recall.
Before I sign off, I must give a strong high five to my colleagues Beth Mann, Adam McDonnell, Julie Schou, and Tove Keldsen for all their work on our Danish election polling programme, too. Go find them and give them a follow on Twitter!
Which is, by the way, another thing worth mentioning - the precision with which vote intention is estimated in Denmark and indeed across many European countries. It is established practice to report polls to one decimal place here, and also in Italy.
In Germany and France, reporting to half a percentage point is the standard.
In the UK, our general position is that if we accept that the ‘margin of error’ behind any survey point estimate is around 3% (at least), then it makes very little sense to be that precise (particularly to one decimal place) about said point estimate.