by Taylor Saunders and Annette Przygoda
Introduction
On May 13, 2013, one day before the provincial elections in British Columbia (B.C.), all major polling companies projected a landslide win for the NDP. While media reports showed a slight tightening of the race in the last two weeks of the campaign, as of 24 hours before the elections, pollsters predicted the NDP's share of the popular vote to be between 41% and 46%, compared to 31% to 37% for the B.C. Liberals. On election day, the Liberals won convincingly, taking 44.1% of the popular vote compared to 39.7% for the NDP.
Election
observers, the media and voters were stunned. What happened? On average,
polling companies were off by 18 percentage points in their predictions.
Forum Research came closest, with an 8-point difference, but still predicted a
win for the NDP that never materialized. Big-name polling companies like Ipsos,
Angus Reid and EKOS were off by 15 to 20 percentage points.
In
the immediate aftermath of the election, this stunning shift made for a lot of
good media stories that were mostly focussed on Christy Clark as the “Comeback
Kid”. Major news outlets reported on Clark’s particularly strong final
days of the campaign or discussed the fact that her negative campaign
appeared to have worked. Other reports focussed on the failure of the NDP to capitalize on their lead in the polls. Discussions of the accuracy of polls and
predictions were rare. Where they did happen, almost all reports attributed the
dismal performance of the predictions to a massive election-day
opinion shift among voters.
These
reports were bolstered further by the early reactions of the pollsters
themselves. Those reactions included statements like Angus Reid’s assertion
that the days “of
co-operative respondents who want to tell pollsters what they think and of good
citizens who show up to vote” are over. Essentially, the polling companies
wanted us to believe that predicting people’s actions is impossible, that
people change their opinions and political affiliations like they change their
underwear.
If
that was true, millions of statisticians and researchers worldwide would be out
of a job. We suggest that rather than collectively waving the white flag and
giving up on our jobs of understanding and predicting behaviour, there are ways
to explain what happened and why the election predictions failed so
spectacularly: flawed polling methodology, in particular the lack of a “likely
voter” model and the failure to adequately account for undecided voters in the
predictions.
Standing On The Shoulders Of Giants
Some
people believe that election polling can be done without having much in-depth
subject matter expertise about voting behaviour or politics; that it is purely
about crunching some numbers.
When looking at successful polling and predictions, we quickly see that that is
not true. Polling relies on identifying not only what party or candidate
someone would vote for in a given election but also whether someone will
actually turn out on election day to cast a ballot. For example, major polling companies
in the USA all employ methods to identify “likely voters” in their models to
predict election outcomes. Some pollsters make it as simple as using projected
turnout as a variable in their models, which implies that voters across all
parties have the same likelihood of showing up on election day. Others go as
far as including an entire battery of questions to identify which voters are
more likely to show up than others. A closer look at those questions reveals
that almost all US pollsters collect
data on:
- Eligibility to vote
- Registration to vote
- Past voting behaviour
- Level of knowledge about the electoral process
- Level of interest in the current political campaign
- Actual intention to vote in the current election
- Party affiliation
- Demographic information
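To make this concrete, here is a minimal sketch in Python of how answers to a battery like the one above could be combined into a likely-voter weight. The field names and the scoring scheme are our own illustrative assumptions, not any pollster's published model:

```python
# Minimal likely-voter sketch. Field names and scoring rules are
# illustrative assumptions, not any pollster's published model.

def likely_voter_weight(respondent):
    """Return a 0-1 turnout weight from a battery of screening questions."""
    if not respondent["eligible"] or not respondent["registered"]:
        return 0.0  # cannot cast a ballot, regardless of stated intent

    score = 0
    score += 2 if respondent["voted_last_provincial"] else 0  # past behaviour
    score += 1 if respondent["knows_polling_place"] else 0    # knowledge of the process
    score += respondent["campaign_interest"]                  # interest, 0-3 scale
    score += respondent["intent_to_vote"]                     # direct intent question, 0-4 scale
    return score / 10.0                                       # 10 is the maximum possible score


# A projection would then weight each respondent's stated party choice by this
# value instead of counting every respondent equally.
sample = [
    {"eligible": True, "registered": True, "voted_last_provincial": True,
     "knows_polling_place": True, "campaign_interest": 3, "intent_to_vote": 4,
     "party": "Liberal"},
    {"eligible": True, "registered": True, "voted_last_provincial": False,
     "knows_polling_place": False, "campaign_interest": 1, "intent_to_vote": 2,
     "party": "NDP"},
]
for r in sample:
    print(r["party"], likely_voter_weight(r))  # Liberal 1.0, NDP 0.3
```

The point is not the specific weights but the structure: eligibility and registration act as hard filters, while past behaviour, knowledge, interest and intent scale how much a respondent's stated party choice contributes to the projection.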
The
above is a list we could easily find in any first-year Political Science
textbook on voting behaviour and elections. This is where subject matter
expertise comes in. Electoral polling doesn’t happen in an isolation chamber
without context. In fact, it builds on five decades of existing
research dedicated to understanding voting behaviour. In this field, early
theories and research focussed heavily on aspects like longstanding political
loyalties that are often based on socio-economic factors (which is why it is
still important to collect demographic information and ask about existing party
affiliation). Over time, research and theories shifted more to explanations
that view voting as a rational choice of individuals who are informed and weigh
the pros and cons of each party and candidate for their own interests. These
theories are reflected in the questions about someone’s level of knowledge of
the electoral process and their interest in the current campaign etc.
Contemporary theories developed by Political Scientists also take into account
institutional factors like systemic barriers to voting, reflecting the
importance of checking whether poll respondents are eligible to vote and
registered to vote, as well as of asking about past voting behaviour.
In
addition to determining the likelihood of voting, the list above and the
underlying theories can also be used to better understand undecided voters and
incorporate that knowledge into our predictions. For example, someone might be
undecided at the time of a poll, maybe early in the campaign, but have a history
of consistently voting for the same party. Another respondent might identify as
undecided, score low on general interest in the current campaign,
and state that they do not know where the polling place in their district is.
If we know these things, we can more accurately estimate not only the
likelihood that someone will vote at all, but also how undecided voters should
factor into the model driving our predictions of the eventual election result.
We
see five decades of Political Science reflected in questions used to predict
voting behaviour. We also see carefully worded questions that likely reflect
years of experience in survey research. For example, Gallup’s
question to gauge voting intent sounds like this: “I'd like you to rate
your chances of voting in November's election for president on a scale of 1 to
10. If 1 represents someone who definitely will not vote and 10 represents
someone who definitely will vote, where on this scale of 1 to 10 would you
place yourself?” AP-Ipsos used
a similar question: “On November 2nd,
the election for President will be held. Using a 1-to-10 scale, where 10 means
you are completely certain you will vote and 1 means you are completely certain
you will NOT vote, how likely are you to vote in the upcoming presidential
election? You can use any number between 1 and 10, to indicate how strongly you
feel about your likelihood to vote.” RAND’s question was a bit shorter, but no
less concrete: “What is the percent chance that you will vote in the
Presidential election?” Those questions might seem long and clunky, but they
embody the key rules for crafting survey questions. Good questions need to
be clear, concise, and measure only one thing at a time. If the idea is to
measure how likely it is for someone to vote on election day, the question
should be about the likelihood to vote on election day. Granted, there are some
areas where asking a direct question is not the preferred approach. For
example, survey respondents are unlikely to provide honest answers when asked
about criminal activities or behaviour that is generally deemed socially unacceptable
or undesirable. This is why some pollsters phrase their question about past
voting behaviour like this: “Sometimes things come up and people are not able
to vote. In the 2000 election for President, did you happen to vote?” The
wording is still clear and measures only one thing, but provides respondents
with an option to save face if they did not vote in the past. Sometimes things
come up.
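As a rough illustration of how answers to these 1-to-10 questions can feed a projection, here are two common ways of using them: a hard cutoff that only counts respondents above a threshold, and a probabilistic weight that counts everyone in proportion to their stated likelihood. The threshold of 7 and the linear mapping below are illustrative assumptions, not any pollster's published method:

```python
# Two illustrative ways to use a 1-to-10 "how likely are you to vote" answer.
# The cutoff value and the linear mapping are assumptions for demonstration,
# not any pollster's published method.

def cutoff_weight(answer, threshold=7):
    """Cutoff screen: only respondents at or above the threshold count as likely voters."""
    return 1.0 if answer >= threshold else 0.0

def probabilistic_weight(answer):
    """Probabilistic weighting: every respondent counts, scaled by stated likelihood."""
    return (answer - 1) / 9.0  # map the 1..10 scale onto 0..1

answers = [10, 8, 5, 2]
print([cutoff_weight(a) for a in answers])                    # [1.0, 1.0, 0.0, 0.0]
print([round(probabilistic_weight(a), 2) for a in answers])   # [1.0, 0.78, 0.44, 0.11]
```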
What
is clear when looking at political polling in the US is that polling companies
invest large amounts of time and expertise in crafting their questions and
dealing with the data they derive from them. That doesn’t mean that they are
infallible. In fact, many pollsters in the US were chided for not accurately
predicting the clear
victory for Barack Obama in the 2012 election. However, those pollsters
were a lot more willing to discuss their shortcomings and methodological
challenges in the aftermath of that election. In British Columbia, we saw an
entirely different approach to both the polling itself and to the discussions
of their failure to accurately predict the election results.
Likely and Undecided Voters in British Columbia
We
know that asking survey respondents whether they will actually make the effort
to show up on election day is a crucial piece in the puzzle of predicting
election results. We looked at the three major polling companies that provided
polls and predictions leading up to the 2013 BC election and examined their
approaches to determining who is a “likely voter”.[1]
Interestingly, none of them asked respondents a direct question about their
intention to vote on election day.
In
the four weeks leading up to the 2013 British Columbia provincial election, Ipsos
Reid conducted four separate polls of BC voters. Each poll consisted of an online panel survey in which a sample
of 800 or more adults was asked to indicate which political party they intend
to vote for. Specifically, Ipsos asked respondents the following: “Thinking of how you feel right now, if a
provincial election were held tomorrow here in BC, which of the following
parties’ candidates would you be most likely to support, or lean towards?” Similar
questions were asked by EKOS
and Angus Reid
in their polls leading up to the election. At first glance, this might seem appropriate as
a measure of voter intent. However, when comparing this approach to the
approach taken by US polling companies and to the approaches supported by
existing research, the following issues stand out:
- The questions used by all three companies get at party or candidate affiliation, but do not directly determine whether a survey respondent is a) eligible to vote and b) determined to show up on election day.
- Only EKOS appears to also ask questions about respondents’ past voting behaviour, but interestingly, the question used focuses on voting behaviour in past federal elections. This is problematic because we know that, in general, voter turnout tends to be higher in federal elections than in provincial elections. Someone may well be inclined to vote federally but have no history of participating in, or intention to participate in, provincial elections.
- All three polling companies used questions that included the wording "if the election were held tomorrow....", or a variation thereof. Research on voting behaviour has shown that this wording doesn't accurately capture voter intent. In fact, in surveys where respondents are asked two questions, one about a hypothetical election day "tomorrow" and one about the actual election day, the results often differ to some extent.
It
is unclear, based on the limited information that is publicly available, what
other sources of information the three polling companies use. They may or may
not include information on demographics, knowledge about the electoral process
or other factors. What is clear from the available information is that all
three companies failed to ask a question similar to those used by all major US
polling companies that focuses solely on someone’s likelihood to turn out.
Lacking this information, polling companies in British Columbia would have had
a difficult time estimating voter turnout and determining the extent to which
stated NDP or Liberal supporters were actually planning to or able to cast a
ballot on election day.
What
about undecided voters? In a way, this is a related issue. As we have mentioned
above, US pollsters use the information they collect about likely voters and
voting intention to predict voting behaviour on election day. Granted, for each
election, there will be a group of truly undecided voters or swing voters with
no clear party affiliation. However, for a certain percentage of the polled
population, it will be possible to use past voting behaviour to predict with
some level of certainty whether and how an undecided voter will vote on election day.
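A minimal sketch of this idea, using hypothetical field names and a deliberately simple allocation rule (fold self-described undecideds back to their past vote or stated lean, and leave out only the truly undecided), might look like this:

```python
# Sketch of allocating "undecided" respondents using their own history,
# rather than treating them all as pure swing voters. Field names and the
# allocation rule are illustrative assumptions.

def projected_shares(respondents):
    """Return projected party shares after folding leaners back into their parties."""
    counts = {}
    allocated = 0
    for r in respondents:
        if r["current_choice"] != "undecided":
            party = r["current_choice"]
        elif r.get("past_vote"):   # undecided now, but with a provincial voting history
            party = r["past_vote"]
        elif r.get("lean"):        # or at least a stated lean
            party = r["lean"]
        else:
            continue               # truly undecided: left out of the projection
        counts[party] = counts.get(party, 0) + 1
        allocated += 1
    return {party: n / allocated for party, n in counts.items()}

poll = [
    {"current_choice": "NDP"},
    {"current_choice": "Liberal"},
    {"current_choice": "undecided", "past_vote": "Liberal"},
    {"current_choice": "undecided", "lean": "Liberal"},
    {"current_choice": "undecided"},
]
print(projected_shares(poll))  # {'NDP': 0.25, 'Liberal': 0.75}
```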
Looking
at the 2013 British Columbia provincial election, we found that all three
polling companies essentially kept undecided voters in a separate group,
treating all of them as true swing voters that could jump to any of the parties
on election day. Using IPSOS polls as an example, the graph below demonstrates
that as time passed between the three polling cycles used by IPSOS, the share
of undecided voters decreased steadily, while the share of Liberal supporters
grew.
The data is clearly telling a story here that
did not make it into the predictions or the media reports prior to the
elections.
Undecided voters in this scenario were likely not truly undecided or swing
voters, but Liberal-leaning potential voters. Furthermore, the graph shows a
trend over time that not only refutes the stories of a very large NDP lead
(note the margin of error shown in the graph), but also provides evidence that the
final election result was not due to a last-minute opinion swing (we
will discuss the margin of error and the missed story in the data in much more
detail in our next post). For now, it is important to note here that BC
pollsters were not able to predict the behaviour of undecided voters, and we suggest
that this is because they
did not collect the right information on voting intent, voting history and
other factors that have been shown in the research to greatly influence voting
behaviour.
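As a rough sanity check on that margin of error: for a simple random sample of about 800 respondents, the conventional 95% margin of error on a reported share of roughly 45% works out to about ±3.4 percentage points; online panel surveys like those described above are not simple random samples, so that is a best-case figure. The numbers below are assumed only to match the sample sizes mentioned earlier:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# For a poll of ~800 respondents reporting a 45% share (assumed figures):
print(round(margin_of_error(0.45, 800) * 100, 1))  # ~3.4 percentage points
```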
Can We Trust The Polls?
In
the immediate aftermath of the election, reactions to the inability of the
polls to predict the results in British Columbia focused on one thing: a
massive overnight opinion swing. The day after the election, in an interview
with the Globe and Mail, Angus Reid said “First of all, I don't think the polls
were wrong”. Instead, he noted that pollsters simply missed the late Liberal
surge, a.k.a. the massive overnight opinion swing. Later in the interview, he
stated that "The
amount of effort really required to do this properly probably exceeds the
budget these days of media organizations that are used to paying nothing for polls."
Reid’s
statements highlight a key problem: accurate polling requires effort: the
effort to know the research, the effort to ask the right questions and the
effort to listen to the story in the data using appropriate methodologies.
When
looking at the British Columbia polls leading up to the election we can see that people didn’t change
their minds overnight. The trend of Liberal-leaning “undecided” voters
supporting Christy Clark’s government was there starting in early April, if not
sooner. What is not as visible in the numbers themselves, but glaringly clear
in the various methodologies of the polling companies is that the pollsters
failed to actually ask their respondents about their likelihood of showing up
on election day. The combination of methodology and Liberal-leaning “undecided”
voters led to a scenario where the predictions not only failed, but ended up
misleading the public.
Electoral polling has consequences. Telling
NDP supporters over months that their party will win the election decisively
might have convinced NDP leaning voters to not turn out on election day. We all
know that it is highly unlikely that our own single ballot will decide an election
(a fact that Political Scientists call the “voting paradox”), so maybe some
people didn’t see the need to invest time and effort to vote if the result is
already clear ahead of time. Additionally, strategic voting may have been
influenced by the polls. Assuming the NDP win was a foregone conclusion might
have led some NDP supporters to cast their ballot for the Green party in their
district. When asked about his polls, Mr. Reid said “I thought it was really a marvellous example of maybe polling at its
finest, in the sense that we and Ipsos and others were saying to the province
of British Columbia in the days leading up to the final vote that there was an
NDP train coming down the track, and that obviously got the attention of a lot
of people and may have actually lulled some of the NDP supporters into thinking
it was a fait accompli.” Polling has consequences.
Many predictions fail. In fact, the whole premise of statistical
analysis is that we produce estimates that are never 100% correct. Instead, we
use past and present data to detect trends and to estimate a level of
confidence that we have about these trends holding true in the future. It is
not a bad thing when predictions fail. It should prompt us to examine our work
and adjust approaches where necessary, rather than assert that people and
behaviours are unpredictable. It is our job to learn from failed predictions
and to improve upon our methods. And that means that we need an open and
transparent discussion about our methodologies and assumptions, as well as a
willingness to connect with others who might be doing similar work. Polling in British
Columbia does not exist in a bubble. We have opportunities to compare our
approaches to those of pollsters in other countries. We also have opportunities
to learn and expand upon our own subject matter expertise, instead of assuming
that it is all just a numbers game. It is not.
Accurate
polling requires effort. Indeed!
[1] It
should be stated clearly here that the three companies do not necessarily make
all of their methodology publicly available. However, all three companies
posted information on their survey questions, their samples and overall
methodology. As such, it is possible to examine their general approaches, even
when a detailed discussion of their modelling is missing.