Case Study: Polls

Check out a case study related to voting polls.

Let’s now switch gears to a more realistic sampling scenario than our bowl activity—a poll. In practice, pollsters don’t take 1,000 repeated samples, but rather take only a single sample that’s as large as possible.

On December 4, 2013, National Public Radio in the US reported on a poll of President Obama’s approval rating among young US citizens aged 18–29 in an article, “Poll: Support For Obama Among Young Americans Eroding.” A quote from the article stated:

“After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama. According to a new Harvard University Institute of Politics poll, just 41 percent of millennials—adults aged 18–29—approve of Obama’s job performance, his lowest-ever standing among the group and an 11-point drop from April.”

Let’s tie elements of the real-life poll in this news article to our tactile and virtual bowl activity using the terminology, notation, and definitions we learned previously. We’ll see that our sampling activity with the bowl is an idealized version of what pollsters are trying to do in real life.

First, who is the (study) population of N individuals or observations of interest?

  • Bowl: N = 2,400 identically sized red and white balls

  • Obama poll: N = ? young US citizens aged 18–29

Second, what’s the population parameter?

  • Bowl: The population proportion p of all the balls in the bowl that are red

  • Obama poll: The population proportion p of all young US citizens who approve of Obama’s job performance

Third, what would a census look like?

  • Bowl: Manually going over all N = 2,400 balls and exactly computing the population proportion p of the balls that are red

  • Obama poll: Locating all N young US citizens and asking them all if they approve of Obama’s job performance; in this case, we don’t even know what the population size N is!

Fourth, how do we perform sampling to obtain a sample of size n?

  • Bowl: Using a shovel with n slots

  • Obama poll: One method is to get a list of phone numbers of all young US citizens and pick out n phone numbers; in this poll’s case, the sample size was n = 2,089 young US citizens

Fifth, what’s our point estimate (aka sample statistic) of the unknown population parameter?

  • Bowl: The sample proportion p̂ of the balls in the shovel that are red

  • Obama poll: The sample proportion p̂ of young US citizens in the sample who approve of Obama’s job performance; in this poll’s case, p̂ = 0.41 = 41%, the quoted percentage in the second paragraph of the article

Sixth, is the sampling procedure representative?

  • Bowl: Are the contents of the shovel representative of the contents of the bowl? Because we mixed the bowl before sampling, we can feel confident that they are.

  • Obama poll: Is the sample of n = 2,089 young US citizens representative of all young US citizens aged 18–29? This depends on whether the sampling is random.

Seventh, are the samples generalizable to the greater population?

  • Bowl: Is the sample proportion p̂ of the shovel’s balls that are red a good guess of the population proportion p of the bowl’s balls that are red? Given that the sample is representative, the answer is yes.

  • Obama poll: Is the sample proportion p̂ = 0.41 of young US citizens in the sample who approved of Obama’s job performance a good guess of the population proportion p of all young US citizens who approved of it at this time in 2013? In other words, can we confidently say that roughly 41% of all young US citizens approved of Obama at the time of the poll? Again, this depends on whether the sampling is random.

Eighth, is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample?

  • Bowl: Each ball is equally sized and we mix the bowl before using the shovel, so each ball has an equal chance of being included in a sample. Therefore, the sampling is unbiased.

  • Obama poll: Did all young US citizens have an equal chance at being represented in this poll? Again, this depends on whether the sampling is random.

Ninth and lastly, is the sampling done at random?

  • Bowl: As long as we mix the bowl sufficiently before sampling, our samples will be random.

  • Obama poll: Was the sampling conducted at random? We can’t answer this question without knowing about the sampling methodology used by the Kennedy School’s Institute of Politics at Harvard University.

In other words, the poll by the Kennedy School’s Institute of Politics at Harvard University can be thought of as an instance of using the shovel to sample balls from the bowl. Furthermore, if another polling company conducted a similar poll of young US citizens at roughly the same time, it would likely get an estimate different from 41%. This is due to sampling variation.
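To make sampling variation concrete, here is a minimal Python sketch that simulates several polls of n = 2,089 respondents each. It rests on assumptions that are not in the article: we pretend the true population proportion is p = 0.41 (in reality it’s unknown), we treat the population as large enough that each respondent is an independent draw, and the function names `simulate_poll` and `simulate_polls` are purely illustrative.

```python
import random

def simulate_poll(p_true, n, rng):
    """Return the sample proportion p_hat from one simulated poll of n respondents."""
    # Each respondent approves with probability p_true (an assumed value).
    approvals = sum(1 for _ in range(n) if rng.random() < p_true)
    return approvals / n

def simulate_polls(p_true=0.41, n=2089, n_polls=5, seed=0):
    """Simulate several independent polls and return their sample proportions."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [simulate_poll(p_true, n, rng) for _ in range(n_polls)]

if __name__ == "__main__":
    for i, p_hat in enumerate(simulate_polls(), start=1):
        print(f"poll {i}: p_hat = {p_hat:.3f}")
```

Each simulated poll should land near the assumed 0.41 without matching it exactly, which is the same behavior we saw when different shovels of balls gave different proportions of red.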

In general

If a sample of size n is drawn at random, then the sample is unbiased and representative of the population of size N. Therefore, any result based on the sample can be generalized to the population, and the point estimate is a good guess of the unknown population parameter. This means that instead of performing a census, we can make inferences about the population using sampling.

Specific to the bowl

If we extract a sample of n = 50 balls at random, i.e., we mix all of the equally sized balls before using the shovel, then the contents of the shovel are an unbiased representation of the contents of the bowl’s 2,400 balls. Therefore, any result based on the shovel’s balls can be generalized to the bowl, and the sample proportion p̂ of the n = 50 balls in the shovel that are red is a good guess of the population proportion p of the N = 2,400 balls that are red. This means that instead of manually going over all 2,400 balls in the bowl, we can make inferences about the bowl using the shovel.
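The bowl scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of red balls (900 here) is a hypothetical value chosen for the example, and `make_bowl` and `shovel_proportion` are made-up helper names. Drawing without replacement with `random.Random.sample` plays the role of mixing the bowl and using the shovel.

```python
import random

def make_bowl(n_balls=2400, n_red=900):
    """Build the bowl as a list of color labels. n_red is an assumed value."""
    return ["red"] * n_red + ["white"] * (n_balls - n_red)

def shovel_proportion(bowl, n=50, rng=None):
    """Draw n balls without replacement and return the sample proportion of red."""
    rng = rng or random.Random()
    sample = rng.sample(bowl, n)  # every ball has an equal chance of selection
    return sample.count("red") / n

if __name__ == "__main__":
    rng = random.Random(2013)  # seeded so the run is reproducible
    bowl = make_bowl()
    p = bowl.count("red") / len(bowl)  # the "census": exact population proportion
    p_hat = shovel_proportion(bowl, n=50, rng=rng)
    print(f"population proportion p = {p:.3f}")
    print(f"sample proportion p_hat = {p_hat:.3f}")
```

Counting every ball corresponds to the census, while a single call to `shovel_proportion` corresponds to one use of the shovel.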

Specific to the Obama poll

If we had a way of contacting a randomly chosen sample of 2,089 young US citizens and polling their approval of President Obama in 2013, then these 2,089 young US citizens would be an unbiased and representative sample of all young US citizens in 2013. Moreover, any results based on this sample of 2,089 young US citizens could be generalized to the entire population of all young US citizens in 2013, and the reported sample approval rating of 41% would be a good guess of the true approval rating among all young US citizens in 2013. This means that instead of performing an expensive census of all young US citizens in 2013, we could make inferences about all of them using polling.

Therefore, it’s critical for the sample obtained by the Institute of Politics to be truly random if we want to make inferences about all young US citizens’ opinions of Obama. Is their sample truly random? It’s hard to answer such questions without knowing about the sampling methodology they used. For example, if this poll was conducted using only mobile phone numbers, people without mobile phones would be left out and therefore not represented in the sample. What if the Institute of Politics conducted this poll on an internet news site? Then people who don’t read this particular internet news site would have been left out. Ensuring that our samples are random is easy to do in our sampling bowl exercises. However, in a real-life situation like the Obama poll, this is much harder to do.
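To see why leaving people out matters, consider the following Python sketch of the mobile-phone example. Every number in it is hypothetical and chosen only for illustration: a made-up population of 100,000 people, 80% of whom own a mobile phone, with approval rates of 45% among owners and 25% among non-owners. The helper names `build_population` and `sample_proportion` are also assumptions.

```python
import random

def build_population(size=100_000, seed=0):
    """Build a hypothetical population of (has_mobile, approves) pairs."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        has_mobile = rng.random() < 0.80               # assumed ownership rate
        approve_rate = 0.45 if has_mobile else 0.25    # assumed approval rates
        approves = rng.random() < approve_rate
        population.append((has_mobile, approves))
    return population

def sample_proportion(frame, n, rng):
    """Draw n people at random from the frame and return the share who approve."""
    sample = rng.sample(frame, n)
    return sum(approves for _, approves in sample) / n

if __name__ == "__main__":
    rng = random.Random(1)
    population = build_population()
    mobile_only = [person for person in population if person[0]]

    p = sum(approves for _, approves in population) / len(population)
    print(f"true population proportion:   {p:.3f}")
    print(f"random sample of everyone:    {sample_proportion(population, 2089, rng):.3f}")
    print(f"sample of mobile owners only: {sample_proportion(mobile_only, 2089, rng):.3f}")
```

Under these assumptions, the mobile-only sample tends to overestimate approval no matter how large it is, because people without mobile phones never have a chance of being selected.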
