Comparing bootstrap and sampling distributions

Let’s talk more about the relationship between sampling distributions and bootstrap distributions.

Recall that earlier we took 1,000 virtual samples from the bowl using a virtual shovel, computed 1,000 values of the sample proportion of red balls, and visualized their distribution in a histogram. This distribution is called the sampling distribution of $\hat{p}$, and its standard deviation has a special name: the standard error.
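To make the idea concrete, here is a minimal Python sketch of that virtual sampling activity. It is an independent illustration, not the book's R code. The bowl composition (2,400 balls, 900 of them red), the shovel size of 50, and the helper names `shovel_proportion_red` and `sampling_distribution` are assumptions chosen for illustration. Each shovel draws balls without replacement. Repeating this 1,000 times gives the sampling distribution of $\hat{p}$, and its standard deviation is the standard error.

```python
import random
import statistics

# ASSUMED bowl composition for illustration: 2,400 balls, 900 of them red.
BOWL = ["red"] * 900 + ["white"] * 1500

def shovel_proportion_red(bowl, size=50, rng=random):
    """Take one virtual shovel sample (without replacement) and return p-hat."""
    sample = rng.sample(bowl, size)
    return sample.count("red") / size

def sampling_distribution(bowl, reps=1000, size=50, seed=76):
    """Repeat the virtual shovel sample `reps` times and collect the p-hat values."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    return [shovel_proportion_red(bowl, size, rng) for _ in range(reps)]

p_hats = sampling_distribution(BOWL)
# The standard error is the standard deviation of the sampling distribution.
standard_error = statistics.stdev(p_hats)
print(f"mean of p-hat: {statistics.mean(p_hats):.3f}")
print(f"standard error: {standard_error:.3f}")
```

Plotting `p_hats` as a histogram would reproduce the kind of picture described above. The values cluster around the bowl's true proportion of red, and their spread is the standard error.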

We also mentioned that this sampling activity doesn’t reflect how sampling is done in real life. Rather, it was an idealized version of sampling that let us study the effects of sampling variation on estimates, such as the proportion of the shovel’s balls that are red. In real life, one would take a single sample that’s as large as possible, much like in the Obama poll we saw previously. But how can we get a sense of the effect of sampling variation on estimates if we have only one sample and therefore only one estimate? Don’t we need many samples and therefore many estimates?

The workaround for having only a single sample was to perform bootstrap resampling with replacement from that single sample. We did this in the resampling activity, where we focused on the mean minting year of pennies. We used pieces of paper representing the original sample of 50 pennies from the bank and resampled them with replacement from a hat. We had 35 of our friends perform this activity and visualized the resulting 35 sample means $\bar{x}$ in a histogram.
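The hat-and-paper activity can be sketched in Python as follows. This is an independent illustration, not the book's code or data. The 50 penny years are hypothetical values generated at random between 1970 and 2018. The helper name `bootstrap_means` and the seeds are also illustrative assumptions. Each resample has the same size as the original sample and is drawn with replacement, just like putting each slip of paper back into the hat.

```python
import random
import statistics

def bootstrap_means(sample, reps=35, seed=1):
    """Resample `sample` with replacement `reps` times and return each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Random.choices draws with replacement: every slip goes back into the hat.
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(reps)]

# HYPOTHETICAL stand-in for the original sample of 50 penny minting years
# (not the book's actual data): years drawn uniformly between 1970 and 2018.
data_rng = random.Random(2019)
pennies = [data_rng.randint(1970, 2018) for _ in range(50)]

# One bootstrap sample mean x-bar per "friend", 35 in total.
x_bars = bootstrap_means(pennies, reps=35)
print(f"original sample mean: {statistics.mean(pennies):.2f}")
print(f"mean of the 35 bootstrap means: {statistics.mean(x_bars):.2f}")
print(f"SD of the bootstrap means: {statistics.stdev(x_bars):.2f}")
```

Because sampling is with replacement, a given resample can contain some years several times and omit others entirely, which is what produces variation among the $\bar{x}$ values. Raising `reps` (for example, to 1,000) gives a smoother histogram of the bootstrap means.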

This distribution was called the bootstrap distribution of ...
