Anscombe’s Quartet and Failures of the Correlation Coefficient

As we noted earlier, the correlation coefficient measures only linear association, so it does not capture non-linear relationships in the data, and it is heavily influenced by outliers. This is best illustrated by Anscombe’s quartet: four datasets of 11 observations each, constructed by Francis Anscombe in 1973. Across the four artificial datasets, the two variables have essentially the same correlation coefficient (about 0.816). When the data are displayed in scatter plots, however, the relationship between the two variables differs dramatically among the four datasets.
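The quartet is small enough to check by hand. Below is a minimal Python sketch, assuming the commonly published values of the quartet. The helper `pearson_r` is written here for illustration and is not a library function; Python 3.10+ also ships `statistics.correlation`, which computes the same quantity. The sketch computes r for each dataset, then drops the single outlier in dataset III to show how much one point pulls the coefficient down.

```python
import math

# Anscombe's quartet (commonly published values); datasets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Hypothetical helper: Pearson r = covariance / (std_x * std_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in QUARTET.items()}

# Dataset III is a near-perfect line plus one outlier at (13, 12.74).
# Dropping that single point pushes r very close to 1.
xs3, ys3 = QUARTET["III"]
kept = [(x, y) for x, y in zip(xs3, ys3) if (x, y) != (13, 12.74)]
r3_without_outlier = pearson_r([x for x, _ in kept], [y for _, y in kept])

if __name__ == "__main__":
    for name, r in correlations.items():
        print(f"Dataset {name}: r = {r:.4f}")
    print(f"Dataset III without its outlier: r = {r3_without_outlier:.4f}")
```

The four coefficients agree to within about 0.001, yet a scatter plot of each pair (for example with matplotlib's `scatter`) shows four different shapes: a noisy linear trend (I), a smooth curve (II), an exact line with one outlier (III), and a vertical cluster plus one high-leverage point (IV). Identical summary numbers do not imply identical relationships.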

