Boxplots
Explore how boxplots visually summarize numerical data distributions using the five-number summary and interquartile range. Learn to create and interpret side-by-side boxplots in R with ggplot2 to compare data across categories and identify outliers.
Faceted histograms are one way to compare the distribution of a numerical variable across the values of another variable. A side-by-side boxplot achieves the same goal. A boxplot is constructed from the five-number summary of a numerical variable.
Five-number summary
The five-number summary consists of five summary statistics: the minimum, the first quartile (25th percentile), the second quartile (median or 50th percentile), the third quartile (75th percentile), and the maximum.
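As a quick illustration, the five-number summary can be computed in base R. This is a minimal sketch: the `scores` vector is made-up sample data, not a dataset from this lesson, and the labels attached to the result are chosen here for readability.

```r
# Hypothetical sample data (an assumption for illustration): ten exam scores.
scores <- c(52, 61, 64, 70, 73, 75, 80, 84, 90, 98)

# fivenum() returns the minimum, lower hinge, median, upper hinge, and maximum.
five <- fivenum(scores)
names(five) <- c("min", "Q1", "median", "Q3", "max")
print(five)

# The interquartile range (IQR) is the distance between the third and first quartiles.
iqr <- five[["Q3"]] - five[["Q1"]]
print(iqr)
```

For these ten values, the lower half is 52, 61, 64, 70, 73 and the upper half is 75, 80, 84, 90, 98, so the quartiles are 64 and 84, the median is 74, and the IQR is 20. Note that `quantile()` uses a different interpolation rule by default, so its quartiles can differ slightly from the ones `fivenum()` reports.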
The quartiles are calculated as:
The first quartile (Q1): The median of the first half of the sorted data
The third quartile (
...