Paint the Picture
Learn to visualize a single variable using bar charts, box plots, and histograms.
We explored key statistics that describe a single variable—its center, spread, and shape. While these numerical summaries give us important insights, they can sometimes feel abstract and tough to interpret quickly. This time, we’ll bring those numbers to life with visualizations that make the data’s story clear and intuitive.
In this lesson, we’ll focus on simple yet powerful charts that reveal the distribution and characteristics of individual variables. These visuals help us spot patterns, uncover outliers, and notice subtle details that raw numbers might hide. By mastering these univariate visuals, we can better understand and communicate each variable’s unique story.
Picking the right univariate chart
Visualizing a single variable effectively depends on understanding its type and what aspect we want to explore. Numerical variables invite charts revealing their distribution, spread, and extremes, while categorical variables are best shown with charts highlighting category frequencies.
Choosing the right chart helps us unlock insights quickly and communicate data stories. Let’s explore three common univariate visualization types—each paired with practical code examples so we can try them out as we learn.
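As a rough illustration of the rule of thumb above, here is a minimal sketch that suggests a chart family from the type of the values. The helper name suggest_chart and the sample lists are hypothetical, made up for this example. The rule is also deliberately simplistic: a numeric variable with only a few distinct values, such as a 1 to 5 rating, is often better treated as categorical.

```python
from numbers import Number


def suggest_chart(values):
    """Hypothetical helper: suggest a univariate chart from the values' type."""
    # bool is a subclass of int in Python, so exclude it from "numeric" explicitly.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    # Numerical data: show distribution, spread, and extremes.
    # Categorical data: show how often each category occurs.
    return "histogram or box plot" if is_numeric else "bar chart"


heights_cm = [158.0, 162.5, 170.2, 175.0, 181.3]  # made-up numerical sample
favorite_fruit = ["apple", "banana", "apple", "cherry"]  # made-up categorical sample

print(suggest_chart(heights_cm))
print(suggest_chart(favorite_fruit))
```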
1. Histogram
A histogram is a powerful way to visualize the distribution of a continuous numerical variable. It helps us see where the data points are concentrated, whether the data is skewed, and how spread out the values are.
What a histogram does:
Divides the range of data into intervals called bins.
Counts how many data points fall into each bin.
Displays this count as bars, showing the frequency distribution visually.
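To make these three steps concrete, here is a minimal sketch in plain Python that does the binning and counting by hand and prints a text-only version of the bars. The function name bin_counts and the ages sample are made up for illustration. The sketch assumes equal-width bins and treats the last bin as including its right edge, which is a common convention but not the only possible one.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical, so the range has zero width")
    width = (hi - lo) / num_bins

    # Step 1: divide the range into bins (num_bins + 1 evenly spaced edges).
    edges = [lo + i * width for i in range(num_bins + 1)]

    # Step 2: count how many data points fall into each bin.
    counts = [0] * num_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts


ages = [22, 25, 27, 31, 33, 35, 36, 38, 41, 45, 52, 60]  # made-up sample data
edges, counts = bin_counts(ages, num_bins=4)

# Step 3: display each count as a bar (here, a row of '#' characters).
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * count}")
```

With this sample, the four bins hold 4, 4, 2, and 2 values. A plotting library performs the same bookkeeping for us and draws the bars graphically.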
Syntax
We use the plt.hist() function from the Matplotlib library to create a histogram in Python. This function takes several parameters that control how the histogram looks and behaves.
Here’s the general syntax:
plt.hist(x, bins=None, range=None, density=False, histtype='bar', color=None, label=None)
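As a quick illustration of this syntax, the sketch below calls plt.hist() on a small made-up list of exam scores. The data, the axis labels, and the output filename scores_hist.png are assumptions chosen for this example. The call returns the bin counts, the bin edges, and the drawn bar objects, which is handy for checking what was plotted.

```python
import matplotlib.pyplot as plt

# Made-up sample: exam scores for 12 students (illustrative only).
scores = [55, 62, 64, 68, 70, 71, 73, 75, 78, 82, 88, 95]

# Four equal-width bins spanning 50 to 100, so each bin is 12.5 points wide.
counts, bin_edges, _ = plt.hist(
    scores, bins=4, range=(50, 100), color="steelblue", label="Scores"
)

plt.xlabel("Score")
plt.ylabel("Number of students")
plt.legend()
plt.savefig("scores_hist.png")  # or plt.show() in an interactive session

print(counts)     # number of scores in each bin
print(bin_edges)  # the five edges that delimit the four bins
```

Here the bin edges are 50, 62.5, 75, 87.5, and 100, and the counts come out as 2, 5, 3, and 2. Changing bins or range regroups the same data, so it is worth trying a few settings before drawing conclusions from the shape.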
Common parameters:
x: The numerical data we want to visualize. This is usually a list, array, or pandas Series of numbers.
bins (Optional): The number of intervals (bins) we want to divide our data into. If not specified, Matplotlib uses a default of 10.
range (Optional): The lower and upper range of the bins as a tuple (min, max). Data outside this range is ignored.
density (Optional): If True, the histogram displays a probability density instead of counts, scaled so that the total area of the bars equals 1 (useful for comparing distributions).
histtype (Optional): The type of histogram. The default is 'bar', but other types include 'step', 'stepfilled', etc.
color (Optional): Color of the bars.
label (Optional): Label for the data series (used when adding ...