Beyond Basic Visuals
Learn to create histograms, candlestick plots, heatmaps, and stack plots to uncover data distributions, outliers, and complex relationships.
In our previous lesson, we were introduced to the world of data visualization, learning to tell basic stories with line, bar, scatter, and pie charts. Now, we’re ready to go deeper. Imagine not just seeing the outline of a character in our data story, but truly understanding their personality, their range, and how they interact with others.
In this lesson, we will explore powerful visualization techniques that help us understand the distribution of our data and the intricate relationships between variables, pushing our storytelling abilities beyond the basics.
Understanding data distribution
When we talk about “data distribution,” we’re asking: How are our values spread out? Are they clustered together? Do they lean toward one side? Are there any extreme values? Understanding distribution is essential to making sense of data.
Histogram
A histogram is a special kind of bar chart that helps us understand the distribution of a single numerical variable. Instead of showing categories, it groups data into “bins” (ranges) and then shows us how many data points fall into each bin. Think of it like sorting people by height into different height groups, and then counting how many people are in each group.
When we look at a histogram, we can quickly see where most of our data lies and observe its shape, noting if it’s spread evenly or skewed to one side. This makes it an invaluable tool for our univariate analysis, giving us a visual sense of central tendency and spread.
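To make the idea of binning concrete, here is a minimal Python sketch of the counting a histogram performs behind the scenes. The study-hours values, the bin width of 5, and the helper name bin_counts are all assumptions chosen for illustration; a charting tool will usually pick bin sizes for us.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each fixed-width bin.

    Hypothetical helper for illustration: each bin covers the range
    [lower, lower + bin_width), so the upper edge belongs to the next bin.
    """
    counts = Counter()
    for v in values:
        # Lower edge of the bin containing v, e.g. 7 -> 5 when bin_width is 5
        lower = (v // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Made-up weekly study hours for twelve learners (not real data)
study_hours = [2, 4, 5, 6, 7, 8, 8, 9, 10, 12, 15, 21]

counts = bin_counts(study_hours, bin_width=5)

# Print a rough text histogram: one '#' per learner in each bin
for lower, n in counts.items():
    print(f"{lower:>2} to <{lower + 5:<2} | {'#' * n}")
```

With this sample, most learners land in the 5-to-under-10 bin, and the single value of 21 sits alone in the last bin. That is the kind of shape (a cluster plus an extreme value) a histogram lets us spot at a glance.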
Histograms have a long history. The term was introduced by Karl Pearson, a pioneer in mathematical statistics, in the late 19th century, and histograms remain one of the most effective ways to summarize the shape of continuous data.
Creating a histogram in Google Sheets
We can create a histogram in Google Sheets by selecting our data and choosing the histogram chart option. The chart will automatically visualize the distribution of our numerical data.
Let’s walk through the steps with a practical example.
We’ll arrange our data in a single column (Weekly study hours) to show how study hours are distributed among learners.