See the Layers
Explore advanced visualization techniques to uncover patterns and relationships in multi-dimensional data.
In our last lesson, we learned to design powerful visuals with clarity and empathy, ensuring our message truly resonates. We’ve familiarized ourselves with creating basic charts and understanding data distributions. Now, we’re ready to tackle the real world, where data rarely comes with just one or two simple measurements. Most often, our datasets are rich with many variables, creating a multi-dimensional puzzle.
According to an often-cited IBM estimate, poor-quality data costs the U.S. economy around $3.1 trillion annually. Unclear visualizations and miscommunication compound the problem, leading to lost productivity and bad decisions.
In this lesson, we will explore advanced visualization techniques. These tools allow us to represent and explore complex relationships in the data. They’re especially useful when we’re dealing with more than just a couple of dimensions. It’s about seeing the full picture, with all its intricate layers.
Adding more variables to our view
So far, we have primarily focused on charts that show one or two variables at a time. While these are powerful, many real-world problems involve understanding how three, four, or even dozens of factors interact. Data analysts use clever techniques to add these extra “dimensions” to our visuals, which in turn allow us to uncover richer insights.
Seaborn is a Python library built on top of Matplotlib that makes it easy to create visually engaging and informative charts with less code. It works great with pandas DataFrames, offers built-in themes, and supports advanced plots like heatmaps, pair plots, and faceted visuals. This is perfect for analyzing multi-dimensional data.
Colored and bubbled scatter plots
We are already familiar with the basic scatter plot, which shows the relationship between two numerical variables (one on the x-axis, one on the y-axis). To add a third dimension, we can introduce color. Imagine a scatter plot showing customer age vs. purchase amount. If we then color each point based on the customer’s region (e.g., blue for North, red for South), we can instantly see if customers from certain regions tend to be older or spend more.
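The region-colored scatter plot described above could be sketched roughly as follows. The data values and the blue/red color mapping are assumptions chosen to mirror the example in the text.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical customer data (made-up values for illustration only)
df = pd.DataFrame({
    'Age': [25, 34, 45, 23, 40, 52],
    'PurchaseAmount': [200, 450, 300, 150, 500, 620],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South'],
})

plt.figure(figsize=(8, 6))

# hue maps the categorical Region column to point color;
# the palette dict pins each region to a specific color
ax = sns.scatterplot(
    data=df,
    x='Age',
    y='PurchaseAmount',
    hue='Region',
    palette={'North': 'blue', 'South': 'red'},
    s=100,  # fixed marker size so color is the only extra dimension
)
ax.set_title('Customer Age vs Purchase Amount by Region')

plt.show()
```

Passing a dictionary as the palette keeps the color of each region stable across charts, which helps when the same categories appear in several visuals.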
To add a fourth dimension, we can use size, transforming our points into “bubbles.” For instance, in our age vs. purchase plot, we could make the size of each bubble represent the customer’s loyalty score. A large bubble in the upper-right corner of the plot would instantly tell us that we are looking at an older, highly loyal customer with a high purchase amount, and its color would tell us which region that customer belongs to. This allows us to encode more information into a single, comprehensive visual.
Fun fact: The idea of using color and size to add dimensions to scatter plots goes back decades, long before computers made them easy to create. Early statisticians would manually plot and then hand-color points to reveal patterns.
Code example
In this example, we’ll use a bubble plot to show how a customer’s age relates to their purchase amount, while also encoding region with color and loyalty score with bubble size. This adds immediate depth and meaning to the visual.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample data
df = pd.DataFrame({
    'Age': [25, 34, 45, 23, 40],
    'PurchaseAmount': [200, 450, 300, 150, 500],
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'LoyaltyScore': [80, 60, 90, 70, 85]
})

# Bubble plot with color and size
plt.figure(figsize=(8, 6))
scatter = sns.scatterplot(
    data=df,
    x='Age',
    y='PurchaseAmount',
    hue='Region',
    size='LoyaltyScore',
    sizes=(50, 500),
    palette='viridis',
    alpha=0.7
)
plt.title('Customer Age vs Purchase Amount')

# Move legend outside the plot
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
plt.tight_layout()  # Adjusts plot to make space for legend
plt.show()
In the above example:
Lines 1–3: Import necessary libraries: matplotlib for plotting, seaborn for styled visualizations, and pandas for data handling.
Lines 6–11: Create a small DataFrame with customer information like age, purchase amount, region, and ...