...

Connect the Dots

Learn how to explore and visualize relationships between variables in the data using bivariate analysis techniques.

So far, we have explored individual variables by looking at their distributions, central tendencies, and spread. That gave us a solid understanding of what each variable looks like on its own. But in the real world, variables rarely act in isolation. Consider questions like:

  • Does more time on a website lead to more purchases?

  • Is income influenced by education level?

  • Do taller people tend to weigh more?

To answer them, we need to move beyond single-variable statistics and explore how variables interact. This brings us to bivariate analysis, which helps us understand the relationship between two variables.

Bivariate analysis

When working with datasets, we often need to examine how two variables relate to each other. Such a relationship is called a bivariate relationship. Understanding these relationships helps us uncover meaningful patterns, identify key associations, and generate insights that inform decisions.

[Figure: Analyze two variables]

Fun fact: Think of bivariate analysis as playing matchmaker for our data! We’re trying to see if two variables are a “good fit.”

The type of analysis and the tools we use depend on the kinds of variables we’re comparing, whether they’re numeric or categorical.

This section will discuss different types of bivariate relationships, how to summarize them with appropriate statistics, and how to interpret them visually.

We’ll begin with the simplest case: when both variables are numeric.

Numeric vs. numeric

When both variables in our dataset are numeric, we’re often interested in whether a change in one variable corresponds to a change in the other. These relationships are fundamental in data analysis because they help us understand how two continuous measurements vary together.

For example, we might ask: if someone is taller, do they also tend to weigh more? If a company increases its marketing budget, do its sales improve? This type of relationship is at the heart of what’s known as bivariate numerical analysis.

To study this, we typically start with two tools: numerical summarization and visual inspection.

In visual inspection, we’ll focus on interpreting the plots; coding will come later in the “Visualization and Storytelling” chapter.

Quantifying the relationship

In data analysis, one of the most fundamental questions we ask is: To what extent do two numeric variables move together? This question lies at the heart of generating insights, identifying behavioral drivers, and revealing hidden patterns in the data. The Pearson correlation coefficient offers a precise, quantitative answer. It measures the strength and direction of a linear relationship between two continuous variables, helping analysts interpret associations with clarity and confidence.
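As a rough illustration of what such a coefficient computes, here is a minimal sketch in plain Python. The function name `pearson_r` and the height and weight values are hypothetical, made up purely for illustration, and the sketch assumes sample standard deviations (dividing by n - 1); it is one way to do the calculation, not necessarily the approach this course uses later.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    x_bar, y_bar = mean(xs), mean(ys)
    # Assumption: sample standard deviations (denominator n - 1)
    s_x, s_y = stdev(xs), stdev(ys)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable is constant")

    # Sum of the products of paired deviations from the means
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical heights (cm) and weights (kg), invented for this example
heights = [150, 160, 165, 172, 180, 188]
weights = [52, 58, 63, 70, 79, 85]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

Because the made-up weights rise almost in step with the heights, the printed value should be close to +1. Reversing one of the lists would flip the sign.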

Pearson correlation coefficient (r)

The Pearson correlation coefficient (denoted as r) quantifies the strength and direction of a linear relationship between two numeric variables. Its value always falls between $-1$ and $+1$.
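One common way to write r is sketched here, assuming $n$ paired observations and taking $s_x$ and $s_y$ to be the sample standard deviations (both are assumptions; other equivalent forms exist):

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```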

Where:

  • $x_i$, $y_i$ are the individual data points.

  • $\bar{x}$, $\bar{y}$ are the means of the two variables.

  • $s_x$, $s_y$ ...