Connect the Dots

Explore how two variables relate by mastering bivariate analysis techniques such as Pearson correlation, scatterplots, and simple linear regression in Google Sheets. Understand numeric, categorical, and mixed variable relationships to uncover patterns, compare groups, and build predictive models for stronger data insights.

So far, we have explored individual variables: their distributions, central tendencies, and spread. That gave us a solid understanding of what each variable looks like on its own. But in the real world, variables often interact with each other. Consider questions like:

  • Does more time on a website lead to more purchases?

  • Is income influenced by education level?

  • Do taller people tend to weigh more?

To answer them, we need to move beyond single-variable statistics and explore how variables interact. This brings us to bivariate analysis, which helps us understand relationships between variables.

Bivariate analysis

When working with datasets, we often need to examine how two variables relate to each other. Such a pairing is known as a bivariate relationship. Understanding these relationships helps us uncover meaningful patterns, identify key associations, and generate insights that inform decisions.

Analyze two variables

The type of analysis depends on the nature of the variables, whether they are numeric, categorical, or a mix of both. In this section, we’ll look at the main types of bivariate relationships, see how to summarize them with the right statistics, and learn how to interpret them visually.

Let’s start with the simplest case: when both variables are numeric.

Numeric vs. numeric

When both variables in our dataset are numeric, we’re often interested in whether a change in one variable corresponds to a change in the other. These relationships are fundamental in data analysis because they help us understand how two continuous measurements vary together.

For example, we might ask: if someone is taller, do they also tend to weigh more? If a company increases its marketing budget, do its sales improve? This type of relationship is at the heart of what’s known as bivariate numerical analysis.

To study this, we typically start with two tools: numerical summarization and visual inspection.

In visual inspection, we’ll focus on interpreting the plots; generating them in Google Sheets will come later in the “Show What You Found” chapter.

Quantifying the relationship

In data analysis, one of the most fundamental questions we ask is: To what extent do two numeric variables move together? This question lies at the heart of generating insights, identifying behavioral drivers, and revealing hidden patterns in the data. The Pearson correlation coefficient offers a precise, quantitative answer. It measures the strength and direction of a linear relationship between two continuous variables, helping analysts interpret associations with clarity and confidence.
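As a rough illustration of what "moving together" means numerically, the following Python sketch computes the coefficient from scratch. It sits outside this course's Google Sheets workflow. The function name `pearson_r` and the height and weight figures are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Numerator: how the deviations co-vary. Denominator: their combined spread.
    co_variation = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)

    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return co_variation / sqrt(spread_x * spread_y)


# Hypothetical sample: heights (cm) and weights (kg) for five people.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 88]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")  # a value close to +1 indicates a strong positive linear relationship
```

In Google Sheets, the built-in `CORREL` function returns the same quantity for two ranges of equal length, so this hand-rolled version is only meant to show what happens under the hood.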

Pearson correlation coefficient (r)

The Pearson correlation coefficient (r) measures both the strength and direction of the linear relationship between two numeric variables. In this context, we usually think of one as the independent variable (the factor we believe may be influencing the other) and the second as the dependent variable, which reflects the outcome or response being influenced. Its value always falls between $-1$ and $+1$:
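In its standard textbook form, assumed here to be the version the symbol list refers to, r is computed as follows, where $\bar{y}$ is taken to be the mean of the y values and $n$ the number of paired observations (both symbols are assumptions):

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$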

Where:

  • $x_i$, $y_i$ are the individual data points.

  • $\bar{x}$ ...