Connect the Dots
Learn how to explore and visualize relationships between variables in the data using bivariate and multivariate analysis techniques.
So far, we have explored individual variables, examining their distributions, central tendencies, and spread. This gave us a solid understanding of what each variable looks like on its own.
But in the real world, variables rarely act in isolation; they often interact with one another.
Consider questions like:
Does more time on a website lead to more purchases?
Is income influenced by education level?
Do taller people tend to weigh more?
To answer them, we need to move beyond solo stats and explore how variables interact. This brings us to bivariate analysis, which helps us understand the relationship between two variables.
Bivariate analysis
When working with datasets, we often need to examine how two variables relate to each other. This is known as a bivariate relationship. Understanding these relationships helps us explore patterns, uncover associations, and build better predictive models. The type of analysis and the tools we use depend on the kinds of variables we’re comparing, that is, whether they’re numeric or categorical.
In this section, we’ll walk through different types of bivariate relationships, how to summarize them with appropriate statistics, and how to visualize them. We’ll begin with the simplest case: when both variables are numeric.
Numeric vs. numeric
When both variables in our dataset are numeric, we’re often interested in whether a change in one variable corresponds to a change in the other. These relationships are fundamental in data science because they help us understand how two continuous measurements vary together.
For example, we might ask: if someone is taller, do they also tend to weigh more? If a company increases its marketing budget, do its sales improve? This type of relationship is at the heart of what’s known as bivariate numerical analysis.
To study this, we typically start with two tools: numerical summarization and visual inspection.
Quantifying the relationship
In data analysis, one of the most fundamental questions we ask is: To what extent do two numeric variables move together?
This inquiry is central to building predictive models, identifying drivers of behavior, and uncovering underlying mechanisms in data. The Pearson correlation coefficient provides a formal, numerical answer to this question by measuring the strength and direction of a linear relationship between two continuous variables.
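To make the idea of two variables "moving together" concrete, here is a minimal sketch that computes Pearson's r from its standard textbook definition using only the Python standard library. The height and weight values are made up purely for illustration, and the function name pearson_r is a hypothetical helper, not part of any library.

```python
import math

# Hypothetical toy data, invented for illustration only:
# heights in centimeters and weights in kilograms for six people.
heights = [150.0, 160.0, 165.0, 172.0, 180.0, 188.0]
weights = [52.0, 58.0, 63.0, 70.0, 79.0, 85.0]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each point from its variable's mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: how the deviations co-vary. Denominator: their combined spread.
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator


r = pearson_r(heights, weights)
print(f"Pearson r for the toy height/weight data: {r:.3f}")
```

In day-to-day work we would more likely call an existing routine such as numpy.corrcoef or the pandas Series.corr method, which compute the same quantity. Writing it out by hand simply shows that r compares how the two variables deviate from their means together against how much each one varies on its own.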
Pearson correlation coefficient (r)
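The Pearson correlation coefficient (r) quantifies the strength and direction of a linear relationship between two numeric variables. Its value always falls between $-1$ and $+1$. It is computed as:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Where:
$x_i$, $y_i$ are the individual data points, ...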