Regression Confession

Learn how regression tests explain relationships and predict outcomes in data.

As data analysts, we often try to make sense of patterns in data and explain them in a way others can understand. People will ask questions like: Does income affect purchases? Does battery size predict phone longevity?

We need a reliable method to move beyond simply describing trends and actually test relationships. This is where a regression test comes in.

What is a regression test?

A regression test is a statistical approach that helps us:

  • Model relationships between variables.

  • Measure the strength and direction of those relationships.

  • Test hypotheses about cause and effect.

  • Predict future outcomes based on past data.

Instead of relying on assumptions or intuition, regression lets us test claims like: “Does income significantly influence purchase decisions?”

Types of regression tests

Different regression tests are suited for different kinds of outcomes. The test we choose depends on the type of variable we’re trying to predict.

| Regression Type | Description | Target Variable | Common Use Cases |
|---|---|---|---|
| Linear regression | Tests linear relationships with numeric outcomes | Continuous (e.g., price) | Predicting sales, income, weights |
| Logistic regression | Tests probability of binary outcomes | Binary (e.g., 0/1) | Churn prediction, marketing conversion |
| Multiple regression | Tests multiple inputs at once | Continuous | Controlling for multiple factors |
| Poisson regression | Tests relationships with count-based targets | Count (e.g., clicks) | Website visits, event occurrences |
| Ordinal regression | Tests effects on ranked outcomes | Ordered categories | Customer satisfaction (low, medium, high) |

In this lesson, we’ll focus on the two most useful regression tests for data analysts: linear regression and logistic regression.
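To make the difference between the two concrete, here is a minimal sketch in plain Python. Linear regression returns the linear score itself as a numeric prediction, while logistic regression passes that score through the logistic (sigmoid) function to get a probability between 0 and 1. The function names and all coefficient values are hypothetical and chosen only for illustration.

```python
import math

def linear_prediction(x, intercept, slope):
    """Linear regression: the prediction is the linear score itself."""
    return intercept + slope * x

def logistic_probability(x, intercept, slope):
    """Logistic regression: the linear score is squashed into (0, 1)."""
    score = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients, not estimated from any real data.
price = linear_prediction(1500, intercept=50_000, slope=120)   # a number
p_churn = logistic_probability(3, intercept=-2.0, slope=0.5)   # a probability

print(f"predicted price: {price}")
print(f"predicted churn probability: {p_churn:.3f}")
```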

Linear regression

Linear regression is one of the most useful tools when we’re working with numeric outcomes. It’s especially helpful when the variable we want to predict is continuous, like price, revenue, or performance score. It also helps us understand how much each factor contributes to the result.

As analysts, we’re often tasked with more than just reporting values. We’re expected to explain why something happened and what might happen next. That’s where regression tests shine. They let us model relationships between dependent and independent variables, measure how strong those relationships are, and test whether the effects we observe are real or just random noise.

Example

We have data for 15 houses, including their square footage, number of rooms, and the final sale price. Our goal is to understand how square footage and number of rooms together influence the house price. To do this, we set up a statistical hypothesis test that helps us evaluate whether the observed relationships are meaningful or just due to chance:

  • Null hypothesis (H₀): Neither square footage nor number of rooms has a significant relationship with house price.

  • Alternative hypothesis (H₁): Square footage and number of rooms significantly influence the house price.
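One common way to write these hypotheses in terms of model coefficients is shown below. This formalization is an assumption on our part, since the lesson states the hypotheses in words; it corresponds to the overall test of a multiple regression, where β₀ is the intercept, β₁ and β₂ are the coefficients on square footage and rooms, and ε is the random error term.

```latex
\begin{align*}
\text{Price} &= \beta_0 + \beta_1\,\text{Square\_Feet} + \beta_2\,\text{Rooms} + \varepsilon \\
H_0 &: \beta_1 = \beta_2 = 0 \\
H_1 &: \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2\}
\end{align*}
```

Under this reading, rejecting H₀ means at least one of the two predictors carries information about price beyond what chance alone would produce.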

```python
import pandas as pd
import statsmodels.api as sm

# Load dataset from CSV
df = pd.read_csv("house_prices.csv")

# Define independent variables
X = df[['Square_Feet', 'Rooms']]
X = sm.add_constant(X)  # Add intercept

# Define dependent variable
y = df['Price']

# Fit the regression model
model = sm.OLS(y, X)
results = model.fit()

# Show summary
print(results.summary())
```
  • Line 5: Reads the CSV file named house_prices.csv and loads it into a DataFrame called df. This file should contain columns like Square_Feet, Rooms, and Price.

  • Line 8: Selects the independent (predictor) variables from the DataFrame: Square_Feet and Rooms. These will be used to predict the price.

  • Line 9: Adds a constant term (a column of 1s) to the predictor variables. This represents the intercept (β₀ ...