Test Your Thinking

Learn how data analysts use hypothesis testing, t-tests, and ANOVA to make evidence-based conclusions.

In analysis, data alone doesn’t tell the full story. We might observe patterns or differences in numbers; maybe one group appears to perform better than another, or two variables seem to move together. But how do we know whether these patterns are real or just due to random chance?

This is where statistical testing becomes essential. It allows us to evaluate data objectively and make evidence-based decisions about the relationships and differences we observe.

Statistical tests

A statistical test is a mathematical procedure used in hypothesis testing to evaluate whether a relationship exists between variables or whether differences between groups are meaningful. These tests help us assess whether the evidence in our sample data is strong enough to draw conclusions about the broader population.

Fun fact: Statistical tests are like lie detectors for data; they tell us if a pattern is a real ‘truth’ or just a ‘coincidence.’

In essence, statistical tests allow us to test a claim (hypothesis) by quantifying how likely results like the ones we observed would be if chance alone were at work.

What is hypothesis testing?

Hypothesis testing is a structured approach used to evaluate whether the data provides sufficient evidence to support a claim about a population. It helps determine whether an observed effect or relationship in the sample data is likely to exist in the population or if it could have occurred by random chance.

Hypothesis testing follows a sequence of steps:

Step 1: State the hypothesis

Null hypothesis (H₀): The default assumption is that there is no effect, no difference, or no relationship between groups or variables.
Example: A new drug has no impact on blood pressure compared to the current standard treatment.
Alternative hypothesis (H₁): The research hypothesis proposes that an effect, difference, or relationship exists.
Example: The new drug significantly reduces blood pressure compared to the standard treatment.
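
One way to write the drug example in symbols, assuming $\mu_{\text{new}}$ and $\mu_{\text{std}}$ (notation introduced here for illustration only) denote the population mean blood pressure under the new drug and under the standard treatment:

$$
H_0: \mu_{\text{new}} = \mu_{\text{std}} \qquad H_1: \mu_{\text{new}} < \mu_{\text{std}}
$$

Because the example claim is specifically about a reduction, this sketch uses a one-sided alternative. A claim that the drug simply has a different effect would use $\mu_{\text{new}} \neq \mu_{\text{std}}$ instead.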

Informational note: H₀ and H₁ are mutually exclusive. We’re looking for enough evidence to reject H₀, not to prove H₁ outright.

Step 2: Choose an appropriate statistical test

The choice of statistical test depends on:

The type of data (categorical vs. numerical).
The number of groups or variables involved.
Whether the data meet the assumptions for parametric tests (e.g., normality, equal variances). Parametric tests are statistical tests that assume the data follow a specific distribution, usually a normal distribution, and meet certain conditions, such as equal variances across groups. Examples include t-tests, ANOVA, and linear regression.

Common tests include t-tests, ANOVA, and regression tests. We’ll explore each of these in more detail in the upcoming sections.
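
The three criteria above can be sketched as a small decision helper. This is only a rough rule of thumb for comparing independent groups, not a complete guide: the function name `suggest_test` and its arguments are hypothetical, and the non-parametric alternatives it returns (Mann-Whitney U, Kruskal-Wallis) are commonly used substitutes that this lesson itself does not cover.

```python
def suggest_test(data_type: str, n_groups: int, parametric_ok: bool) -> str:
    """Hypothetical rule-of-thumb helper for comparing independent groups.

    data_type:     "categorical" or "numerical"
    n_groups:      number of groups being compared (2 or more)
    parametric_ok: True if normality and equal variances look reasonable
    """
    if data_type == "categorical":
        return "chi-square test"
    if data_type != "numerical":
        raise ValueError("data_type must be 'categorical' or 'numerical'")
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        # Two groups: t-test, or a rank-based alternative (assumption).
        return "t-test" if parametric_ok else "Mann-Whitney U test"
    # Three or more groups: ANOVA, or a rank-based alternative (assumption).
    return "ANOVA" if parametric_ok else "Kruskal-Wallis test"


print(suggest_test("numerical", 2, True))
print(suggest_test("numerical", 3, True))
print(suggest_test("categorical", 2, True))
```

In practice the choice also depends on details this sketch ignores, such as whether the groups are paired and how large the samples are.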

Fun fact: Picking the right statistical test is like choosing the perfect key for a lock; the wrong one won’t open our data’s secrets!

Step 3: Set a significance level ($α$)

Before analyzing the data, set the significance level ($α$). This is the threshold for deciding whether a result is statistically significant. A conventional choice is $α = 0.05$.

If the probability of observing a result at least as extreme as ours, assuming the null hypothesis is true, is no greater than $α$, we consider the finding statistically significant.

Informational note: The significance level is our risk tolerance for one kind of error. With $α = 0.05$, we accept a 5% chance of rejecting a true null hypothesis (a “false positive”).
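
To see what that 5% means, here is a minimal simulation sketch. It assumes a simple two-sided z-test on normally distributed data with a known standard deviation of 1, and it draws every sample from a population where the null hypothesis is actually true. The function name, sample size, and number of trials are all illustrative choices, not part of the lesson.

```python
import random
from statistics import NormalDist


def false_positive_rate(alpha=0.05, n_trials=2000, n=30, seed=0):
    """Share of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    # Two-sided critical value: reject when |z| exceeds it.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        # H0 is true by construction: population mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)  # z-statistic with known sd = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


rate = false_positive_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

The printed share should land close to 0.05, and lowering `alpha` should shrink it, at the cost of making real effects harder to detect.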

Step 4: Calculate the test statistic

Using the selected test, compute a test statistic: a numerical summary that measures how far the observed data deviate from what the null hypothesis predicts.

Examples of test statistics include:

t-statistic for t-tests.
F-statistic for ANOVA.
χ² statistic for chi-square tests.
z-statistic for z-tests.
r statistic for correlation tests.

A test statistic with a larger magnitude generally indicates stronger evidence against the null hypothesis.
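
As a concrete illustration, the sketch below computes a one-sample t-statistic by hand: the gap between the sample mean and the value claimed by H₀, divided by the standard error of the mean. The data and the hypothesized mean `mu0` are made up for the example, and `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements
t_stat = one_sample_t(sample, mu0=3)
print(f"t = {t_stat:.3f}")  # t = 2.646
```

Here the sample mean is 5, two units above the hypothesized 3, and the standard error is about 0.756, so the observed mean sits roughly 2.6 standard errors away from what H₀ predicts.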

Step 5: Determine the p-value

The p-value quantifies the probability of obtaining a test statistic as extreme as (or more extreme than) the observed one, assuming the null hypothesis is true.

Low p-value (≤ α): Strong evidence against H₀, so we reject the null hypothesis.
High p-value (> α): ...
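
A minimal sketch of this step, assuming a two-sided z-test so that the p-value can be read from the standard normal distribution in Python's standard library. The z value of 2.5 is hypothetical, and the decision labels follow the usual convention of either rejecting or failing to reject H₀.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a z-statistic at least this extreme, in either direction, under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
z = 2.5  # hypothetical test statistic
p = two_sided_p_value(z)
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")  # p = 0.0124 -> reject H0
```

For other tests, such as t-tests or ANOVA, the same logic applies, but the p-value comes from the corresponding t or F distribution rather than the normal distribution.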
