
# What is the chi-squared distribution?

Khizar Hayat Saani

In hypothesis testing, we use the chi-squared distribution to decide whether to reject a hypothesis or fail to reject it.

The chi-squared statistic measures the error between what we expected and what we actually observed. We compute a squared, scaled error for each category, sum these errors, and then use the chi-squared distribution to judge whether a total that large is ordinary or not.

## Degrees of freedom

degrees of freedom ($k$) = $n - 1$

Suppose the data fall into $n$ categories. To calculate the degrees of freedom, we subtract 1 from the number of categories because, once the total number of observations is fixed, the count in the last category can always be worked out from the counts in the rest.

The degrees of freedom are denoted by the letter $k$.
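A minimal Python sketch of why one count is not free to vary, assuming the total is fixed at 100 and borrowing the observed counts from the example later in this article. The variable names are illustrative only.

```python
# Total number of questions sampled (fixed in advance).
total = 100

# Observed counts for the first three categories (A, B, C).
known_counts = [20, 20, 25]

# Once the total is fixed, the last count (D) is fully determined.
last_count = total - sum(known_counts)

# Degrees of freedom: number of categories minus one.
n_categories = len(known_counts) + 1
k = n_categories - 1

print(last_count, k)  # 35 3
```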

## Chi-squared

The following formula can be used to calculate the value for chi-squared ($X^2$).

$X^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$

• $O_i$: Observed value (count) in category $i$
• $E_i$: Expected value (count) in category $i$
• $n$: Number of categories
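The formula translates almost directly into code. The following is a small sketch rather than a reference implementation; the function name `chi_squared_statistic` and the two-category data are hypothetical and chosen only to exercise the function.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-category data, used only to exercise the function:
# (12 - 10)^2 / 10 + (8 - 10)^2 / 10 = 0.4 + 0.4
print(chi_squared_statistic([12, 8], [10, 10]))  # 0.8
```

Each expected value must be nonzero, since it appears in the denominator.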

## Example

Suppose you have a test coming up, and your instructor says that all options (A, B, C, D) in the MCQ section of the test have an equal probability of being the right answer.

In other words, no option is favored over another. Since this is your mathematics paper and you have already practiced with the given resources, you decide to investigate whether your instructor is right.

First, you take a sample of 100 MCQs:

| Correct Choice | Expected Number of Choices | Observed Number of Choices |
|---|---|---|
| A | 0.25 * 100 = 25 | 20 |
| B | 0.25 * 100 = 25 | 20 |
| C | 0.25 * 100 = 25 | 25 |
| D | 0.25 * 100 = 25 | 35 |

We will use a significance level of 5% (0.05).

The expected column reflects the hypothesis that all choices have an equal probability of being right: each option should be the correct answer about 25 times out of 100.

But when we actually count how often each option was correct, we see some discrepancy. One might be tempted to conclude right away that D is favored as the correct choice. Before concluding that, we must investigate further and apply the chi-squared distribution to get to the bottom of things.

After sampling, we calculate k, which is 4 - 1 = 3. Now that we know k, we have to calculate the chi-squared value. This number quantifies how much the observed results differ from the expected results.

$X^2 = \frac{(20-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(35-25)^2}{25} = 6.0$
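The same arithmetic can be checked term by term. This sketch uses the counts from the table above and shows how much each option contributes to the total; the variable names are illustrative.

```python
options = ["A", "B", "C", "D"]
observed = [20, 20, 25, 35]
sample_size = 100

# Under the instructor's claim, each option is correct with probability 0.25.
expected = [0.25 * sample_size for _ in options]

# Contribution of each option to the chi-squared value.
contributions = {
    option: (o - e) ** 2 / e
    for option, o, e in zip(options, observed, expected)
}

chi_squared = sum(contributions.values())

print(contributions)  # {'A': 1.0, 'B': 1.0, 'C': 0.0, 'D': 4.0}
print(chi_squared)    # 6.0
```

Option D accounts for 4 of the 6 units, which matches the intuition that D is the count furthest from its expected value.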

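Now, you have to figure out the probability of getting a number that is 6 or more, with 3 degrees of freedom. For that, we refer to the table below. Each column heading is an upper-tail probability, and each entry is the chi-squared value that is exceeded with that probability for the given degrees of freedom.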

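| Degrees of Freedom | 0.1 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|
| 1 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 4.605 | 5.991 | 7.378 | 9.21 | 10.597 |
| 3 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 7.779 | 9.488 | 11.143 | 13.277 | 14.86 |
| 5 | 9.236 | 11.07 | 12.832 | 15.086 | 16.75 |
| 6 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |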

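This table shows that, with 3 degrees of freedom, a value of 6.251 is exceeded with a probability of 10%. Our value of 6 is slightly smaller than 6.251, so the probability of getting a number greater than or equal to 6 is slightly more than 10% (about 11%). That is greater than our significance level of 5%. Equivalently, 6 is below the 5% critical value of 7.815.

Since the probability of getting $X^2 \geq 6$ is greater than 0.05, we cannot reject the initial hypothesis. You cannot reject your instructor’s statement that all options have an equal probability of being correct.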
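The tail probability can also be computed directly instead of being read off the table. The sketch below relies on a closed-form expression that holds only for 3 degrees of freedom, $P(X^2 \geq x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$, so that it needs nothing beyond Python's standard library. The function name `chi2_sf_3dof` is hypothetical. If SciPy is available, `scipy.stats.chi2.sf` should cover any number of degrees of freedom.

```python
import math


def chi2_sf_3dof(x):
    """Upper-tail probability P(X^2 >= x), valid for 3 degrees of freedom only."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi_squared = 6.0
alpha = 0.05            # significance level used in the example
critical_value = 7.815  # from the table: 3 degrees of freedom, 0.05 column

p_value = chi2_sf_3dof(chi_squared)
print(round(p_value, 3))  # 0.112

# Two equivalent ways of stating the decision rule.
reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_squared > critical_value
print(reject_by_p_value, reject_by_critical_value)  # False False
```

Both checks agree: the hypothesis of equally likely options is not rejected at the 5% level.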
