Trusted answers to developer questions


What is chi-squared distribution?

Khizar Hayat Saani

In hypothesis testing, we use the chi-squared distribution to decide whether or not to reject a hypothesis.

Using this distribution, we can measure the discrepancy between what we expected and what we actually observed. We compute this discrepancy for each category, sum the results, and then check whether a total that large is ordinary under the hypothesis or not.

Degrees of freedom

degrees of freedom (k) = n - 1

Suppose we have n data points. To calculate the degrees of freedom, we subtract 1 from the number of samples (or, for categorical data, from the number of categories in the sample), because the last value can always be worked out once the rest of them and the total are given.

Degrees of freedom is represented by the letter k.
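The idea that the last value is determined by the others can be sketched in a few lines of Python. The counts below are illustrative assumptions (100 items spread over four categories), not data from a real experiment.

```python
# Illustrative counts only: 100 items split over four categories.
total = 100
known_counts = {"A": 20, "B": 20, "C": 25}  # three counts that are free to vary

# The fourth count is fixed by the total, so it carries no extra information.
count_d = total - sum(known_counts.values())

categories = len(known_counts) + 1
k = categories - 1  # degrees of freedom

print(count_d)  # 35
print(k)        # 3
```

Only three of the four counts are free to vary, which is why k is 3 here rather than 4.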

Chi-squared

The following formula can be used to calculate the value for chi-squared ($X^2$).

$$X^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$$

  • $O_i$: Observed value
  • $E_i$: Expected value
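As a minimal sketch, the formula translates almost directly into Python. The function name `chi_squared_statistic` and the coin-flip counts are assumptions made for illustration; they do not come from the original text.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts: 100 coin flips, 60 heads and 40 tails, with 50 of each expected.
print(chi_squared_statistic([60, 40], [50, 50]))  # 4.0
```

Each category contributes its squared deviation scaled by the expected count, so a deviation of 10 matters more when only 50 were expected than it would if 500 were expected.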

Example

Suppose you have a test coming up, and your instructor says that all options (A, B, C, D) in the MCQ section of the test have an equal probability of being the right answer.

In other words, no option is favored over the others. Since you have already practiced with the given resources for this mathematics paper, you decide to investigate whether your instructor is right or not.

First, you take a sample of 100 MCQs:

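| Correct Choice | Expected Number of Choices | Observed Number of Choices |
|----------------|----------------------------|----------------------------|
| A              | 0.25 * 100 = 25            | 20                         |
| B              | 0.25 * 100 = 25            | 20                         |
| C              | 0.25 * 100 = 25            | 25                         |
| D              | 0.25 * 100 = 25            | 35                         |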

We will assume a significance level of 5% (0.05).

The expected column in the table above reflects the claim that all choices have an equal probability of being right: each option should be the correct answer in 25 of the 100 questions.

But when we actually count how often each option was correct, we see some discrepancy. One might be tempted to say right away that D is favored, since it is the most frequently occurring correct choice. Before concluding that, we must investigate further and apply the chi-squared distribution to get to the bottom of things.

After sampling, we calculate k, which will be 4 - 1 = 3. Now that we know k, we have to calculate the chi-squared value. This number will quantify how much the observed results differ from the expected results.

$$X^2 = \frac{(20-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(35-25)^2}{25} = 6.0$$
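The arithmetic can be checked term by term with a short Python sketch. It uses the observed counts from the example; the variable names are illustrative choices.

```python
options = ["A", "B", "C", "D"]
observed = {"A": 20, "B": 20, "C": 25, "D": 35}

sample_size = sum(observed.values())    # 100 questions
expected = sample_size / len(options)   # 25.0 per option if all are equally likely

# Contribution of each option: (O_i - E_i)^2 / E_i
contributions = {opt: (observed[opt] - expected) ** 2 / expected for opt in options}
chi_squared = sum(contributions.values())
k = len(options) - 1  # degrees of freedom

for opt in options:
    print(opt, contributions[opt])
print("chi-squared:", chi_squared, "degrees of freedom:", k)
```

The per-option contributions are 1.0, 1.0, 0.0, and 4.0, so option D accounts for most of the total of 6.0.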

Now, you have to figure out the probability of getting a number that is 6 or more, with 3 degrees of freedom. For that, we refer to the table below.

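| Degrees of Freedom | .10    | .05    | .025   | .01    | .005   |
|--------------------|--------|--------|--------|--------|--------|
| 1                  | 2.706  | 3.841  | 5.024  | 6.635  | 7.879  |
| 2                  | 4.605  | 5.991  | 7.378  | 9.210  | 10.597 |
| 3                  | 6.251  | 7.815  | 9.348  | 11.345 | 12.838 |
| 4                  | 7.779  | 9.488  | 11.143 | 13.277 | 14.860 |
| 5                  | 9.236  | 11.070 | 12.832 | 15.086 | 16.750 |
| 6                  | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |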
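Each entry in the table is the value that a chi-squared variable exceeds with the probability given in the column header. As a rough check, the sketch below computes that upper-tail probability with the Python standard library only, using the known closed-form expressions for integer degrees of freedom. The function name `chi2_upper_tail` is an assumption for illustration; in practice a statistics library would normally be used instead.

```python
import math


def chi2_upper_tail(x, k):
    """P(X >= x) for a chi-squared variable with integer k >= 1 degrees of freedom."""
    if k < 1 or x < 0:
        raise ValueError("need k >= 1 and x >= 0")
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))
    # Odd k: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(k-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (k - 1) // 2 + 1)
    )
    return tail


# Row for 3 degrees of freedom, copied from the table above.
row_3 = {0.10: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345, 0.005: 12.838}
for alpha, critical in row_3.items():
    print(alpha, round(chi2_upper_tail(critical, 3), 3))
```

Each printed probability should agree with its column header up to the rounding of the table entries.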

The row for 3 degrees of freedom shows that the critical value at the 0.10 level is 6.251. Because 6 is slightly below 6.251, the probability of getting a number greater than or equal to 6 with 3 degrees of freedom is a little more than 10%. That is greater than our chosen significance level of 5%.

Since the probability of getting $X^2 \geq 6$ is greater than 0.05, we cannot reject the initial hypothesis. In other words, this sample gives you no grounds to reject your instructor’s statement that all options have an equal probability of being correct.
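The decision can also be reproduced numerically. The sketch below assumes the closed-form upper-tail probability that holds specifically for 3 degrees of freedom, and takes the 0.05 critical value of 7.815 from the table; the variable names are illustrative.

```python
import math

chi_squared = 6.0       # statistic from the worked example
alpha = 0.05            # significance level chosen in the text
critical_value = 7.815  # table entry for 3 degrees of freedom at 0.05

# Upper-tail probability P(X >= chi_squared), valid for exactly 3 degrees of freedom.
p_value = (
    math.erfc(math.sqrt(chi_squared / 2))
    + math.sqrt(2 * chi_squared / math.pi) * math.exp(-chi_squared / 2)
)

# Two equivalent ways of stating the decision rule.
reject_by_p_value = p_value < alpha
reject_by_critical_value = chi_squared > critical_value

print(round(p_value, 3))
print(reject_by_p_value, reject_by_critical_value)
```

Both rules give the same answer: the p-value of roughly 0.11 is above 0.05, and 6.0 is below 7.815, so the hypothesis of equally likely options is not rejected.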

Copyright ©2022 Educative, Inc. All rights reserved