What is a t-test?

A t-test is an inferential statistic used for hypothesis testing: it lets us test an assumption about a population using sample data. For example, suppose we conduct a happiness study in a country with different age groups. We can use the t-test to determine whether the mean happiness of two age groups is similar or whether it differs between them.

T-test types and requirements

To perform this test, we require three key quantities: the difference between the mean values, the standard deviation of each sample, and the number of data values in each sample. The t-test also assumes that the data we are working with is approximately normally distributed (a bell-shaped curve).
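One common way to check the normality assumption is the Shapiro-Wilk test, available in R as shapiro.test. The sketch below is only an illustration: the scores vector is simulated and hypothetical, not data from this article.

```r
# Hypothetical sample, simulated only for illustration
set.seed(42)
scores <- rnorm(30, mean = 7.5, sd = 1)

# Shapiro-Wilk test: the null hypothesis is that the data is normally distributed
normality <- shapiro.test(scores)

# A small p-value (for example, below 0.05) suggests the data is not normal
normality$p.value
```

A non-significant result does not prove normality, so it is usually worth inspecting a histogram or Q-Q plot as well.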

When performing a t-test, the t-score tells us how different two groups are relative to the variation within them. The larger the absolute value of the score, the greater the difference, and vice versa.
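To judge whether a t-score is "large", it is usually converted into a p-value using the t-distribution. The following is a minimal sketch; the values of t_score and df are hypothetical and chosen only for illustration.

```r
# Hypothetical values, chosen only for illustration
t_score <- 2.5   # t-score from some t-test
df <- 19         # degrees of freedom (n - 1 for a one-sample test with n = 20)

# Two-sided p-value: probability of a t-score at least this extreme
# (in either direction) if there is no real difference
p_value <- 2 * pt(-abs(t_score), df)
p_value
```

A p-value below the chosen significance level (0.05 is a common convention) is typically read as evidence of a difference.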

There are three kinds of t-tests:

  1. One sample t-test
  2. Paired (dependent) sample t-test
  3. Independent sample t-test

One sample t-tests

A one sample t-test is used when we know the mean of a population and want to compare the mean of a sample against that known value. This can be done through the formula given below:

t = \frac{m - \mu}{s / \sqrt{n}}

where,

  • m = mean of the sample
  • n  = size of the sample
  • s  = standard deviation of the sample, with n-1 degrees of freedom
  • μ = the known mean of the population
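The formula above can be worked through by hand in R. This is a minimal sketch: the vector x holds hypothetical happiness scores made up for illustration, and mu reuses the value 7.5 only as an example.

```r
# Hypothetical happiness scores, made up for illustration
x <- c(6.8, 7.9, 7.1, 8.2, 6.5, 7.4, 8.0, 6.9)
mu <- 7.5          # known population mean (example value)

m <- mean(x)       # mean of the sample
s <- sd(x)         # sample standard deviation (uses n - 1 in the denominator)
n <- length(x)     # size of the sample

# One-sample t-score, following the formula term by term
t_manual <- (m - mu) / (s / sqrt(n))
t_manual
```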

Below is an example of calculating a one-sample t-test in R. The mean happiness index of a country is given; in this case, it is 7.5. Survey data for a sample of teenagers has also been preloaded. We will use the t.test function, built into R, to calculate the t-score of the teenagers' data against the given value.

# Summarize the data
summary(teenagers$happiness)
# One-sample t-test
ttest <- t.test(teenagers$happiness, mu = 7.5)
# Printing the results
ttest
Results

We can observe that the mean happiness of teenagers differs from the country's happiness index of 7.5, since the absolute value of our t-value is quite high. The p-value in the output indicates whether this difference is statistically significant.

Code explanation

Line 2: We use the summary function. This takes in data as its only argument and provides us with some basic statistics of the data.

Line 4: To calculate a one-sample t-test, the function t.test takes in two arguments, the data and the given mean, and returns the summary of the t-test performed on the values provided.
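The object returned by t.test is a list, so individual values can be read from it rather than only printed. The sketch below assumes a hypothetical teenagers data frame as a stand-in for the preloaded data; the numbers are made up.

```r
# Hypothetical stand-in for the preloaded teenagers data
teenagers <- data.frame(happiness = c(6.8, 7.9, 7.1, 8.2, 6.5, 7.4, 8.0, 6.9))

ttest <- t.test(teenagers$happiness, mu = 7.5)

ttest$statistic   # the t-value
ttest$parameter   # degrees of freedom (n - 1)
ttest$p.value     # two-sided p-value
ttest$conf.int    # 95% confidence interval for the mean
ttest$estimate    # mean of the sample
```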

Paired (dependent) samples t-test

This is used to compare the means of two related samples, such as matched subjects with similar characteristics or the same subjects measured at different times. The formula for calculating this is:

t = \frac{m1 - m2}{s(diff) / \sqrt{n}}

where,

m1 = mean of sample 1

m2 = mean of sample 2

s(diff) = standard deviation of the pairwise differences between the two samples

n = number of pairs (the size of each sample)
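As a rough illustration of the paired formula, the t-score can be computed by hand from the pairwise differences. The before and after vectors below are hypothetical measurements for the same six subjects, not data from this article.

```r
# Hypothetical before/after measurements for the same six subjects
before <- c(12, 15, 9, 14, 11, 13)
after  <- c(10, 13, 9, 11, 10, 10)

d <- before - after   # pairwise differences
n <- length(d)        # number of pairs

# mean(d) equals m1 - m2, and sd(d) is s(diff)
t_manual <- mean(d) / (sd(d) / sqrt(n))
t_manual
```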

Below, we have an example of calculating a paired t-test in R. The data for two samples of a population have been preloaded. One sample contains the occurrence of COVID-19 in people before vaccination and the other after vaccination. We will use the t.test function, built into R, to calculate the t-score of the two samples to determine if the vaccine is effective.

# Summarize the data for both samples
summary(preVac)
summary(postVac)
# Paired sample t-test
ttest <- t.test(preVac, postVac, paired = TRUE)
# Printing the results
ttest
Results

We can see that the t-value is high, which indicates a significant difference between the means of the two samples. The occurrence of COVID-19 in the post-vaccination sample is significantly lower than in the pre-vaccination sample.

Code explanation

Lines 2 and 3: We use the summary function. This takes in data as its only argument and provides us with some basic statistics of the data.

Line 5: To calculate a paired sample t-test, the function t.test takes in three arguments: the first sample, the second sample, and a boolean argument, paired, indicating whether our data is paired or not. It returns the summary of the t-test performed on the values provided.
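A paired t-test is equivalent to a one-sample t-test on the pairwise differences, tested against a mean of 0. The sketch below illustrates this with hypothetical before and after vectors that are made up for the example.

```r
# Hypothetical before/after measurements for the same six subjects
before <- c(12, 15, 9, 14, 11, 13)
after  <- c(10, 13, 9, 11, 10, 10)

# Paired t-test on the two samples
paired_result <- t.test(before, after, paired = TRUE)

# One-sample t-test on the differences against a mean of 0
diff_result <- t.test(before - after, mu = 0)

# Both approaches give the same t-value and p-value
all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
all.equal(paired_result$p.value, diff_result$p.value)
```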

Independent samples t-test

This is used to compare the means of two independent groups. It can be further divided into two categories:

  1. Equal variance (pooled) test
  2. Unequal variance test

The equal variance (pooled) test is used when both samples can be assumed to have the same or a similar variance. The formula for this is:

t = \frac{m1-m2}{\sqrt{\frac{(n1-1) \times var1 + (n2-1) \times var2}{n1+n2-2}} \times \sqrt{\frac{1}{n1}+\frac{1}{n2}}}

The unequal variance test (also known as Welch's t-test) is used when the two samples have different variances. It is the default behavior of t.test in R. The formula for this is:

t = \frac{m1-m2}{\sqrt{\frac{var1}{n1}+\frac{var2}{n2}}}

where,

m1 = mean of sample 1

m2 = mean of sample 2

n1 = size of sample 1

n2 = size of sample 2

var1 = variance of sample 1

var2 = variance of sample 2
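Both statistics can be computed by hand from these quantities. In the sketch below, x1 and x2 are hypothetical samples made up for illustration. The pooled version takes the square root of the pooled variance, with var1 and var2 being variances (not standard deviations) as defined above.

```r
# Hypothetical samples, made up for illustration
x1 <- c(7.1, 6.8, 7.5, 8.0, 6.9, 7.3, 7.7)
x2 <- c(6.9, 7.2, 7.8, 6.5, 7.0)

m1 <- mean(x1);   m2 <- mean(x2)
n1 <- length(x1); n2 <- length(x2)
var1 <- var(x1);  var2 <- var(x2)

# Equal variance (pooled) t-score
pooled_var <- ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_pooled <- (m1 - m2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))

# Unequal variance (Welch) t-score
t_welch <- (m1 - m2) / sqrt(var1 / n1 + var2 / n2)

c(pooled = t_pooled, welch = t_welch)
```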

Let's look at an example of calculating an independent t-test in R with equal and unequal variance. The data for two samples of a population have been preloaded. The samples contain the happiness index of NY and LA. We will use the t.test function, built into R, to calculate the t-score of the two samples to determine if they are similar.

# Summarize the data
summary(NYHappinessIndex)
summary(LAHappinessIndex)
# Performing independent t-tests with equal and unequal variance
tteste <- t.test(NYHappinessIndex, LAHappinessIndex, var.equal = TRUE)
ttestne <- t.test(NYHappinessIndex, LAHappinessIndex, var.equal = FALSE)
# Printing the results
tteste
ttestne
Results

We can observe that the t-value is quite low in both cases, which means that the sample means are close to each other. This suggests that the happiness index is similar across both states.

Code explanation

Lines 2 and 3: We use the summary function. This takes in data as its only argument and provides us with some basic statistics of the data.

Line 5: To calculate an independent samples t-test with equal variance, the function t.test takes in the two samples and var.equal = TRUE, and returns the summary of the pooled t-test performed on the values provided.

Line 6: To calculate an independent samples t-test with unequal variance (Welch's t-test), the function t.test takes in the two samples, and var.equal must be FALSE, which is also the default of t.test. It returns the summary of the t-test performed on the values provided.
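One visible difference between the two variants is the degrees of freedom reported in the output. The sketch below uses hypothetical stand-ins for the preloaded vectors; the values are made up for illustration.

```r
# Hypothetical stand-ins for the preloaded vectors
NYHappinessIndex <- c(7.1, 6.8, 7.5, 8.0, 6.9, 7.3, 7.7)
LAHappinessIndex <- c(6.9, 7.2, 7.8, 6.5, 7.0)

pooled <- t.test(NYHappinessIndex, LAHappinessIndex, var.equal = TRUE)
welch  <- t.test(NYHappinessIndex, LAHappinessIndex, var.equal = FALSE)

pooled$parameter   # degrees of freedom: n1 + n2 - 2
welch$parameter    # Welch-Satterthwaite approximation, usually not a whole number
welch$method       # name of the test that was actually run
```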

Comparing variances

R provides an easy-to-use function, var.test, to compare the variances of two samples and so help decide which independent t-test is required. Below, we can see an implementation where the variances of two samples are compared.

# Function to test the variance of two data samples
var.test(NYHappinessIndex, LAHappinessIndex)
Code explanation

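Line 2: We use the var.test function, which takes in two parameters, that is, the data samples to be compared, and performs an F-test. It returns the ratio of the two samples' variances along with a p-value indicating whether the variances differ significantly.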
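One possible way to use this result is to let the p-value of var.test decide the var.equal argument of t.test. This is only a sketch: the two vectors are hypothetical stand-ins for the preloaded data, and the 0.05 threshold is a common convention assumed here, not a rule from this article.

```r
# Hypothetical stand-ins for the preloaded vectors
NYHappinessIndex <- c(7.1, 6.8, 7.5, 8.0, 6.9, 7.3, 7.7)
LAHappinessIndex <- c(6.9, 7.2, 7.8, 6.5, 7.0, 7.4, 7.1)

# F-test comparing the two variances
ftest <- var.test(NYHappinessIndex, LAHappinessIndex)
ftest$estimate   # ratio of the two sample variances
ftest$p.value    # small values suggest the variances differ

# Assumed convention: treat variances as equal unless p-value is below 0.05
equal_var <- ftest$p.value > 0.05

result <- t.test(NYHappinessIndex, LAHappinessIndex, var.equal = equal_var)
result
```

When in doubt, leaving var.equal at its default of FALSE is a reasonable choice, since the unequal variance test does not rely on the equal variance assumption.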

Copyright ©2024 Educative, Inc. All rights reserved