Normality Test

Get introduced to various normality tests used in R.

The normality of data refers to how closely its distribution follows a normal distribution. The normal distribution is a bell-shaped curve with the following characteristics:

  • The mean, median, and mode are all equal.
  • The distribution is symmetrical around the mean.
  • Fixed ranges of standard deviations around the mean cover fixed percentages of the data: about 68% within one, 95% within two, and 99.7% within three standard deviations.
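
The last characteristic can be checked numerically. The following is a minimal sketch that uses base R's pnorm() function, the cumulative distribution function of the normal distribution, to compute the share of a normal distribution within one, two, and three standard deviations of the mean. The variable names are chosen only for illustration.

```r
# Share of a normal distribution that lies within k standard deviations
# of the mean, computed from the standard normal CDF.
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)

# Label the results and print them as percentages.
names(coverage) <- paste0("within_", k, "_sd")
print(round(coverage * 100, 1))
```

The printed percentages are roughly 68.3, 95.4, and 99.7.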

The normality of a dataset can be assessed by various methods in R. We can use visualization techniques and numerical tests to make inferences about it. Although a single test can be enough to make a decision, it is better to support the conclusion with multiple methods, because no single test can prove that data is normally distributed.
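
As a rough sketch of the visual approach, the following example draws a histogram and a Q-Q plot with base R graphics. The sample is simulated with rnorm() purely for illustration; it is an assumption and not data from this lesson.

```r
# Simulated sample for illustration (an assumption, not lesson data).
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

# Histogram: look for a roughly symmetric, bell-shaped outline.
hist(x, main = "Histogram of x", xlab = "x")

# Q-Q plot: points that stay close to the reference line suggest normality.
qq <- qqnorm(x)
qqline(x)
```

Visual checks like these are subjective, so they are usually combined with a numerical test.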

Shapiro-Wilk test

The Shapiro-Wilk test is a statistical test for the null hypothesis that the data comes from a normally distributed population. It returns a p-value, which we compare to the significance level to decide whether the data is consistent with normality. As usual, we reject the null hypothesis if the p-value is smaller than the significance level and conclude that the data is not normally distributed. If the p-value is larger, we fail to reject the null hypothesis, which means we have no evidence against normality.
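
The decision rule can be sketched as follows. The right-skewed sample and the significance level of 0.05 are assumptions chosen for illustration.

```r
# Simulated right-skewed sample (an assumption for illustration).
set.seed(1)
skewed <- rexp(200, rate = 1)

alpha <- 0.05  # significance level chosen for this sketch
p_value <- shapiro.test(skewed)$p.value

# Compare the p-value to the significance level.
if (p_value < alpha) {
  decision <- "reject the null hypothesis: the data does not look normal"
} else {
  decision <- "fail to reject the null hypothesis: no evidence against normality"
}
print(decision)
```

With a strongly skewed sample of this size, the p-value is expected to be very small, so the test should reject normality.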

The function also returns the test statistic W. This value indicates how well the ordered sample values fit the quantiles expected under a normal distribution. It lies between 0 and 1:

  • Values closer to 0 indicate a worse fit.
  • Values closer to 1 indicate a better fit.
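
To see how W behaves, the following sketch compares the statistic for a simulated normal sample and a simulated right-skewed sample. Both samples and all variable names are assumptions made for illustration.

```r
# Two simulated samples of the same size (assumptions for illustration).
set.seed(123)
normal_sample <- rnorm(200)
skewed_sample <- rexp(200)

# Extract the W statistic from each test result.
w_normal <- unname(shapiro.test(normal_sample)$statistic)
w_skewed <- unname(shapiro.test(skewed_sample)$statistic)

print(c(normal = w_normal, skewed = w_skewed))
```

The normal sample should produce a W value closer to 1 than the skewed sample.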

We use the shapiro.test() function to apply this test in R. It takes a numeric vector of 3 to 5000 non-missing values and returns the W statistic together with the p-value, which makes it very practical.
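
A minimal example of calling shapiro.test() is shown below. It uses the mpg column of the built-in mtcars dataset as example data, which is an assumption; any numeric vector can be used instead.

```r
# mtcars ships with base R; mpg is used here only as example data.
result <- shapiro.test(mtcars$mpg)

# Print the full test summary.
print(result)

# Access the individual components of the returned object.
result$statistic  # the W statistic
result$p.value    # the p-value to compare with the significance level
```

The returned object has class "htest", so its components can be extracted by name and reused in later steps.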
