Normality Test
Explore the concept of data normality and understand how to assess it using multiple statistical tests and visualization methods in R. Learn to apply the Shapiro-Wilk and Kolmogorov-Smirnov tests, interpret skewness values, and use Q-Q and density plots to determine if your data follows a normal distribution.
The normality of data refers to how closely the distribution of the data matches a normal distribution. The normal distribution is a bell-shaped curve with the following characteristics:
- The mean, median, and mode are all equal.
- The distribution is symmetrical around the mean.
- Fixed ranges of standard deviations around the mean cover fixed percentages of the data: roughly 68% within one standard deviation, 95% within two, and 99.7% within three.
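The coverage property in the last bullet can be checked numerically. The following is a minimal sketch using base R's `pnorm()`, the cumulative distribution function of the normal distribution; the helper name `coverage` is introduced here only for illustration.

```r
# Share of a normal distribution that lies within k standard deviations
# of the mean. `coverage` is a hypothetical helper name, not a built-in.
coverage <- function(k) pnorm(k) - pnorm(-k)

# Coverage for 1, 2, and 3 standard deviations
round(coverage(1:3), 4)
# 0.6827 0.9545 0.9973
```

Because `pnorm()` works on the standard normal distribution by default, the result holds for any mean and standard deviation once the data is expressed in standard deviation units.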
The normality of a dataset can be assessed by various methods in R. We can use both visualization techniques and numerical tests to make inferences about it. Although a single test can be enough to reach a decision, it is better to confirm the result using multiple methods, because a test can only fail to find evidence against normality and cannot prove it.
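As a rough illustration of the visual side, the sketch below draws a density plot and a Q-Q plot with base R graphics. The data is simulated with `rnorm()` and the seed is arbitrary; neither comes from the lesson's own dataset.

```r
set.seed(1)                         # arbitrary seed, only for reproducibility
x <- rnorm(200, mean = 0, sd = 1)   # simulated sample (assumption, not lesson data)

# Density plot: for normal data the curve should look roughly bell-shaped
d <- density(x)
plot(d, main = "Density of x")

# Q-Q plot: for normal data the points should fall close to the reference line
qqnorm(x)
qqline(x)
```

Plots like these give a quick impression of shape, while a numerical test supplies a p-value to back up that impression.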
Shapiro-Wilk test
The Shapiro-Wilk test is a statistical test of the null hypothesis that the data comes from a normal distribution. It returns a p-value, which we compare to the significance level to decide whether the data is consistent with a normal distribution. As usual, we reject the null hypothesis if the p-value is smaller ...
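A minimal sketch of the test in practice, using base R's `shapiro.test()`. The two samples are simulated, and the significance level of 0.05 is an assumed, conventional choice rather than a value taken from this lesson.

```r
set.seed(42)                                   # arbitrary seed for reproducibility
normal_sample <- rnorm(100, mean = 50, sd = 10)  # simulated, roughly normal data
skewed_sample <- rexp(100, rate = 1)             # simulated, strongly right-skewed data

alpha <- 0.05   # assumed significance level

# shapiro.test() returns an "htest" object; the p-value is in $p.value
result_normal <- shapiro.test(normal_sample)
result_skewed <- shapiro.test(skewed_sample)

result_normal$p.value           # compare this value against alpha
result_skewed$p.value < alpha   # TRUE means we reject normality for this sample
```

Note that `shapiro.test()` accepts sample sizes between 3 and 5000. With large samples, even small departures from normality can produce a small p-value, which is one reason to pair the test with a plot.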