Get a summary of the essential points that were covered in this chapter.

Let’s summarize what we have learned so far in this section.

  • Error distribution: We saw that data from a normal or Gaussian error distribution has a relatively even spread of values above and below the mean. Such data can take any real value, positive or negative.

  • Normality tests: We learned that data can be tested for normality with a normality test, such as the Shapiro-Wilk test. Still, it’s often better to use a function like fitdistr() from the MASS package to estimate which predefined error distribution the data most closely resembles.

  • Log-transformation: We saw that log-transformation is a common technique for normalizing many kinds of biological data.

  • Non-parametric tests are not enough: We discussed that many older non-parametric statistical tests can analyze non-normally distributed data, but these have fallen out of favor with the rise of generalized linear models.

  • T-tests: We learned that Student’s t-test is designed for analyzing the difference in means of normally distributed data in two categories. A t-test is designed for small sample sizes; for large sample sizes, it is equivalent to a linear model with a single two-level categorical predictor.

  • Linear models: We also discussed that linear models are one of the most widely used statistical methods.

    Linear models allow us to analyze a normally distributed response variable with any combination of categorical or continuous predictor variables.

  • Useful functions: We also saw several functions that are useful for all linear models, such as summary(), Anova() from the car package, and plot(). Post-hoc tests can be performed with the emmeans package.
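The normality-testing and log-transformation points above can be sketched together in R. The data here are simulated (a log-normal sample standing in for skewed biological measurements); the parameter values are illustrative assumptions, not from the course.

```r
library(MASS)  # provides fitdistr()

# Simulated right-skewed data, e.g. hypothetical expression measurements
set.seed(42)
skewed <- rlnorm(100, meanlog = 2, sdlog = 0.5)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
shapiro.test(skewed)

# Log-transformation often normalizes such data
logged <- log(skewed)
shapiro.test(logged)

# fitdistr() estimates the parameters of a chosen error distribution,
# helping judge which predefined distribution the data resemble
fitdistr(logged, densfun = "normal")
```

Comparing the two Shapiro-Wilk p-values shows why log-transformation is such a common first step for skewed data.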
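The equivalence between a t-test and a linear model can be verified directly. This is a minimal sketch on simulated two-group data (the group names and means are illustrative); with var.equal = TRUE, the classical t-test's statistic and p-value match the group coefficient in the corresponding lm() fit.

```r
set.seed(1)
# Two groups with different simulated means (illustrative values)
group <- factor(rep(c("control", "treatment"), each = 30))
response <- c(rnorm(30, mean = 10), rnorm(30, mean = 12))

# Student's t-test (equal-variance form)
t.test(response ~ group, var.equal = TRUE)

# Equivalent linear model with one two-level categorical predictor:
# the group coefficient's t statistic and p-value match the t-test
summary(lm(response ~ group))
```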
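The linear-model workflow and helper functions above can be sketched in one pass. This example uses R's built-in ToothGrowth data as a stand-in (one categorical and one continuous predictor) and assumes the car and emmeans packages are installed; the model itself is illustrative, not from the course.

```r
library(car)      # Anova()
library(emmeans)  # post-hoc estimated marginal means

# Response: tooth length; predictors: supplement type (categorical)
# and dose (continuous)
model <- lm(len ~ supp + dose, data = ToothGrowth)

summary(model)  # coefficients and overall fit
Anova(model)    # ANOVA table from the car package
plot(model)     # diagnostic plots (residuals, Q-Q, leverage)

# Post-hoc pairwise comparison of the supplement types
emmeans(model, pairwise ~ supp)
```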
