About the Course

Let’s take a look at the goals of this course and its scope.

The aim of this course

This course introduces one of the most useful types of statistical analysis for researchers: linear models and their extensions, such as the generalized linear model (GLM).

The goal of this course is to build an understanding of the basic statistical ideas needed to apply these models effectively and to interpret their results.

Why is R important for statistics?

Statistical analysis is a useful skill for students and researchers alike. The approach taken in this course is primarily conceptual rather than mathematical, and it makes limited use of equations, since these are easily found in numerous statistics textbooks and online.

In this course, we’ll analyze real data sets, which means we need a software package for statistics and graphics; in our case, that’s the R programming language. Teaching and learning introductory statistics can be challenging, because many scientists only take an interest in the subject once they have their own data to analyze.

Statistics is especially useful for students in research-intensive degree programs, such as master’s and PhD programs. These students are likely to be more motivated to learn statistics, because it is essential knowledge for analyzing their data.

In this course, we’ll use data sets from the life and environmental sciences. For convenience, these data sets are available within R itself.
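The course text doesn't name the specific data sets it uses, so the two below are assumptions chosen purely for illustration: PlantGrowth and airquality both ship with R's built-in datasets package and come from the life and environmental sciences. A minimal sketch of loading and inspecting them:

```r
# The datasets package ships with R and is attached by default,
# so these data frames are available without any download.
data(PlantGrowth)   # dried plant weights under a control and two treatments
data(airquality)    # daily air quality measurements, New York, May to September 1973

str(PlantGrowth)    # 30 observations: weight (numeric) and group (factor, 3 levels)
head(airquality)    # first six rows: Ozone, Solar.R, Wind, Temp, Month, Day
```

Running data() with no arguments lists every data set available in the attached packages.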

The R programming language

R is a leading environment for statistics, graphics, and programming. It’s widely used by scientists in both academia and industry, for several reasons, including the following:

  • R is a product of the statistical community and is written by experts.
  • R is free to download and use, which facilitates collaboration.
  • R is multiplatform: versions exist for Windows, Mac, and Linux.
  • R is open-source software that the R community can easily extend.
  • R is statistical software, a graphics package, and a programming language all in one. It can also be used to make books, blogs, and websites.

We’ll focus on the linear model because it’s one of the most useful tools in statistics. This course starts with an introduction to several variations of the basic linear-model analysis:

  • Analysis of variance
  • Linear regression
  • Analysis of covariance
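Because all three analyses are linear models, each can be fitted with the same lm() call; only the kinds of predictors change. A minimal sketch, assuming the built-in PlantGrowth, trees, and ToothGrowth data sets as stand-ins for the course's own examples:

```r
# One-way ANOVA: numeric response, one categorical predictor (3 groups)
fit_anova <- lm(weight ~ group, data = PlantGrowth)
anova(fit_anova)

# Simple linear regression: numeric response, one numeric predictor
fit_reg <- lm(Volume ~ Girth, data = trees)
summary(fit_reg)

# ANCOVA: numeric response, one categorical (supp) and one numeric (dose) predictor
fit_ancova <- lm(len ~ supp + dose, data = ToothGrowth)
anova(fit_ancova)
```

The formula on the left of each call is what distinguishes the analyses; the fitting function and the way results are extracted stay the same.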

We’ll then introduce generalized linear models, an extension for data with non-normal distributions. The advantage of the linear-model approach is that a wide range of data types and experimental designs can be analyzed in very similar ways. In particular, all of the analyses covered in this course can be performed in R using only two main functions: lm() for linear models and glm() for generalized linear models (GLMs). Together with a set of base R functions, these let us extract different aspects of our results.
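A minimal sketch of the glm() side, assuming the built-in InsectSprays count data and a Poisson family with its default log link as an illustrative non-normal case. The extractor functions shown (summary(), coef(), anova(), fitted()) are standard base R functions that also work on lm() fits:

```r
# Insect counts after six spray treatments: a count response is a natural
# candidate for a Poisson GLM (the log link is the Poisson default).
fit_glm <- glm(count ~ spray, family = poisson, data = InsectSprays)

# Base R extractor functions pull out different aspects of the fitted model
summary(fit_glm)                # coefficient table, deviance, AIC
coef(fit_glm)                   # estimates on the log (link) scale
anova(fit_glm, test = "Chisq")  # analysis-of-deviance table
head(fitted(fit_glm))           # fitted means on the response (count) scale
```

Apart from the family argument, the call has the same shape as an lm() call, which is the practical payoff of the linear-model approach.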

This course primarily covers statistics, not the R software itself. Because statistics is a vast subject, we can’t cover every area, so coverage is limited mainly to linear models and generalized linear models. More specifically, this course doesn’t cover the following topics:

  • Non-linear regression
  • Generalized additive models (GAMs)
  • Non-parametric statistics

Experimental design is covered briefly and integrated into the relevant sections. Information criteria and multimodel inference are also briefly introduced.
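As a hedged illustration of information criteria, the sketch below compares two linear models of tree volume (the built-in trees data, chosen here as an assumption) by AIC, then converts the AIC differences into Akaike weights, one common multimodel-inference summary. The course itself may use a different criterion or workflow:

```r
# Two candidate models for tree volume, fitted to the same data
m1 <- lm(Volume ~ Girth, data = trees)
m2 <- lm(Volume ~ Girth + Height, data = trees)

# AIC trades goodness of fit against the number of estimated parameters;
# lower values are preferred
aics <- AIC(m1, m2)
aics

# Akaike weights: relative support for each model within this candidate set
delta <- aics$AIC - min(aics$AIC)
akaike_w <- exp(-delta / 2) / sum(exp(-delta / 2))
akaike_w
```

The weights only compare the models in the candidate set; they say nothing about models that were never fitted.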