Generalized Linear Models

Learn about generalized linear models, also known as GLMs.

Understanding non-normal data

In a normal distribution, most values are close to the mean, and the probability of obtaining a value decreases as we move farther from the mean. However, in the real world, data is often not normally distributed. In these cases, we can turn to Generalized Linear Models (GLMs), which extend the modeling framework of lm() to many other error structures. “Error” here refers to the way the data is spread around the mean. The primary function for running a GLM is the glm() function.
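One way to see how glm() extends lm() is to fit the same model with both. The sketch below uses simulated data (the variable names x and y and the chosen coefficients are illustrative assumptions, not part of the lesson) and shows that a GLM with a Gaussian error family gives the same coefficient estimates as lm().

```r
# Simulated data; x, y, and the true coefficients are illustrative only
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Ordinary linear model
fit_lm <- lm(y ~ x)

# The same model written as a GLM with a Gaussian (normal) error family
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Compare the estimated coefficients from the two fits
coef(fit_lm)
coef(fit_glm)
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
same_coefs
```

Changing the family argument is what lets the same call handle other error structures.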

Making non-normal data normal

GLMs work via a link function, which connects the mean of the response to a linear combination of the predictors, so the model is fit on a transformed scale rather than on the raw data. For example, with a binomial GLM (also called a logistic regression), we aren’t modeling the data directly (usually a 0 or 1). Instead, we’re modeling the log odds of an event happening (getting a 1) versus not happening (getting a 0). The log odds can take any real value, so they can be modeled as a linear function of the predictors, much like the response in lm(). Some of the built-in error families available with the glm() function are as follows:

binomial(link = "logit")
gaussian(link = "identity")
poisson(link = "log")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
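Taking the binomial family from the list as an example, the following sketch may help make the log-odds idea concrete. It uses simulated 0/1 data (the names x, y, and the true coefficients are assumptions chosen for illustration) and compares predictions on the link (log-odds) scale with predictions on the response (probability) scale.

```r
# Simulated binary data; all names and coefficient values are illustrative
set.seed(1)
n <- 200
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)        # plogis() converts log odds to probabilities
y <- rbinom(n, size = 1, prob = p)

# Binomial GLM (logistic regression) with the default logit link
fit <- glm(y ~ x, family = binomial)

# Predictions on the log-odds scale and on the probability scale
log_odds <- predict(fit, type = "link")
prob <- predict(fit, type = "response")

# The two scales are related by the inverse logit
scales_match <- isTRUE(all.equal(plogis(log_odds), prob))

# On the log-odds scale, the fitted values are a straight line in x
linear_pred <- coef(fit)[1] + coef(fit)[2] * x
is_linear <- isTRUE(all.equal(unname(log_odds), unname(linear_pred)))

scales_match
is_linear
```

The observed y values are only ever 0 or 1, but the quantity the model treats as linear in x is the log odds.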

In the list of error families above, the link function shown with each family is that family’s default. It is assumed automatically, so we don’t need to write it in our model specification. Several families have multiple possible link functions, some of which may be preferred in certain ...
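As a rough illustration of default versus explicitly chosen links, the sketch below inspects a few family objects and fits small models to simulated data. The data, the variable names, and the choice of a probit link as the alternative are assumptions made for this example only.

```r
# Family objects record their link; called without arguments they use the default
binomial()$link   # "logit"
poisson()$link    # "log"
Gamma()$link      # "inverse"

# Simulated count data (illustrative only)
set.seed(7)
x <- runif(100)
counts <- rpois(100, lambda = exp(0.3 + 1.5 * x))

# These two calls specify the same model, because "log" is the Poisson default
fit_default <- glm(counts ~ x, family = poisson)
fit_explicit <- glm(counts ~ x, family = poisson(link = "log"))
same_fit <- isTRUE(all.equal(coef(fit_default), coef(fit_explicit)))
same_fit

# A non-default link has to be written out, e.g. a probit link for binary data
y <- rbinom(100, size = 1, prob = pnorm(-0.2 + x))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))
fit_probit$family$link
```

Whether a non-default link is a better choice depends on the data and the question being asked.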
