Generalized Linear Models
Learn about generalized linear models, also known as GLMs.
Understanding non-normal data
Normally distributed data is spread symmetrically above and below the mean. Its values can be any real number, positive or negative.
Most values are close to the mean, and the probability of observing a value decreases as we move farther from the mean. In the real world, however, data is often not normally distributed. In these cases, we can turn to Generalized Linear Models (GLMs), which extend the modeling framework of lm() to many other error structures. Here, “error” refers to the way the data is spread around the mean. The primary function for fitting a GLM is the glm() function.
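As a minimal sketch of how glm() extends lm(), the example below fits the same model both ways. The built-in mtcars dataset and the choice of mpg and wt as variables are assumptions made purely for illustration. With the gaussian family, glm() should give the same coefficients as lm().

```r
# Fit an ordinary linear model with lm()
fit_lm <- lm(mpg ~ wt, data = mtcars)

# Fit the same model with glm(), using the gaussian (normal) error family
fit_glm <- glm(mpg ~ wt, family = gaussian, data = mtcars)

# Compare the estimated coefficients from the two fits
print(coef(fit_lm))
print(coef(fit_glm))

# TRUE when the coefficients agree up to numerical tolerance
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(same_coefs)
```

Swapping the family argument for another error family is what lets the same model formula handle non-normal responses.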
Making non-normal data normal
GLMs work via a link function, which transforms the mean of the response onto a scale where it can be modeled as a linear function of the predictors. For example, with a binomial GLM (also called a logistic regression), we aren’t directly modeling the data (usually a 0 or 1). Instead, we’re modeling the log odds of an event happening (getting a 1) rather than not happening (getting a 0). Unlike a probability, the log odds can take any real value, so they can be described by a linear model. Some of the built-in error families available with the glm() function are as follows:
binomial(link = "logit")
gaussian(link = "identity")
poisson(link = "log")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
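The binomial case described above can be sketched with simulated data. Everything here is a hypothetical setup for illustration: the names n, x, y, and fit, and the true intercept and slope, are arbitrary assumptions. The sketch fits a logistic regression without naming a link, then compares predictions on the link (log odds) scale with predictions on the response (probability) scale.

```r
set.seed(42)  # make the simulated data reproducible

# Simulate a 0/1 outcome whose log odds depend linearly on x
# (the intercept -0.5 and slope 1.2 are arbitrary illustrative values)
n <- 500
x <- rnorm(n)
log_odds <- -0.5 + 1.2 * x
y <- rbinom(n, size = 1, prob = plogis(log_odds))

# Fit a binomial GLM; no link is specified in the family call
fit <- glm(y ~ x, family = binomial)

# Name of the link function the fitted model actually used
print(fit$family$link)

# Predictions on the link scale are log odds and can be any real number
eta <- predict(fit, type = "link")

# Predictions on the response scale are probabilities
p <- predict(fit, type = "response")
print(range(eta))
print(range(p))

# The inverse logit (plogis) maps log odds back to probabilities
print(isTRUE(all.equal(unname(plogis(eta)), unname(p))))
```

The model is linear in x on the log odds scale only; on the probability scale the fitted curve is S-shaped.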
The link function shown with each error family above is that family’s default. It is assumed automatically, so we don’t need to write it in our model specification. Several families have multiple possible link functions, some of which may be preferred in certain ...