Introducing Linear Models

Get familiarized with linear models and diagnostic plots.

Functions for linear models

The model commonly referred to as a linear model, fit in R with lm(), is one of the most flexible and valuable models in all of statistics. Because of this flexibility, some also call it the general linear model, which is easy to confuse with the generalized linear model, so we'll stick with R's preferred lingo: the linear model. The constraint on a linear model is that the response variable must be normally distributed, but the predictor variable(s) can be continuous or discrete—that is, categorical.

Take a look at the table in the Basic Statistical Analysis lesson for a rundown of the lay terms commonly applied to linear models with various combinations of predictor variables. The following functions are useful for all linear models.

Useful functions for linear models

summary(): Summarizes our model and gives us essential information, such as the adjusted R-squared, the treatment means, and the F-statistic and p-value for the model. For linear regression and ANCOVA, this function also provides the slope and intercept of the line of best fit.

anova(): Provides a very brief summary of the overall model but doesn't provide information on individual levels within factors. The statistical significance of each predictor is calculated sequentially, adding each factor to the model one at a time.

Anova(): Provides summary information similar to the anova() function, but here the statistical significance of each predictor is calculated assuming all other factors are already in the model. This function is found in the car package and is identical to anova() for most models. However, we personally prefer Anova() over anova() for calculating summary statistics because its results don't depend on the order in which predictors are entered and are more conservative.

plot(): Provides diagnostic plots of the residuals of the model, which are useful for assessing how well the model fits and whether its assumptions are met.
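To see these four functions side by side, here is a minimal, self-contained sketch using simulated data. The data frame and variable names below are invented for illustration; they are not from the dataset used later in this lesson.

```r
# Simulate a simple dataset: one three-level factor, one continuous response.
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y     = rnorm(60, mean = rep(c(5, 6, 7), each = 20))
)

m <- lm(y ~ group, data = dat)

summary(m)   # coefficients, adjusted R-squared, overall F-statistic and p-value
anova(m)     # brief ANOVA table; predictors tested sequentially
# Anova(m)   # requires the car package: install.packages("car"); library(car)
plot(m)      # diagnostic plots of the residuals
```

With a single predictor, anova() and Anova() agree; the distinction matters once the model contains two or more predictors.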

One-way Analysis of Variance (ANOVA)

One of the most common statistical analyses is the Analysis of Variance, or ANOVA for short. In its simplest form, an ANOVA is a statistical test of whether the means of multiple (more than two) groups are equal. It's essentially the t-test extended to more than two groups; in fact, we can use this type of analysis to run the same hypothesis test as the t-test. The resulting test statistic and p-value tell us how likely we would be to see such differences among the group means if the means were in fact equal.
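The equivalence with the t-test is easy to check directly: with only two groups, a one-way ANOVA and a two-sample t-test with pooled (equal) variances give exactly the same p-value. The sketch below uses simulated data, so the names are illustrative rather than taken from our dataset.

```r
# Two groups, simulated response.
set.seed(1)
two <- data.frame(
  group = factor(rep(c("A", "B"), each = 15)),
  y     = rnorm(30, mean = rep(c(10, 12), each = 15))
)

# p-value from a one-way ANOVA of the linear model
anova_p <- anova(lm(y ~ group, data = two))$"Pr(>F)"[1]

# p-value from a two-sample t-test with equal variances
ttest_p <- t.test(y ~ group, data = two, var.equal = TRUE)$p.value

all.equal(anova_p, ttest_p)   # TRUE: the two tests agree
```

This works because, for two groups with a pooled variance estimate, the ANOVA F-statistic is exactly the square of the t-statistic.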

For example, perhaps we want to know if log.Age.FromEmergence differs across our three predator treatments—Control, Nonlethal, and Lethal. We’ll make our model an R object and then examine it using the summary() function. Note that we could have done this in the t-test example, but we chose not to. Also, remember that we’re using the log-transformed version of our variable so that it’s normally distributed.

R
lm1 <- lm(log.Age.FromEmergence ~ Pred, data = RxP.byTank)
summary(lm1)

So, what does this mean? We created an object called lm1 and then used the summary() function to examine it. The top of the summary output restates the model we fit and gives us information about the distribution of the residuals around the mean. Residuals measure how far each actual data point lies from the fit predicted by the model. Thus, if our data fit the model well, we'd expect a roughly even spread of residuals above and below the median, which should be close to zero. The section entitled Coefficients provides the estimated mean for each level (C, NL, and L) of the factor (Pred). When R reports estimates of means, it establishes a baseline alphabetically, and all other estimates are expressed relative to that baseline. Thus, in the output for lm1, what's labeled as (Intercept) is the mean for the Control treatment ...