...

Linear Regression

Learn about linear regression and how to interpret it.

So far, we’ve discussed scenarios with categorical predictors. But what about a continuous predictor? As long as the response is continuous and its errors are approximately normally distributed, the model is a linear regression. As mentioned previously, linear regression in R is simply another use of lm().
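Nothing special happens under the hood when the predictor is continuous. Here is a minimal sketch on simulated data (the variables x and y and their values are hypothetical, not taken from RxP.byTank). It shows that lm() with one continuous predictor returns the same intercept and slope as the closed-form least-squares formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).

```r
# Hypothetical simulated data, for illustration only
set.seed(42)
x <- runif(50, min = 1, max = 10)        # continuous predictor
y <- 3 - 0.4 * x + rnorm(50, sd = 0.5)   # response with a negative true slope

# Linear regression: lm() with a continuous predictor
fit <- lm(y ~ x)
coef(fit)

# Closed-form ordinary least-squares estimates for a single predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
c(intercept, slope)
```

The two printed vectors agree. lm() simply generalizes the same least-squares fit to more predictors.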

For example, with linear regression we can test whether size at metamorphosis (SVL.final) is influenced by the length of the larval period (Age.DPO). Once again, we’re going to use the log-transformed versions of these variables.
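If the log-transformed columns were not already in the data frame, they could be created along the lines of this sketch. The data frame tanks and its values are made-up placeholders, not the real RxP.byTank data. The natural log is an assumption here, because the earlier chapter may have used a different base.

```r
# Hypothetical toy data frame; the real values live in RxP.byTank
tanks <- data.frame(
  SVL.final = c(14.2, 15.1, 13.8, 16.0),  # made-up sizes at metamorphosis
  Age.DPO   = c(38, 33, 41, 30)           # made-up larval period lengths
)

# Natural log assumed; add log-transformed copies of both variables
tanks$log.SVL.final <- log(tanks$SVL.final)
tanks$log.Age.DPO   <- log(tanks$Age.DPO)
head(tanks)
```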

Using SVL as our response variable

We might wonder why we’re using final SVL as our response variable, since we saw in the previous chapter that log transformation didn’t make it normal. The transformation improved normality, but a Shapiro-Wilk test still indicated a significant departure from a normal distribution. There are two reasons why we use SVL as our response variable:

  1. Biologically speaking, it doesn’t make sense for the size of a tadpole at metamorphosis to affect how long it took to reach metamorphosis. The more plausible causal direction is the reverse: the length of the larval period affects size at metamorphosis.
  2. Even if our data isn’t normal, we can still fit the model and evaluate its fit with the diagnostic plots. Models fit with lm() are fairly tolerant of moderate departures from normality, so if the diagnostics show a good fit, we’re in good shape. Let’s try it out!
R
lm4 <- lm(log.SVL.final ~ log.Age.DPO, data = RxP.byTank)  # regress log final SVL on log larval age
summary(lm4)  # coefficient table, R-squared, and overall F test
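Because both variables are log-transformed, the slope in the summary() output can be read on a proportional scale. Multiplying the larval period by a factor k multiplies the fitted SVL by k raised to the power of the slope. This holds for any log base, provided both variables use the same one. The sketch below works through the arithmetic with a slope of -0.2. That value is hypothetical and illustrative, not the estimate from lm4.

```r
# Hypothetical slope on the log-log scale (not the value from lm4)
b <- -0.2

# A 10% longer larval period multiplies the fitted SVL by 1.1^b
ratio <- 1.1^b
ratio        # approximately 0.981

# The same effect expressed as a percent change in SVL
pct_change <- (ratio - 1) * 100
pct_change   # approximately -1.9, i.e. about a 1.9% smaller SVL
```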

Let’s run the Anova() function from the car package on the lm4 model:

R
library(car)  # provides the Anova() function
Anova(lm4)
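With a single continuous predictor, the ANOVA table and the Coefficients table test the same hypothesis: that the slope is zero. The F statistic is the square of the slope’s t statistic, and the p-values match. The sketch below checks this on simulated, hypothetical data using base R’s anova() so that it needs no extra packages. For a one-predictor model, car’s Anova() reports the same F for the predictor.

```r
# Hypothetical simulated data, for illustration only
set.seed(7)
x <- rnorm(40, mean = 3.5, sd = 0.2)
y <- 2.5 - 0.3 * x + rnorm(40, sd = 0.05)
fit <- lm(y ~ x)

# t statistic for the slope, from the Coefficients table
t_val <- summary(fit)$coefficients["x", "t value"]

# F statistic for x, from the ANOVA table (base R's anova())
f_val <- anova(fit)["x", "F value"]

# With one predictor, F equals t squared
c(t_squared = t_val^2, F = f_val)
```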

Indeed, log.Age.DPO has a significant effect on log.SVL.final. The Coefficients section of the summary() output also shows a negative slope: tadpoles with longer larval periods tend to be smaller at metamorphosis. Next, we should look at the diagnostic plots to evaluate how well the model fits:

R
plot(lm4)  # diagnostic plots for the fitted model
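Calling plot() on an lm object draws four diagnostic plots by default: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. This sketch, on simulated and hypothetical data, shows one way to view all four at once. It also illustrates the point behind reason 2 above: the normality assumption of a linear model concerns the residuals, not the raw response. Here the response is skewed because the predictor is skewed, even though the simulated errors are normal.

```r
# Hypothetical simulated data: a right-skewed predictor makes the raw
# response skewed even though the model errors are normal
set.seed(3)
x <- rexp(80, rate = 1)
y <- 1 + 2 * x + rnorm(80, sd = 0.5)
fit <- lm(y ~ x)

# Show all four default diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)  # restore the previous plotting layout

# Compare a Shapiro-Wilk test on the raw response with one on the residuals
shapiro.test(y)$p.value
shapiro.test(residuals(fit))$p.value
```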

R labels observations that may be problematic. ...