GLMs and ANCOVAs

Learn how to test significance in different GLMs and combine them with ANCOVA for model selection, model reduction, and plotting.

Calculating statistical significance with GLMs

Just as with linear models, we can use the Anova() function in the car package to test which predictors in our model are significant. The base anova() function also works for this, but we highly recommend the capital “A” version:

R
library(car)  # provides Anova()
Anova(glm.negb)

As we can see, both predators and resources affected the number of tadpoles that survived metamorphosis. The interaction between predators and resources wasn’t significant, indicating that the effect of resource level on survival is consistent across the predator treatments. We can see this if we plot the data as follows:

R
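library(ggplot2)
# Boxplots of deaths per tank by predator treatment, filled by resource level
ggplot(RxP.byTank, aes(x = Pred, y = N.dead, fill = Res)) +
  geom_boxplot() +
  ylab("# of tadpoles eaten/died before metamorphosis")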
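
The recommendation to prefer Anova() over anova() is not explained above. One commonly cited reason, stated here as an assumption rather than as the author's rationale, is that base anova() tests terms sequentially, so the result for a term can depend on where it appears in the formula, whereas Anova() from car defaults to Type II tests, which do not depend on term order. The sketch below shows that order dependence using base R only. The data frame d and its columns pred, res, and y are simulated and hypothetical; they are not the RxP.byTank data.

```r
# Simulated, deliberately unbalanced two-factor count data (hypothetical).
set.seed(1)
d <- data.frame(
  pred = factor(rep(c("none", "caged", "lethal"), times = c(12, 6, 4))),
  res  = factor(c(rep(c("low", "high"), times = c(9, 3)),
                  rep(c("low", "high"), times = c(2, 4)),
                  rep(c("low", "high"), times = c(1, 3))))
)
d$y <- rpois(nrow(d),
             lambda = exp(2 + 0.5 * (d$res == "high") - 0.4 * (d$pred == "lethal")))

# The same model, with the two terms entered in opposite orders.
fit_a <- glm(y ~ pred + res, family = poisson, data = d)
fit_b <- glm(y ~ res + pred, family = poisson, data = d)

# anova() adds terms one at a time, in formula order.
tab_a <- anova(fit_a, test = "Chisq")
tab_b <- anova(fit_b, test = "Chisq")

# Deviance attributed to pred when it enters first versus last.
dev_pred_first <- tab_a["pred", "Deviance"]
dev_pred_last  <- tab_b["pred", "Deviance"]
print(c(first = dev_pred_first, last = dev_pred_last))
```

Both fits have the same overall residual deviance, yet the deviance credited to pred changes with its position in the formula. Printing tab_a and tab_b side by side shows the full sequential tables.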

The ramifications of choosing the proper error distribution

It’s worth taking a moment to examine the effect of choosing one of the other error distributions illustrated previously. Comparing the Anova() output for each model shows how the assumed error distribution changes the results.

Here’s the code for the model that assumes a normal error distribution:

R
Anova(glm.n)

Here’s the code for the model that assumes a lognormal error distribution:

R
Anova(glm.ln)

Here’s the code for the model that assumes a Poisson error distribution:

R
Anova(glm.p)

Here’s the code for the model that assumes a negative binomial error distribution:

R
Anova(glm.negb)

We can see in the previous outputs that choosing the wrong error distribution can change our view of which variables significantly affect the response variable. If we’d ignored the error distribution and gone with a standard two-way ANOVA (the glm.n model), we’d have concluded that there wasn’t a significant effect of resources.

On the other hand, had we chosen the Poisson model (glm.p), we’d have concluded that resources had an extremely significant effect. This is one of the effects of overdispersion: it can dramatically inflate our estimates of significance.
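
A minimal sketch of how overdispersion can be detected and how it shrinks Poisson standard errors, using simulated data and base R only. The variables y and res, the negative binomial simulation settings, and the quasi-Poisson comparison are illustrative assumptions and are not part of the original analysis.

```r
set.seed(42)
n <- 80
res <- factor(rep(c("low", "high"), each = n / 2), levels = c("low", "high"))

# Negative binomial counts: the same mean in both groups (no true effect),
# with variance far above the mean (mu + mu^2 / size).
y <- rnbinom(n, size = 1, mu = 10)

fit_p <- glm(y ~ res, family = poisson)

# Pearson dispersion statistic: close to 1 for a well-specified Poisson model.
dispersion <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)

# Chi-square test of the residual deviance against its degrees of freedom;
# a tiny p-value signals that the Poisson model fits poorly.
p_gof <- pchisq(deviance(fit_p), df.residual(fit_p), lower.tail = FALSE)

# Quasi-Poisson keeps the same coefficients but scales the standard errors
# by sqrt(dispersion), which shows how much the Poisson fit understated them.
fit_qp <- glm(y ~ res, family = quasipoisson)
se_p  <- coef(summary(fit_p))["reshigh", "Std. Error"]
se_qp <- coef(summary(fit_qp))["reshigh", "Std. Error"]

print(c(dispersion = dispersion, p_gof = p_gof, se_poisson = se_p, se_quasi = se_qp))
```

A dispersion statistic well above 1 means the Poisson standard errors are too small, so the test statistics built from them are too large. That is the mechanism behind the inflated significance described above.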

Suppose we hadn’t examined the summary() output from the Poisson model carefully and hadn’t seen that the Residual deviance was so significant. In that case, we might not have realized the model wasn’t appropriate. Based on the Q-Q plot and our understanding of the shape of the data, we should use the negative binomial or lognormal models, glm.negb or glm.ln, both of which conclude that resources ...