About This Course

The traditional way of statistical analysis

The statistical analysis that life scientists are expected to perform is increasingly advanced. Yet most graduate programs don’t offer a statistics course that goes beyond the analysis of variance (ANOVA) and linear regression. As a result, undergraduate and graduate students are rarely given the opportunity to learn the types of analysis they need to analyze their data appropriately, let alone to publish and compete in the job market. Part of the reason is that statistics, as traditionally taught, can be frustratingly slow and tedious.

This course uses the statistical language R, which is the choice of ecologists worldwide and is rapidly becoming the “go-to” stats program throughout the life sciences. The examples in the course are rooted in a single, real dataset (published in the journal Ecology in 2013) and use actual analyses that the author has conducted in his professional career as an ecologist. The dataset is admittedly somewhat messy, and early chapters are designed so that students “clean” the raw data as a way of learning basic data manipulation skills and building good habits. Moreover, using a single relatively large dataset (~2500 observations) allows students to get a good understanding of what they are analyzing from chapter to chapter, instead of jumping from one small pre-cleaned dataset to another throughout the course. It also allows readers to see how they can view the same data through different lenses and allows an easy and natural progression from linear and generalized linear models to mixed effects versions of those same analyses, given the hierarchically nested design of the example experiment.

Goals for the course

This course is written to show that a comprehensive understanding of experimental data and analysis isn’t as daunting a task as it may seem. Instead of spending time mired in statistical theory and learning data analysis by hand, the most important thing to understand is what kind of data we have. Once we know what kind of data we’re dealing with, we can figure out how to analyze it effectively. This course will provide the tools needed to properly diagnose our data in an efficient, accessible, and plainspoken manner. This ensures that readers come away with the knowledge of which analysis they should use and when they should use it.

There’s a quote that we love from the musician, actor, author, and poet Henry Rollins, which encapsulates a lot of how we think about doing statistical analyses and using R:

Numbers are perfect, infallible, and everlasting. You aren’t. Numbers are always right in the end. You may see an incorrect figure, but that’s not the fault of the number; the responsibility lies in the person doing the calculation.

—Henry Rollins, High Adventure in the Great Outdoors

Why do we like that quote so much? It’s because when we get an error in R, it’s almost certainly our fault. R didn’t mess up; we did. That’s just the truth. Always check the code to avoid errors!

At this point, we may be thinking to ourselves, “Why do I need to learn R?” or “Seriously, I have to type everything in by hand?” or “Can’t I do this more easily in another program?” There are many answers to these questions:

If you’re an undergraduate thinking of going to graduate school, you’ll almost certainly use R as a graduate student. With knowledge of R, you’ll have a leg up on your peers!
Yes, we have to type everything in, but that also helps us learn and memorize what we’re doing. It’s easy to click some buttons and get an answer that we don’t understand. If we have to type in the code for the statistics we’re analyzing, we’ll better understand what we’re doing.
Having some familiarity with coding is increasingly helpful across various disciplines. We don’t need to be pros at coding, but being comfortable with a computer and typing code to achieve a result is a useful skill to have.
Because R is free and powerful, it may be the only statistics program we’ll ever need to know. If we go on to graduate school, into consulting, or into any field that deals with data, we’ll be able to use R. This course will teach many of the basics of R that we’ll need to know, but one of the best things about R is that it can be extended to meet nearly any statistical (or, more generally, data-analytic) need we may have. The same can’t be said of other programs like JMP, SPSS, or SAS, which are very expensive and may not be available to us at another institution.

Course Introduction

Introduction to R

Thoughts on Proper Data Analysis

Exploratory Data Analysis and Data Summarization

Introduction to Plotting

Basic Statistical Analysis Using R

More Linear Models in R

Advanced Statistical Analysis Using R

Mixed-effects Model

Advanced Data Wrangling and Plotting

Writing Loops and Functions in R

Appendix

Conclusion
