Regression Refresher
Explore the fundamentals of simple linear regression with R and Tidyverse. Learn to fit regression models, interpret slope and intercept values, and understand sampling variability through a teaching evaluations dataset. This lesson builds your ability to analyze and infer relationships between variables in a data-centric context.
Needed packages
Let’s load all the packages needed for this chapter. Loading the tidyverse package by running library(tidyverse) loads the following commonly used data science packages all at once:
ggplot2: This is for data visualization.
dplyr: This is for data wrangling.
tidyr: This is for converting data to the tidy format.
readr: This is for importing spreadsheet data into R.
purrr, tibble, stringr, and forcats: These are the more advanced packages.
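As a minimal sketch, the packages can be loaded as shown here. Loading moderndive at this point is an assumption: it is included because the evals data frame used later in this lesson ships with that package. Both packages must already be installed.

```r
# Load the core tidyverse packages (ggplot2, dplyr, tidyr, readr, and others)
library(tidyverse)

# Assumption: moderndive is loaded here because it provides the evals data frame
library(moderndive)
```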
Before jumping into inference for regression, let’s remind ourselves of the University of Texas at Austin teaching evaluations analysis.
Teaching evaluations analysis
Using simple linear regression, we modeled the relationship between:
A numerical outcome variable y (the instructor’s teaching score)
A single numerical explanatory variable x (the instructor’s beauty score)
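The model described above can be sketched with base R's lm() function. This is only an illustration under a few assumptions: it fits directly on the full evals data frame from moderndive rather than on a subset, and the name score_model is hypothetical.

```r
library(moderndive)  # provides the evals data frame

# Fit a simple linear regression: teaching score (y) on beauty score (x).
# score_model is a hypothetical name chosen for this sketch.
score_model <- lm(score ~ bty_avg, data = evals)

# Extract the fitted intercept and slope
coef(score_model)
```

The first coefficient is the intercept and the second is the slope for bty_avg, that is, the estimated change in teaching score associated with a one-unit increase in beauty score.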
We first created an evals_ch5 data frame by selecting a subset of variables from the evals data frame included in the moderndive package. This evals_ch5 data frame contains only the variables of interest for our analysis, in particular the instructor’s teaching score (score) and beauty rating (bty_avg).
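One way this subsetting step might look is sketched here. The exact set of columns kept is an assumption: score and bty_avg come from the text, while ID is included only as an illustrative identifier column.

```r
library(tidyverse)
library(moderndive)  # provides the evals data frame

# Keep only the variables of interest.
# Assumption: ID is retained as an identifier; score and bty_avg are the
# outcome and explanatory variables named in the text.
evals_ch5 <- evals %>%
  select(ID, score, bty_avg)

# Preview the column names and types of the result
glimpse(evals_ch5)
```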
We performed an exploratory data analysis of the relationship between the two variables score and bty_avg. We saw there that a weakly positive ...