EDA for a Numerical Explanatory Variable
Explore techniques for exploratory data analysis of numerical explanatory variables using R and the Tidyverse. Learn to summarize data with skimr, calculate correlation coefficients, and visualize relationships through scatterplots and regression lines to understand linear associations and data distribution.
Typing out all these summary statistic functions in summarize() would be long and tedious. Instead, let’s use the convenient skim() function from the skimr package. This function takes in a data frame, skims it, and returns commonly used summary statistics. Let’s take our evals_ch5 data frame, select() only the outcome variable, teaching score, and the explanatory variable, bty_avg, and pipe the result into the skim() function:
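A minimal sketch of this pipeline follows. It assumes that evals_ch5 was built earlier from the evals data frame in the moderndive package and that the teaching score column is named score; adjust the names if your data frame differs.

```r
library(dplyr)
library(moderndive)  # assumed source of the evals data frame
library(skimr)

# Assumption: evals_ch5 was created earlier along these lines
evals_ch5 <- evals %>%
  select(ID, score, bty_avg, age)

# Keep only the outcome and explanatory variables, then skim them
skim_result <- evals_ch5 %>%
  select(score, bty_avg) %>%
  skim()

skim_result
```

Storing the result in skim_result (a name chosen here for illustration) is optional; piping straight into skim() prints the same summary.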
For the numerical variables teaching score and bty_avg, it returns:
n_missing: This is the number of missing values.
complete_rate: This is the proportion of non-missing or complete values.
mean: This is the average.
sd: ...
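To make these definitions concrete, here is a sketch that computes n_missing, complete_rate, mean, and sd by hand with summarize(). The toy data frame is hypothetical and is not taken from evals_ch5; it contains one missing value so that the first two statistics are not trivial.

```r
library(dplyr)

# Hypothetical toy data: five scores, one of them missing
toy <- data.frame(score = c(4.7, 4.1, NA, 3.9, 4.5))

manual_summary <- toy %>%
  summarize(
    n_missing     = sum(is.na(score)),          # count of missing values
    complete_rate = mean(!is.na(score)),        # proportion of non-missing values
    mean          = mean(score, na.rm = TRUE),  # average of the observed values
    sd            = sd(score, na.rm = TRUE)     # standard deviation of the observed values
  )

manual_summary
```

With one of five values missing, n_missing is 1 and complete_rate is 0.8. Writing this out for every variable is the tedium that skim() avoids.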