Summarize Variables

Explore how to calculate key summary statistics such as mean, standard deviation, minimum, and maximum using R's dplyr package. Understand how to handle missing values with the na.rm argument and apply various summary functions within summarize() to analyze data efficiently.

Summary statistics are single numerical values that summarize a large number of values. Commonly known examples of summary statistics include the mean (also called the average) and the median (the middle value when the values are sorted). Other examples that might not immediately come to mind include the sum, the minimum (the smallest value), the maximum (the largest value), and the standard deviation (a measure of how spread out the values are).
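
As a minimal sketch of the statistics listed above, the base R functions below are applied to a small made-up vector. The name `values` and the numbers in it are assumptions chosen only for illustration; they do not come from the weather data.

```r
# Hypothetical values, chosen only for illustration
values <- c(2, 4, 4, 5, 10)

mean(values)    # arithmetic average: 5
median(values)  # middle value of the sorted data: 4
sum(values)     # total of all values: 25
min(values)     # smallest value: 2
max(values)     # largest value: 10
sd(values)      # sample standard deviation: 3
```

Each call takes the whole vector of five numbers and returns exactly one number.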

Let’s calculate two summary statistics of temp, a temperature variable in the weather data frame (from the nycflights13 package): the mean and standard deviation. To compute these summary statistics, we need the mean() and sd() summary functions in R. Summary functions in R take in many values and return a single value.

Illustrating a summary function in R
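
To make the "many values in, one value out" idea concrete, here is a hedged sketch that applies mean() and sd() directly to the temp column using base R. It assumes the nycflights13 package is installed. The variable names `temps`, `temp_mean`, and `temp_sd` are illustrative, and na.rm = TRUE is included as a precaution in case the column contains missing values.

```r
# Assumes nycflights13 is installed: install.packages("nycflights13")
library(nycflights13)

# Pull the temp column out of the weather data frame as a numeric vector
temps <- weather$temp

# Many values go in...
length(temps)

# ...and a single value comes out of each summary function.
# na.rm = TRUE drops missing values (NA) before computing; without it,
# a single NA in the input would make the result NA.
temp_mean <- mean(temps, na.rm = TRUE)
temp_sd <- sd(temps, na.rm = TRUE)

temp_mean
temp_sd
```

No matter how many rows weather has, temp_mean and temp_sd each hold a single number.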

More precisely, we’ll use the mean() and sd() ...