Summarizing and Manipulating Data
Learn about data summarization and manipulation techniques in R.
There are many tools in R to help us summarize our data efficiently. There are two functions worth knowing about at this stage:
- The aggregate() function from base R.
- The summarize() function from the dplyr package.
The aggregate() function
The aggregate() function takes a column of raw data and summarizes it across one or more groups using a function we choose, for example the mean or the standard deviation. One nice thing about using aggregate() is that we specify how we want our data summarized with the same formula syntax that we use for specifying plots or models. The output of the aggregate() function is a data frame, which is then easy to use for plotting figures or other purposes. The aggregate() function takes three arguments:
- It takes the response and predictor variables, written as a formula (response ~ predictors).
- It takes the function we want to execute with the FUN= argument.
- It takes the data frame where the data can be found.
For example, if we want to calculate the mean size of metamorphs at emergence across all combinations of predator and resource treatments, we type the following:
# This is a great way to use aggregate()
aggregate(SVL.initial ~ Pred * Res, FUN = mean, data = RxP.clean)
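Since the RxP.clean data frame is not shown here, it can help to try the same call on a small stand-in. The sketch below assumes a made-up data frame called toy, with invented values and treatment labels, that simply borrows the column names SVL.initial, Pred, and Res. It is meant only to show the shape of what aggregate() returns, not the real results.

```r
# Hypothetical stand-in for RxP.clean: values and treatment labels are invented
toy <- data.frame(
  Pred        = rep(c("C", "NL"), each = 4),
  Res         = rep(c("Hi", "Lo"), times = 4),
  SVL.initial = c(20, 18, 22, 16, 19, 17, 21, 15)
)

# Mean SVL.initial for every combination of Pred and Res
means <- aggregate(SVL.initial ~ Pred * Res, FUN = mean, data = toy)
print(means)

# Swapping FUN gives a different summary with the same formula
sds <- aggregate(SVL.initial ~ Pred * Res, FUN = sd, data = toy)
print(sds)
```

Each result is an ordinary data frame with one row per treatment combination, so it can be passed straight to a plotting function or merged with other summaries.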
The package dplyr has many useful functions for data wrangling. By using some of these in combination, we can easily summarize our data. The downside of dplyr is that, much like ggplot2, it has its own lexicon that is fairly distinct from the rest of R, meaning that we have to learn a whole different set of commands. So be it. It’s pretty great once you learn the coding.
By using a few choice functions, such as group_by() and summarize(), we can ...
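As a rough illustration of how group_by() and summarize() might be combined, the sketch below groups a small made-up data frame by two treatment columns and computes a mean and standard deviation for each group. The data frame toy, its values, and the output column names mean_SVL and sd_SVL are all hypothetical. The example assumes dplyr version 1.0.0 or later is installed, since it uses the .groups argument.

```r
library(dplyr)

# Hypothetical stand-in for RxP.clean: values and treatment labels are invented
toy <- data.frame(
  Pred        = rep(c("C", "NL"), each = 4),
  Res         = rep(c("Hi", "Lo"), times = 4),
  SVL.initial = c(20, 18, 22, 16, 19, 17, 21, 15)
)

# Group by both treatments, then compute one summary row per group
summary_tbl <- toy %>%
  group_by(Pred, Res) %>%
  summarize(
    mean_SVL = mean(SVL.initial),  # hypothetical output column name
    sd_SVL   = sd(SVL.initial),    # hypothetical output column name
    .groups  = "drop"              # return an ungrouped tibble
  )

print(summary_tbl)
```

Unlike a single aggregate() call, one summarize() call can return several summary columns at once, which is part of what makes the extra vocabulary worth learning.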