Basic Data Wrangling

Learn some fundamental data wrangling functions that are used to summarize data, and also learn about the different types of joins in R.

We’ll begin by exploring how to summarize our data in various ways, including calculating new variables as necessary. We’ve done something like this already, so this should serve as a refresher, reinforcement, and expansion of material introduced in earlier chapters.

Calculating treatment means is one of the most common things a scientist may need to do. This is useful for plotting and also for finding the average values across different categories of data. What’s the average effect of our experimental treatments? How tall are the plants in each species we’ve collected? What’s the level of expression for each of the genes in our RNA-seq dataset? These are the sorts of things we can answer once we summarize our data in some way.

Let’s start by calculating the mean age at emergence, in terms of days post-oviposition, for each of our predation treatments. This code has three steps:

  1. We begin by declaring our original data frame, RxP.clean.
  2. We set our grouping variable, Pred.
  3. We define a new variable, Mean.Age.DPO, and set it equal to the mean of Age.DPO using mean(Age.DPO).

Note that we pipe the result of each line into the next using the pipe operator, %>%, at each step of the process.

Group and summarize data

R
# Load dplyr, which provides %>%, group_by(), and summarize()
library(dplyr)

# Here is an example of how you can summarize a variable
RxP.clean %>%                               # Define the starting data frame
  group_by(Pred) %>%                        # Define how to group the data
  summarize(Mean.Age.DPO = mean(Age.DPO))   # Calculate the mean for each group
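
If you'd like to check what this pattern returns without the RxP data loaded, here is a minimal sketch on a small made-up data frame. The data frame toy, its values, and the treatment labels are hypothetical stand-ins for RxP.clean, not taken from the real dataset; only the column names Pred and Age.DPO match the lesson.

```r
library(dplyr)

# Hypothetical toy data standing in for RxP.clean (all values are made up)
toy <- data.frame(
  Pred    = c("C", "C", "NL", "NL", "L", "L"),
  Age.DPO = c(40, 50, 60, 70, 80, 100)
)

# Same three steps as above: start with the data frame,
# group by treatment, then calculate one mean per group
toy.means <- toy %>%
  group_by(Pred) %>%
  summarize(Mean.Age.DPO = mean(Age.DPO))

# One row per level of Pred, with the group mean in Mean.Age.DPO
toy.means
```

The result has one row for each value of Pred rather than one row per observation, which is what makes summarized data convenient for plotting treatment means.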

Okay, let’s take a moment to notice a few things:

  • First, because we named the data frame at the outset, we can refer to individual columns within it directly, without using the $ operator.

  • Secondly, here we didn’t need to use the select() or the filter() function on our data frame. The ...