Advanced Data Wrangling

Learn advanced data wrangling techniques.

Spreading and gathering data

We can calculate means or other summary statistics on our dataset by adding one argument after another to our summarize() function. However, when we’re doing the same calculation on many columns, it is often easier to use the gather() and spread() functions to change the shape of our dataset first.

Gathering

Let’s discuss what gathering our data means. Essentially, we take the data in a set of columns and put it into two columns: one containing what used to be the column names and the other holding the data that was in those columns. Thus, we end up with one column of categorical data, called the key (the former column headings), and one column of numerical data, called the value (the values from each column).
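To make the key and value idea concrete, here is a minimal sketch on a toy data frame. The data frame wide and its columns Ind, Mass, and Length are hypothetical and are not part of the RxP.clean dataset; they only illustrate the reshaping.

```r
library(tidyr)

# Hypothetical toy data: two individuals, two measurement columns.
wide <- data.frame(
  Ind    = c(1, 2),
  Mass   = c(0.40, 0.55),
  Length = c(18.2, 20.1)
)

# Gather Mass and Length into key/value pairs.
# -Ind excludes Ind from gathering, so it stays as an identifier column.
long <- gather(wide, key = Measurement, value = Value, -Ind)
long
```

The two measurement columns become four rows: the Measurement column holds the former column names (Mass and Length), and the Value column holds the numbers that were stored under them.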

Let’s gather our data into a long format. First, we pipe our dataset to the gather() function. Then, we name the key and value columns. For this example, we’ll call the key Measurement, since the column headings represent different measurements taken on each metamorph, and the value Value, since each number is the value recorded for a particular measurement. Lastly, we list the columns that we don’t want to gather, each prefixed with a minus sign. We could just as easily drop those columns beforehand with the select() function. It’s a question of personal preference. Note that this creates a very long data frame, so we can look at the head and tail of the object to get an idea of what it looks like.

R
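library(dplyr)  # provides %>%
library(tidyr)  # provides gather()

# Gather every measurement column into Measurement/Value pairs.
# Columns prefixed with - are excluded from gathering and kept as they are.
RxP.clean.long <- RxP.clean %>%
  gather(key = Measurement, value = Value,
         -Ind, -Block, -Tank, -Tank.Unique, -Hatch, -Pred, -Res)
RxP.clean.long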
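The select() alternative mentioned above can be sketched as follows. This is an illustration on hypothetical toy data (wide, with columns Ind, Tank, Mass, and Length), not the RxP.clean dataset itself.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for RxP.clean.
wide <- data.frame(
  Ind    = c(1, 2),
  Tank   = c("A", "B"),
  Mass   = c(0.40, 0.55),
  Length = c(18.2, 20.1)
)

# Option 1: exclude the identifier columns inside gather(); they are kept.
long_keep <- wide %>%
  gather(key = Measurement, value = Value, -Ind, -Tank)

# Option 2: drop the identifier columns with select(), then gather the rest.
long_drop <- wide %>%
  select(-Ind, -Tank) %>%
  gather(key = Measurement, value = Value)

# Both approaches produce the same Measurement and Value columns.
identical(long_keep$Value, long_drop$Value)
```

One difference worth keeping in mind: with select(), the identifier columns are no longer in the long data frame, so this route suits overall summaries but not later grouping by those identifiers.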

Now, we can group the object by the Measurement column and calculate the mean values for each measurement with the summarize() function.

R
RxP.clean.long %>%
  group_by(Measurement) %>%
  summarize(Mean = mean(Value))
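As a self-contained illustration of the same grouping step, here is a small sketch on hypothetical long-format data. The data frame long and its values are invented for the example and only mimic the shape of RxP.clean.long.

```r
library(dplyr)

# Hypothetical long-format data, shaped like RxP.clean.long.
long <- data.frame(
  Measurement = c("Mass", "Mass", "Length", "Length"),
  Value       = c(0.40, 0.60, 18.0, 20.0)
)

# One mean per measurement, computed with a single summarize() call.
means <- long %>%
  group_by(Measurement) %>%
  summarize(Mean = mean(Value))
means
```

If the Value column contains missing values, mean() returns NA for that group. Passing na.rm = TRUE to mean() is one way to skip them, assuming that dropping missing values is appropriate for the analysis.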

Okay, we can see how to calculate the means. We can easily imagine how we ...