Fixing Mistakes in the Data
Let's fix our data and prepare for further analyses.
How do we fix mistakes that we found in the data file? One option would be to go back to our original .csv file, or to Microsoft Excel (or whatever program we use to organize and input our data), and change it there. We should certainly do that! But this dataset has over 2,500 individuals, and working out by eye which records contain typos could be a very tedious task.
As we saw earlier, four individuals are recorded as having an SVL below 19 mm and a Mass above 0.6 g. Thus, one way to locate those mistakes in the data file is to search for any froglets with a Mass above 0.6 g and an SVL below 19 mm. We can do this using square brackets ([]), a technique called indexing, which allows us to subset the data based on one or more criteria.
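As a minimal sketch of that search, the code below builds a small toy data frame and picks out the rows whose mass is suspiciously high for their length. The values are invented for illustration and are not taken from the real dataset. It also assumes the columns are named SVL and Mass; the names in the actual RxP data frame may differ.

```r
# Toy data standing in for RxP; names and values are made up for illustration.
toy <- data.frame(
  Ind  = 1:6,
  SVL  = c(18.2, 20.1, 17.5, 21.3, 18.9, 19.4),  # snout-vent length, mm
  Mass = c(0.35, 0.48, 3.50, 0.52, 0.41, 0.44)   # mass, g
)

# TRUE for rows that are both heavier than 0.6 g and shorter than 19 mm
suspect <- toy$Mass > 0.6 & toy$SVL < 19

# Row numbers of the suspect records
which(suspect)

# The suspect rows themselves, with all columns
toy[suspect, ]
```

Keeping the logical vector in its own variable (here the hypothetical name suspect) makes it easy to reuse the same criterion later, for example when correcting the flagged values.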
Data indexing
Using square brackets to navigate a data frame or another type of object is one of the most important things we can learn to do in R. If we think about a data frame as a two-dimensional object with rows and columns, every item (that is, a cell) in the data frame has a location in terms of its row and column. The [] brackets allow us to navigate the data simply and elegantly. We did a little of this in the previous chapters of the course, but it’s explained more fully here. Square brackets let us refer to any item by its location, written as [row, column]. Thus, if we want the item in the fifth row and the third column of our data frame RxP, we type the following:
RxP[5,3]
We can also use [] to select a range of items, like the first four rows and the first three columns of the data frame.
RxP[1:4,1:3]
Note: Don’t be confused by the leftmost column in the printed output. It shows the row names (by default, the row numbers) that R displays when printing the table. It is not one of the data frame’s columns, so it is not counted when we index by column.
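The positional forms above can be tried on a small toy data frame. The sketch below uses invented values and hypothetical column names (Ind, Treatment, SVL) rather than the real RxP data, so only the indexing pattern carries over.

```r
# Toy data frame; names and values are made up for illustration.
toy <- data.frame(
  Ind       = 1:6,
  Treatment = c("Lo", "Hi", "Lo", "Hi", "Lo", "Hi"),
  SVL       = c(18.2, 20.1, 17.5, 21.3, 18.9, 19.4)
)

toy[5, 3]       # single cell: row 5, column 3
toy[1:4, 1:3]   # rows 1 to 4, columns 1 to 3
toy[2, ]        # leaving the column slot empty returns the whole row
toy[, "SVL"]    # a column can also be selected by name
```

Note that the printed row numbers on the left do not count as column 1: in this toy example, column 1 is Ind and column 3 is SVL.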
Most importantly, we can enter logical statements within the []. For example, if we want to return all of the rows from the Low resource treatment, we can type the code below. Note ...