Describing the Data

Let’s learn how to describe our data in detail.

Describing the data

The title of Nate Silver’s 2012 book on statistics, The Signal and the Noise, sums up the primary goal of data analysis: we want to find out whether there is any systematic pattern in our data (signal) that stands out above the background variability (noise). We usually quantify the signal in terms of the average value for each group, although other measures of central tendency, such as the median or mode, are more appropriate in some cases.

Before we calculate the mean for each group, we can start more simply by ignoring pollination type and calculating a single mean for all of the height measurements. In R, we can use the mean() function to get the average height of all 30 plants. To do this, we need to tell R which column to use for the calculation, and there are a few different ways to do that. One approach is to use the with() function, whose data argument names the data frame we want to use and whose second argument is the expression to evaluate with that data frame’s columns.
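As a minimal sketch of this approach, the example below builds a small toy data frame and calculates the overall mean height with with(). The data frame name plants, the column names type and height, and the six values are all made up for illustration; they are not the lesson's 30 measurements.

```r
# Toy stand-in for the lesson's data set. The name `plants`, the column
# names, and the values are hypothetical and used only for illustration.
plants <- data.frame(
  type   = rep(c("cross", "self"), each = 3),
  height = c(20, 22, 24, 16, 18, 20)
)

# with() evaluates the expression using the columns of `plants`,
# so `height` can be referred to directly by name.
overall_mean <- with(plants, mean(height))
print(overall_mean)

# An equivalent way to point R at the column: extract it with `$`.
print(mean(plants$height))
```

Both calls return the same value because they pass the same column to mean(). The with() form tends to read more cleanly when an expression refers to several columns of the same data frame.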
