
What is the summarize() method in R?


The summarize() function in R collapses a data frame into a single row of summary values. If the observations are first grouped by a categorical variable with the group_by() function, it returns one row per group instead.

summarize() belongs to the dplyr package, so dplyr must be loaded before it can be called. The summary it returns depends on the action applied and on whether the data is grouped or ungrouped.

Summarize grouped data

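The summary functions that can be applied include mean(), median(), sum(), min(), max(), and n() (a row count). When group_by() has not been called, summarize() treats the whole data frame as a single group and returns one row.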

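# Load the dplyr package
library(dplyr)
# PlantGrowth is a built-in R dataset
data <- PlantGrowth
# Mean weight over all rows; na.rm = TRUE skips missing values
summarize(data, mean_weight = mean(weight, na.rm = TRUE))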

In the example above, we use the summarize() function to obtain the mean weight of all 30 plants in the PlantGrowth dataset. Because the data is not grouped, the result is a single value.
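
For comparison, a grouped version of the same summary might look like the sketch below. It assumes the group column of PlantGrowth (the treatment label) as the grouping variable. The names by_group, mean_weight, and n_plants are illustrative choices, not part of the original example.

```r
library(dplyr, warn.conflicts = FALSE)

# One row per treatment group instead of one row for the whole dataset
by_group <- PlantGrowth %>%
  group_by(group) %>%
  summarize(
    mean_weight = mean(weight, na.rm = TRUE),  # mean weight within the group
    n_plants = n()                             # number of rows in the group
  )

print(by_group)
```

The result has one row for each level of group, which is the behavior that group_by() adds to summarize().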

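Summarize multiple columns

We can also summarize several columns at once by using three variants of summarize(). They work on both grouped and ungrouped data. In dplyr 1.0.0 and later they are superseded by across(), but they still work.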

  • summarize_all()
  • summarize_at()
  • summarize_if()

1. The summarize_all() method

This function applies the given action to every column of the data. If the data is grouped, the grouping columns are excluded.

Syntax


summarize_all(action)

Parameters

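action: The function to apply to each column. It can be a function name such as mean, a formula-style lambda such as ~ mean(.x, na.rm = TRUE), or a list of functions. The older funs() helper is deprecated.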

Code

In the code snippet below, we load the mtcars dataset (from Motor Trend US magazine) into the data variable. The sample variable holds the first six observations. The expression sample %>% summarize_all(mean) returns the mean of each column over those six observations.

# Load dplyr library
library(dplyr, warn.conflicts = FALSE)
# Main code
data <- mtcars
# Keep the first 6 observations
sample <- head(data)
# Calculate the mean of every column
sample %>% summarize_all(mean)
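
The action can also be written as a formula-style lambda, which is handy when the function needs extra arguments. The sketch below assumes dplyr 1.0.0 or later for the across() form, which the dplyr documentation recommends in place of the scoped variants. The names first_six, via_all, and via_across are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

first_six <- head(mtcars)

# Formula-style lambda: .x stands for the column being summarized
via_all <- first_six %>%
  summarize_all(~ mean(.x, na.rm = TRUE))

# Same summary written with across(), assuming dplyr >= 1.0.0
via_across <- first_six %>%
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(via_all)
print(via_across)
```

Both calls should return a single row holding the mean of every column.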

2. The summarize_at() method

This function applies the action only to the specified columns and returns a summary of those columns.

Syntax


summarize_at(vector_of_columns, action)

Parameters

  • vector_of_columns: A character vector of column names, or a selection made with vars().
  • action: The function to apply to the selected columns. It can be a function name such as mean, a formula-style lambda, or a list of functions. The older funs() helper is deprecated.

Code

In the code snippet below, we load the mtcars dataset (from Motor Trend US magazine) into the data variable. The sample variable holds the first six observations. The expression sample %>% group_by(hp) %>% summarize_at(c('cyl','mpg'),mean) returns the mean of the cyl and mpg columns for each distinct value of hp (a column of the dataset) in those six rows.

# Load dplyr library
library(dplyr, warn.conflicts = FALSE)
# Main code
data <- mtcars
# Keep the first 6 observations
sample <- head(data)
# Mean of cyl and mpg for each distinct hp value
sample %>%
  group_by(hp) %>%
  summarize_at(c('cyl', 'mpg'), mean)
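
A grouping column with few distinct values makes the effect of summarize_at() easier to see. The following sketch groups the full mtcars data by cyl instead of hp (a choice made here for illustration) and shows an across() equivalent, which assumes dplyr 1.0.0 or later. The names at_result and across_result are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Mean mpg and hp for each cylinder count, using summarize_at()
at_result <- mtcars %>%
  group_by(cyl) %>%
  summarize_at(c("mpg", "hp"), mean)

# Same summary with across(), assuming dplyr >= 1.0.0
across_result <- mtcars %>%
  group_by(cyl) %>%
  summarize(across(c(mpg, hp), mean))

print(at_result)
print(across_result)
```

Each result has one row per value of cyl, with the grouping column followed by the summarized columns.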

3. The summarize_if() method

This function applies the action only to the columns for which a condition (the predicate) returns TRUE.

Syntax


summarize_if(predicate, action)

Parameters

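  • predicate: A function applied to each column that returns TRUE or FALSE (for example, is.numeric), or a logical vector. Only the columns it selects are summarized.
  • action: The function to apply to the selected columns. It can be a function name such as mean, a formula-style lambda, or a list of functions. The older funs() helper is deprecated.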

A predicate function returns a single TRUE or FALSE value.

Code

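In the code snippet below, we use the predicate function is.numeric and mean as the action. Because every column of mtcars is numeric, all columns other than the grouping column hp are averaged.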

# Load dplyr library
library(dplyr, warn.conflicts = FALSE)
# Main code
data <- mtcars
# Keep the first 6 observations
z <- head(data)
# Mean of every numeric column for each distinct hp value
z %>%
  group_by(hp) %>%
  summarize_if(is.numeric, mean)
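
Since mtcars contains only numeric columns, the predicate above selects everything. A sketch with PlantGrowth, which has one numeric column (weight) and one factor column (group), shows the predicate actually filtering columns. The across(where(...)) form assumes dplyr 1.0.0 or later, and the names if_result and where_result are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Only weight passes is.numeric; the factor column group is dropped
if_result <- PlantGrowth %>%
  summarize_if(is.numeric, mean)

# Same summary with across() and where(), assuming dplyr >= 1.0.0
where_result <- PlantGrowth %>%
  summarize(across(where(is.numeric), mean))

print(if_result)
print(where_result)
```

Both results should contain a single weight column, since group does not satisfy the predicate.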
