The summarize() function in R reduces a data frame to a summary: one row per group, or a single row when the data is ungrouped. To summarize by category, first group the observations on a categorical column with the group_by() function, then call summarize().
The summarize() function is provided by the dplyr package. The summary it returns depends on the action applied and on whether the data is grouped or ungrouped.
Common actions for summarizing data include the mean, median, count, minimum, and maximum.
# Load library
library(dplyr)

data <- PlantGrowth

# Summarize: mean weight across all rows
summarize(data, mean(weight, na.rm = TRUE))
In the example above, we use the summarize() function to obtain the mean weight of all the plants in the PlantGrowth dataset. Because the data is not grouped, the result is a single row.
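The grouped case mentioned earlier can be sketched in the same way. The snippet below is a minimal illustration, assuming the built-in PlantGrowth dataset and its group column; the output column names mean_weight and n_plants are chosen here for readability and are not part of dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

# PlantGrowth has two columns: weight (numeric) and group (factor)
by_group <- PlantGrowth %>%
  group_by(group) %>%                          # one group per treatment level
  summarize(
    mean_weight = mean(weight, na.rm = TRUE),  # illustrative column name
    n_plants    = n()                          # number of rows in each group
  )

print(by_group)
```

The result has one row per level of group instead of a single row for the whole dataset.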
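dplyr also provides three scoped variants of summarize() that apply an action to several columns at once. They work on both grouped and ungrouped data. (Since dplyr 1.0.0 they are superseded by across(), but they remain available.)
summarize_all()
summarize_at()
summarize_if()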
summarize_all() method
This function applies the given action to every column of the data and returns the summary.
summarize_all(action)
action: The function to apply to the data frame columns. It can be a function, a lambda such as ~ mean(.x), or a list of functions. Older code wraps these in funs(), which is deprecated.
In the code snippet below, we load the mtcars dataset into the data variable. The sample variable holds the first six rows, returned by head(). The expression sample %>% summarize_all(mean) returns the mean of each column over those six rows.
# Load dplyr library
library(dplyr, warn.conflicts = FALSE)

# Main code
data <- mtcars

# Keep the first 6 observations
sample <- head(data)

# Calculate the mean of every column
sample %>% summarize_all(mean)
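The action does not have to be a bare function name. As a rough sketch of the other forms described above, the snippet below passes a lambda and then a named list of functions; the variable names with_lambda and with_list are illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

sample <- head(mtcars)

# A formula-style lambda: .x stands for the current column
with_lambda <- sample %>% summarize_all(~ mean(.x, na.rm = TRUE))

# A named list of functions: each output column is named <column>_<function>
with_list <- sample %>% summarize_all(list(mean = mean, max = max))

print(with_lambda)
print(names(with_list))
```

The lambda form is convenient when the action needs extra arguments such as na.rm = TRUE, and the list form is convenient when several summaries of the same columns are needed.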
summarize_at() method
It applies the action to the specified columns only and returns the summary.
summarize_at(vector_of_columns, action)
vector_of_columns: A character vector of column names, or a list of columns created with vars().
action: The function to apply to the selected columns. It can be a function, a lambda such as ~ mean(.x), or a list of functions. Older code wraps these in funs(), which is deprecated.
In the code snippet below, we load the mtcars dataset into the data variable and keep the first six rows in sample. The expression sample %>% group_by(hp) %>% summarize_at(c('cyl','mpg'),mean) groups the rows by hp (a column of the dataset) and returns the mean of the 'cyl' and 'mpg' columns for each group.
# Load dplyr library
library(dplyr, warn.conflicts = FALSE)

# Main code
data <- mtcars
sample <- head(data)

# Mean of cyl and mpg for each value of hp
sample %>%
  group_by(hp) %>%
  summarize_at(c('cyl', 'mpg'), mean)
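The columns can also be selected with vars() instead of a character vector. The following sketch, which assumes the same six rows of mtcars, runs both forms so the results can be compared; by_name and by_vars are names made up for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)

sample <- head(mtcars)

# Columns given as a character vector
by_name <- sample %>%
  group_by(hp) %>%
  summarize_at(c("cyl", "mpg"), mean)

# The same columns given as unquoted names inside vars()
by_vars <- sample %>%
  group_by(hp) %>%
  summarize_at(vars(cyl, mpg), mean)

print(by_name)
```

Both calls should return one row per distinct hp value, with the hp column followed by the means of cyl and mpg.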
summarize_if() method
This function takes a condition (a predicate) and summarizes only the columns that satisfy it.
summarize_if(.predicate, .action)
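predicate: A predicate function applied to each column, or a logical vector. Only the columns for which it is TRUE are summarized.
action: The function to apply to the selected columns. It can be a function, a lambda such as ~ mean(.x), or a list of functions. Older code wraps these in funs(), which is deprecated.
A predicate function returns only TRUE or FALSE.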
In the code snippet below, we use is.numeric as the predicate function and mean as the action.
# Load dplyr library
library(dplyr, warn.conflicts = FALSE)

# Main code
data <- mtcars
z <- head(data)

# Mean of every numeric column for each value of hp
z %>%
  group_by(hp) %>%
  summarize_if(is.numeric, mean)
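Every column of mtcars is numeric, so the predicate above does not filter anything out. As a small sketch of the filtering effect, the example below assumes the built-in iris dataset, which has four numeric columns and one factor column (Species); numeric_means is a name chosen for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Species is a factor, so is.numeric() is FALSE for it and it is left out
numeric_means <- iris %>% summarize_if(is.numeric, mean)

print(numeric_means)
```

Calling mean on the Species column directly would not give a meaningful result, which is why selecting columns with a predicate is useful for data frames with mixed column types.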