What is the unite() function in R programming?
Tidy data
There are many ways to arrange data, and some arrangements make analysis easier than others. This is where tidy data comes in. The concept is explained in Hadley Wickham’s 2014 paper, Tidy Data.
Tidy data gives every dataset the same standard structure. In this structure:
- Every column represents a variable
- Every row is an observation
- Each cell is a single value
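As a minimal sketch of these three rules, the base R data frame below is tidy. The names tidy_scores, student, subject, and score, and all of the values, are made up purely for illustration.

```r
# Hypothetical toy data: each row is one observation (a student's score in one subject)
tidy_scores <- data.frame(
  student = c("A", "A", "B", "B"),          # variable 1
  subject = c("math", "art", "math", "art"), # variable 2
  score   = c(90, 85, 70, 95)                # variable 3; each cell holds one value
)

print(tidy_scores)
```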
The graphics below demonstrate the multiple fields of tidy data frames.
The unite() function
The unite() function, provided by the tidyr package, merges two or more columns into a single column or variable. It takes a data frame as input and returns a data frame in which the specified columns have been merged.
Syntax
unite(data, col, ..., sep = "_", remove = TRUE)
Parameters
- data: Table or data frame of interest
- col: Name of the new column that is to be added
- ...: Names of the columns that are to be united
- sep: How to join the data in the columns
- remove: Removes the input columns from the output data frame; default = TRUE
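The short sketch below shows how the sep and remove parameters change the result. It assumes the tidyr package is installed. The data frame df and its columns first and last are hypothetical names used only for this illustration.

```r
library(tidyr)

# Hypothetical data frame used only for illustration
df <- data.frame(first = c("Ada", "Alan"), last = c("Lovelace", "Turing"))

# sep sets the text placed between the joined values;
# with remove = TRUE (the default) the input columns are dropped
joined <- unite(df, col = "full_name", first, last, sep = " ")
print(joined)

# remove = FALSE keeps the input columns alongside the new one
kept <- unite(df, col = "full_name", first, last, sep = " ", remove = FALSE)
print(kept)
```

Keeping the input columns with remove = FALSE can be handy when the separate values are still needed later in the analysis.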
Return value
unite() returns a copy of the data frame in which the specified columns are combined into the new column. The original data frame is not modified.
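To check that the input is left untouched, one can compare the column names before and after the call. This is a small sketch, assuming tidyr is installed. The names df, united, x, y, and xy are hypothetical.

```r
library(tidyr)

# Hypothetical input used only for illustration
df <- data.frame(x = c("a", "b"), y = c("1", "2"))

# Store the result in a new variable; df itself is not changed
united <- unite(df, col = "xy", x, y, sep = "-")

print(names(df))      # the original columns are still present
print(names(united))  # the result holds the new united column
```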
Code
# Load tidyr, which provides unite()
library(tidyr)

# Initialize a character matrix with the housing values
Matrix <- matrix(c('2000m2', 'NY', 'New York', '$20000',
                   '3500m2', 'Chi', 'Chicago', '$24000',
                   '1300m2', 'Bos', 'Boston', '$90888',
                   '1600m2', 'Was', 'Washington', '$90013'),
                 ncol = 4, byrow = TRUE)
colnames(Matrix) <- c('House_Area', 'P.Code', 'City', 'Price')
rownames(Matrix) <- c('1', '2', '3', '4')

# Convert the matrix to a data frame, because unite() operates on data frames
Housing_dataset <- as.data.frame(Matrix)

# Show the data frame named Housing_dataset
print(Housing_dataset)

# Call unite() to merge the City and P.Code columns into Address
Housing_dataset_updated <- unite(Housing_dataset, col = 'Address', c('City', 'P.Code'), sep = " ", remove = TRUE)
print(Housing_dataset_updated)
Expected output
Explanation
As highlighted, the unite() function takes Housing_dataset, the new column name col = 'Address', and the two columns to merge, c('City', 'P.Code'), as arguments. It returns Housing_dataset_updated, as in the image above, with the two columns (City and P.Code) combined into a single variable, Address.