Save Duplicate Observations Into a Different Dataset

Learn how to save duplicate observations into a different dataset.

How to save duplicate observations into a different dataset

Earlier we learned how to remove duplicate observations. Often we also need to know why duplicate observations exist, and whether observations that are duplicates according to the sorting variables also have duplicate values for the other variables in the dataset. It is therefore useful to send the duplicate observations to a separate dataset for examination. There are two ways to do this in R. The first, shown in the code that follows, keeps only the rows that duplicated() flags for the sorting variables isocode and year.

# create a dataset of duplicated observations
# duplicated() flags the second and later rows that share the same isocode and year
pwt7.d <- pwt7[duplicated(pwt7[, c("isocode", "year")]), ]
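To see what this subsetting keeps, here is a minimal sketch on a small made-up data frame. The data frame toy and its rgdpl values are invented for illustration and are not the actual pwt7 data; only the variable names isocode and year come from the code above. Because duplicated() marks only the second and later occurrence of each isocode-year pair, the first copy is left out of the result. Adding a second pass with fromLast = TRUE is one possible way to pull every copy for a side-by-side comparison.

```r
# toy data (hypothetical values, standing in for pwt7)
toy <- data.frame(
  isocode = c("USA", "USA", "USA", "CAN", "CAN"),
  year    = c(2000, 2000, 2001, 2000, 2000),
  rgdpl   = c(100, 100, 105, 80, 82),
  stringsAsFactors = FALSE
)

# the sorting variables that define a duplicate
keys <- toy[, c("isocode", "year")]

# second and later occurrences only (same logic as pwt7.d)
toy.d <- toy[duplicated(keys), ]

# every row whose isocode-year pair occurs more than once, first copies included
toy.all <- toy[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]

# for each flagged row: does it also repeat an earlier row on ALL variables?
toy.d$same.values <- duplicated(toy)[duplicated(keys)]

print(toy.d)
print(toy.all)
```

In this toy example the repeated USA row matches its first copy on every variable, while the repeated CAN row has a different rgdpl value, which is the kind of discrepancy worth examining before dropping anything.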

An alternative way to inspect duplicate observations is to assign a logical value, TRUE or FALSE, to each observation in the original dataset, with TRUE indicating that the observation repeats the sorting-variable values of an earlier row, and to assign the output to a new dataset. We can then apply the View() function to see directly which observations are duplicates, and the table() function to get a frequency count of the duplicate observations in the dataset.
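A minimal sketch of this flagging approach follows, written from the description above rather than taken from the original lesson. The toy data frame and the names toy.flag, dup, and dup.count are hypothetical; with the real data, toy would be replaced by pwt7.

```r
# toy data (hypothetical values, standing in for pwt7)
toy <- data.frame(
  isocode = c("USA", "USA", "USA", "CAN", "CAN"),
  year    = c(2000, 2000, 2001, 2000, 2000),
  rgdpl   = c(100, 100, 105, 80, 82),
  stringsAsFactors = FALSE
)

# copy the data and add a logical flag: TRUE marks a repeated isocode-year pair
toy.flag <- toy
toy.flag$dup <- duplicated(toy.flag[, c("isocode", "year")])

# frequency count: FALSE = unique rows and first occurrences, TRUE = duplicates
dup.count <- table(toy.flag$dup)
print(dup.count)

# open the spreadsheet-style viewer only in an interactive session
if (interactive()) View(toy.flag)
```

Keeping the flag alongside the original variables, instead of subsetting, leaves every observation in place, so the duplicates can be read in context with the rows they repeat.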
