Ready-to-Use Datasets in R

Learn about the details of the dummy datasets in R and how to use them.

Why use dummy datasets?

Analysts often require the ability to experiment with different approaches to identify the most effective solutions for their projects. To facilitate this, we can utilize ready-to-use datasets. These datasets allow us to practice with various data types without investing significant time and effort into data collection.

Additionally, these datasets are beneficial for new learners who want to gain experience with R syntax. R offers a range of preexisting datasets through its packages, and some of these packages are designed specifically for this purpose. One of them is the datasets package, which is included with every R installation.
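To see what the datasets package offers, we can ask R for an index of its contents. The following is a minimal sketch that assumes a standard R installation. The variable names available and dataset_index are our own choices for illustration.

```r
# data() with a package argument returns an object whose $results matrix
# has one row per dataset ("Item" is the name, "Title" is a short description).
available <- data(package = "datasets")$results

# Keep only the name and description columns for easier reading.
dataset_index <- as.data.frame(available[, c("Item", "Title")])

print(nrow(dataset_index))      # how many datasets are listed
print(head(dataset_index, 10))  # the first 10 entries
```

The exact number of entries can vary between R versions, so the count is printed rather than assumed.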

Preview of the mtcars dataset

Accessing the dummy datasets

The exercises in this course will primarily utilize dummy datasets from the datasets package, which R loads by default. We can access them by directly typing their names. Most are data frames, although a few, such as mdeaths and nottem, are time series. The names of some datasets are listed below. Take a look at them to build familiarity.

  • iris
  • mtcars
  • PlantGrowth
  • pressure
  • sleep
  • quakes
  • rock
  • attenu
  • cars
  • CO2
  • Indometh
  • mdeaths
  • Orange
  • faithful
  • nottem
  • beaver1
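Before working with any of the datasets listed above, it helps to check what kind of object each one is. The sketch below looks up a few of them by name and inspects their class and size. The helper variables dataset_names and dataset_classes are hypothetical names used only for this example.

```r
# A few names from the list above.
dataset_names <- c("iris", "mtcars", "mdeaths", "nottem")

# get() looks up an object by its name; class() reports its type.
dataset_classes <- sapply(dataset_names, function(name) class(get(name))[1])
print(dataset_classes)

# dim() gives the number of rows and columns of a data frame.
print(dim(mtcars))

# str() prints a compact summary of every column.
str(iris)
```

In an interactive session, typing a question mark before a name, such as ?mtcars, opens the documentation page that describes the dataset's columns.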

Let’s practice what we’ve learned so far in the following code block. Feel free to play with the code and explore the datasets mentioned above.

print('mtcars data:')
print(head(mtcars, 5)) # First 5 rows of the mtcars data
print('iris data:')
print(head(iris, 5)) # First 5 rows of the iris data
# Accidentally assign another object to the name of a dataset
iris <- 1:3
print(iris) # Prints the new object named iris, which hides the dataset
remove(iris) # Remove the new object, not the dataset
print(head(iris, 5)) # The original dataset is visible again

Please avoid assigning other objects to the names of datasets, because the new object hides the dataset until it is removed. If this happens, we can use the remove() function (or its shorter alias, rm()), as shown in the remove(iris) call above. This doesn't delete the dataset itself, which belongs to the datasets package rather than to our workspace. It only deletes the object we created, so the name refers to the original data frame again.
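If a dataset name has been reused and we still need the original data, one option is to refer to it through its package with the :: operator. The sketch below assumes the same accidental assignment as the example above. The variables masked_class and original_class are hypothetical names introduced for illustration.

```r
# Reuse the name iris for a different object, hiding the dataset.
iris <- 1:3

# The plain name now refers to our own object.
masked_class <- class(iris)
print(masked_class)

# The package-qualified name still reaches the original dataset.
original_class <- class(datasets::iris)
print(original_class)

# Remove our object so the plain name refers to the dataset again.
rm(iris)
print(head(iris, 3))
```

Using datasets::iris is handy for a quick check, but removing the conflicting object is usually the cleaner fix because the rest of the code can keep using the plain name.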