Data Frames
In this lesson, we will introduce you to data frames.
Data frames are an important type of object in the R language and are particularly useful in statistical modeling applications. They are used to store tabular data in R.
Data frames store data as a sequence of columns. Each column can be of a different data type.
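As a small sketch of this idea, the snippet below builds a data frame and inspects the class of each column. The column names (id, label, flag) are made up for illustration and are not part of the lesson.

```r
# Illustrative data frame; the column names are assumptions for this sketch
df <- data.frame(
  id = c(1L, 2L, 3L),
  label = c("a", "b", "c"),
  flag = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# A data frame is internally a list with one element per column
print(is.list(df))

# Report the class of every column
columnClasses <- sapply(df, class)
print(columnClasses)
```

Each column keeps its own class (integer, character, and logical here), while all columns share the same number of rows.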
Difference between Matrices and Data Frames
Data frames can store different classes of objects in each column. In matrices, all the elements are of the same type, for example, all integers or all numeric.
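The difference can be seen in a short sketch. The variable and column names (m, d, num, chr) are hypothetical and chosen only for this illustration.

```r
# Mixing numbers and strings in a matrix coerces every element to character
m <- matrix(c(1, 2, "x", "y"), nrow = 2)
print(class(m[1, 1]))  # "character"

# The same values in a data frame keep one type per column
d <- data.frame(num = c(1, 2), chr = c("x", "y"), stringsAsFactors = FALSE)
print(class(d$num))    # "numeric"
print(class(d$chr))    # "character"
```

In the matrix the numbers are silently converted to strings, whereas the data frame preserves the numeric column alongside the character column.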
Let's have a look at an example. Say you want to store employee data. Each employee has a name (character), an address (character), a phone number (numeric), and a gender (character). Because these attributes have different types, such data is naturally represented as a data frame, with one row per employee and one column per attribute.
Creating Data Frames
Creating a data frame is simple: pass vectors of the same length to the data.frame() function.
myDataFrame <- data.frame(foo = c(10, 20, 30, 40, 50), bar = c(TRUE, FALSE, TRUE, FALSE, TRUE))
print(myDataFrame)
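To illustrate why the vector lengths matter, here is a minimal sketch that compares a valid call with one whose vectors have lengths 3 and 2. The names ok and result are hypothetical, and tryCatch() is used only to capture the error message instead of stopping the script.

```r
# Equal-length vectors combine into a data frame without trouble
ok <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
print(dim(ok))

# Lengths 3 and 2 cannot be aligned into rows, so data.frame() signals an error
result <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here result holds the text of the error rather than a data frame, which shows that the second call was rejected.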
We can find the number of rows and columns of a data frame using the nrow() and ncol() functions.
myDataFrame <- data.frame(foo = c(1, 2, 3, 4, 5), bar = c(TRUE, FALSE, TRUE, FALSE, TRUE))
cat("number of rows: ", nrow(myDataFrame), "\n")
cat("number of columns: ", ncol(myDataFrame), "\n")
Have a look at the data frame containing employee data:
# Create name, address, phonenumber and gender variables
name <- c("Alex", "Brian", "Charles")
address <- c("California", "NewYork", "Boston")
phonenumber <- c(2025550167, 2025354137, 2025339164)
gender <- c('F', 'M', 'M')
employeeDataFrame <- data.frame(name, address, phonenumber, gender)
print(employeeDataFrame)
We cannot use cat() to print a data frame because cat() only handles atomic vectors, that is, objects holding a single data type, whereas a data frame is a list of columns. Here the R interpreter will throw an error:
# Create name, address, phonenumber and gender variables
name <- c("Alex", "Brian", "Charles")
address <- c("California", "NewYork", "Boston")
phonenumber <- c(2025550167, 2025354137, 2025339164)
gender <- c('F', 'M', 'M')
employeeDataFrame <- data.frame(name, address, phonenumber, gender)
# cat() cannot handle a data frame, so this line raises an error
cat(employeeDataFrame)
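As a possible workaround, the sketch below uses print() for the whole data frame and cat() for a single column, since one column is an atomic vector. It uses a reduced version of the employee data, and the variable msg and the tryCatch() wrapper are additions for illustration only.

```r
employeeDataFrame <- data.frame(
  name = c("Alex", "Brian", "Charles"),
  gender = c("F", "M", "M"),
  stringsAsFactors = FALSE
)

# print() knows how to display a whole data frame
print(employeeDataFrame)

# cat() works on a single column because a column is an atomic vector
cat("names:", employeeDataFrame$name, "\n")

# Capture the error raised by cat() on the whole data frame
msg <- tryCatch(
  {
    cat(employeeDataFrame)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Wrapping the failing call in tryCatch() lets the script continue and shows the error message as an ordinary string.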