Trusted answers to developer questions

Related Tags

r programming
communitycreator

What is tibble versus data frame in R?

AKASH BAJWA

Tibble is a package in the R programming language, part of the tidyverse, that is used to create, manipulate, and print data frames. A Tibble is a modern reimagining of the data frame: it keeps the features that have proven useful and drops the behaviors that tend to cause surprises.

Key features of Tibble

  • A Tibble never alters the type of its inputs.
  • With Tibble, we do not need to worry about strings being converted to factors automatically.
  • Tibbles can contain columns that are lists.
  • We can use non-standard (non-syntactic) column names in a Tibble.
  • A column name can start with a number or contain spaces.
  • To refer to such names, we must wrap them in backticks.
  • Tibble only recycles vectors of length 1.
  • A Tibble never creates row names.
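
The features listed above can be seen in a short sketch. The column names and values here are made up for illustration, and the example assumes the tibble package is installed:

```r
library(tibble)

# Illustrative data only: the column names are hypothetical.
tb <- tibble(
  name = c("a", "b", "c"),               # strings stay character, not factor
  `2nd value` = 1:3,                     # non-syntactic name, needs backticks
  items = list(1, 2:3, letters[1:2]),    # a list column
  flag = TRUE                            # a length-1 vector is recycled
)

class(tb$name)        # the input type is preserved
class(tb$items)       # list column
tb$`2nd value`        # backticks are needed to refer to this column
has_rownames(tb)      # a Tibble does not create row names

# A length-2 vector is not recycled against 4 rows: tibble() raises an
# error here, whereas data.frame() would recycle the shorter vector.
result <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)
result
```

Wrapping the failing call in tryCatch() keeps the script running so the recycling rule can be observed without stopping execution.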

Tibbles versus data frames

There are two main differences between Tibbles and data frames:

  1. Printing
  2. Subsetting

Printing

Tibble has a more refined print method. By default, it shows only the first ten rows and as many columns as fit on the screen, and each column is labeled with its data type. This keeps a large data set from flooding the console when it is printed.
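
As a rough illustration of the printing behavior described above, the sketch below prints the same 100-row data set as a data frame and as a Tibble. The data is hypothetical, and the example assumes the tibble package is installed:

```r
library(tibble)

# Hypothetical data: 100 rows, two columns.
df <- data.frame(id = 1:100, value = (1:100) / 10)
tb <- as_tibble(df)

print(tb)          # dimensions, column types, and the first ten rows
print(tb, n = 3)   # the n argument controls how many rows are shown

# Compare how many lines each print method produces.
df_lines <- capture.output(print(df))
tb_lines <- capture.output(print(tb))
length(df_lines)   # one header line plus one line per row
length(tb_lines)   # far fewer lines, because the output is truncated
```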

Subsetting

We can extract a single column from a Tibble by name or by position:

df$y
df[["y"]]
df[[1]]

The pipe can also be used for subsetting, with . as the placeholder for the Tibble:

df %>% .$y
df %>% .[["y"]]
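
To make the subsetting difference concrete, here is a minimal sketch comparing the two classes. The column names are invented for the example, and it assumes the tibble package is installed:

```r
library(tibble)

# Hypothetical data with a long column name to show partial matching.
df <- data.frame(x = 1:3, yvalue = c("a", "b", "c"), stringsAsFactors = FALSE)
tb <- as_tibble(df)

# Selecting one column with single brackets:
class(df[, "x"])   # the data frame drops down to a plain vector
class(tb[, "x"])   # the Tibble stays a Tibble

# Partial matching with $:
df$yv              # the data frame partially matches "yvalue"
tb$yv              # the Tibble returns NULL and warns about the unknown column

# Extracting a column as a vector works the same way on both:
tb$x
tb[["x"]]
tb[[1]]
```

In short, a Tibble is stricter: single-bracket subsetting always returns another Tibble, and $ never guesses a column from a partial name.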

In the code snippet below, we create a data frame that holds employee records:

# Creating the data frame.
employee <- data.frame(
  employee_id = 1:5,
  employee_name = c("JOHN", "TIM", "STARC", "HARRY", "TINA"),
  employee_salary = c(567.3, 675.2, 674.0, 678.0, 790.25),
  # employee starting date
  starting_date = as.Date(c("2015-03-03", "2016-08-02", "2018-11-12", "2021-07-21",
                            "2016-01-06")),
  stringsAsFactors = FALSE
)
# Printing the data frame.
print(employee)

Code explanation

  • Line 2: We create a data frame, using the data.frame() function.
  • Lines 3–7: The fields of the employee data frame include employee_id, employee_name, employee_salary, and starting_date.
  • Line 12: We print the employee data frame to the console.
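
A data frame like this one can be converted to a Tibble and back. The sketch below uses a shortened version of the employee data and assumes the tibble package is installed; the names employee_tbl and back are chosen here only for illustration:

```r
library(tibble)

# A shortened version of the employee data frame.
employee <- data.frame(
  employee_id = 1:5,
  employee_name = c("JOHN", "TIM", "STARC", "HARRY", "TINA"),
  stringsAsFactors = FALSE
)

# Convert the data frame to a Tibble.
employee_tbl <- as_tibble(employee)
is_tibble(employee_tbl)
print(employee_tbl)

# Convert back to a plain data frame when other code expects one.
back <- as.data.frame(employee_tbl)
is_tibble(back)
```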

In the code snippet below, we use dplyr from the tidyverse to turn a data frame into a grouped Tibble:

# Program for tibble implementation
library("dplyr", warn.conflicts = FALSE)
Section <- rep(c("A", "B", "C", "D", "E"), times = 10)
Ranking <- sample(1:10, 50, replace = TRUE)
dataframe <- data.frame(Section, Ranking)
# first 5 entries of the data frame
head(dataframe, 5)
# last 5 entries of the data frame
tail(dataframe, 5)
mylist <- dataframe %>% dplyr::group_by(Section, Ranking) %>% dplyr::mutate(count = n())
print(mylist)

Code explanation

  • Line 2: We load the dplyr library.
  • Line 3: We create a vector named Section. The rep() function repeats the five section labels ten times (times=10), which gives 50 elements.
  • Line 4: We create a vector named Ranking. The sample() function draws 50 values, with replacement, from 1 to 10 (1:10).
  • Line 5: We build a data frame from the two vectors we created above (Section, Ranking).
  • Line 7: We print the first five entries of the data frame.
  • Line 9: We print the last five entries of the data frame.
  • Line 10: We group the data frame by Section and Ranking, which returns a grouped Tibble, and use mutate() with n() to add a count column that holds the number of observations in each group.
  • Line 11: We print mylist. Because it is a Tibble, only the first ten rows are shown, along with the column types.
