How to import data from Excel to R using tidyxl
In R, there’s a powerful package called tidyxl. It’s useful for working with Excel files and provides a seamless approach to importing data into R for analysis. In this Answer, we’ll uncover the process of importing data from Excel using the tidyxl package.
Install the tidyxl package
First, we must install the required tidyxl package in our R environment. We can install it using the following command:
install.packages("tidyxl")
Load the tidyxl package
Once we’ve installed the required tidyxl package, we’ve to load it into our R session by running the following command:
library(tidyxl)
Import data from an Excel file
Now that we’ve loaded the tidyxl package in our R session, we can import data from the given Excel file—from an entire Excel workbook or a specific worksheet.
Import data from an entire Excel workbook
We can import an entire Excel workbook using the xlsx_cells() function. The basic syntax is as follows:
data <- xlsx_cells("path/to/excel_file.xlsx")
Note: Replace
"path/to/excel_file.xlsx"with the actual file path of your Excel workbook.
Import data from a specific Excel worksheet
If our Excel workbook contains multiple worksheets and we want to import a specific one, we can specify the sheet name using the sheet argument. For example:
data <- xlsx_cells("path/to/excel_file.xlsx", sheet="Sheet1")
Note: Replace
"Sheet1"with the worksheet name according to the needs of your import.
Once we’ve successfully imported the data from Excel, it’ll be stored in a data frame in R with information about each cell, including values, formulas, and cell formatting. We can perform various operations just like any other data frame. For instance, to take a quick look at the imported data, we can use the head() function to view the first few rows as follows:
head(data)
Let’s look at an example of how to import data from an Excel file using the tidyxl package.
Example
Here, we have an Excel file named duncan_data.xlsx containing data on 45 occupations in the USA. We want to load this data into an R data frame and extract values from a specific sheet in a particular column.
# Loading the tidyxl package library(tidyxl) # Specifying the path to the Excel file excel_file <- "/educative/duncan_data.xlsx" # Reading data from an Excel workbook into a data frame duncan_data <- xlsx_cells(excel_file) # Specifying the sheet in the Excel file sheet_name <- "Sheet1" # Extracting the values from first column column_values <- duncan_data[duncan_data$sheet == sheet_name & duncan_data$col == 1, ] # Viewing the extracted data head(column_values)
Explanation
In the example:
Line 2: We’ll load the
tidyxlpackage, making its functions and capabilities available in our R environment.Line 5: We’ll specify the file path to the Excel file (i.e.,
"/educative/duncan_data.xlsx") we want to work with.Line 8: Read the data from the Excel workbook specified in
excel_fileand store it in theduncan_datadata frame. Thexlsx_cells()function extracts cell-level information from the Excel file, such as cell values, formulas, formatting, etc.Line 11: Specify the worksheet’s name within the Excel file we want to work with. In this case, we’re setting
sheet_nameto"Sheet1".Line 14: Extract values from
duncan_databased on two conditions:duncan_data$sheet == sheet_namechecks if the sheet name matches"Sheet1"duncan_data$col == 1checks if the column number is1
Line 17: Quickly inspect the data.
Note: The Duncan's occupational prestige data used in the given example is borrowed from the R’s
carDatapackage.
Conclusion
The tidyxl package in R provides a straightforward and efficient way to import data from Excel workbooks and sheets. We can access and work with our data in R with simple commands, making it a valuable tool in any data science project. Hence, it simplifies integrating Excel data into our R workflows.
Free Resources