In R, there’s a powerful package called tidyxl
. It’s useful for working with Excel files and provides a seamless approach to importing data into R for analysis. In this Answer, we’ll uncover the process of importing data from Excel using the tidyxl
package.
tidyxl
packageFirst, we must install the required tidyxl
package in our R environment. We can install it using the following command:
install.packages("tidyxl")
tidyxl
packageOnce we’ve installed the required tidyxl
package, we’ve to load it into our R session by running the following command:
library(tidyxl)
Now that we’ve loaded the tidyxl
package in our R session, we can import data from the given Excel file—from an entire Excel workbook or a specific worksheet.
We can import an entire Excel workbook using the xlsx_cells()
function. The basic syntax is as follows:
data <- xlsx_cells("path/to/excel_file.xlsx")
Note: Replace
"path/to/excel_file.xlsx"
with the actual file path of your Excel workbook.
If our Excel workbook contains multiple worksheets and we want to import a specific one, we can specify the sheet name using the sheet
argument. For example:
data <- xlsx_cells("path/to/excel_file.xlsx", sheet="Sheet1")
Note: Replace
"Sheet1"
with the worksheet name according to the needs of your import.
Once we’ve successfully imported the data from Excel, it’ll be stored in a data frame in R with information about each cell, including values, formulas, and cell formatting. We can perform various operations just like any other data frame. For instance, to take a quick look at the imported data, we can use the head()
function to view the first few rows as follows:
head(data)
Let’s look at an example of how to import data from an Excel file using the tidyxl
package.
Here, we have an Excel file named duncan_data.xlsx
containing data on 45 occupations in the USA. We want to load this data into an R data frame and extract values from a specific sheet in a particular column.
# Loading the tidyxl package library(tidyxl) # Specifying the path to the Excel file excel_file <- "/educative/duncan_data.xlsx" # Reading data from an Excel workbook into a data frame duncan_data <- xlsx_cells(excel_file) # Specifying the sheet in the Excel file sheet_name <- "Sheet1" # Extracting the values from first column column_values <- duncan_data[duncan_data$sheet == sheet_name & duncan_data$col == 1, ] # Viewing the extracted data head(column_values)
In the example:
Line 2: We’ll load the tidyxl
package, making its functions and capabilities available in our R environment.
Line 5: We’ll specify the file path to the Excel file (i.e., "/educative/duncan_data.xlsx"
) we want to work with.
Line 8: Read the data from the Excel workbook specified in excel_file
and store it in the duncan_data
data frame. The xlsx_cells()
function extracts cell-level information from the Excel file, such as cell values, formulas, formatting, etc.
Line 11: Specify the worksheet’s name within the Excel file we want to work with. In this case, we’re setting sheet_name
to "Sheet1"
.
Line 14: Extract values from duncan_data
based on two conditions:
duncan_data$sheet == sheet_name
checks if the sheet name matches "Sheet1"
duncan_data$col == 1
checks if the column number is 1
Line 17: Quickly inspect the data.
Note: The Duncan's occupational prestige data used in the given example is borrowed from the R’s
carData
package.
The tidyxl
package in R provides a straightforward and efficient way to import data from Excel workbooks and sheets. We can access and work with our data in R with simple commands, making it a valuable tool in any data science project. Hence, it simplifies integrating Excel data into our R workflows.
Free Resources