How to read Excel files in R
Overview
For simple data analysis, the first important step after
Therefore, we will discuss two techniques to read an Excel file in R.
Technique 1: Use read_excel() from readx1
The read_excel() function is used to read an Excel file, and it is only accessible after you import the readxl library in the R program.
Syntax
read_excel(
path,
sheet = 1,
col_names = TRUE,
col_types = NULL,
na = "",
skip = 0
)
Parameters
-
path: The directory of the Excel file with extension. -
sheet:Default= 1. If we have more than 1 sheet in a single workbook, we can assign sheet #, i.e., 1, 2, 3, 4… -
col_names:Default= TRUE.TRUE: Read the first row as a file header.FALSE: The file does not have a header.Character Vector: Character vector that contains header names.
-
col_types:Default= NULL.NULL: Interpret the type of column from the spreadsheet, i.e., Date, Currency, etc.Vector: A vector that contains"blank","numeric","date", or"text".
-
na:Default= "". Keeps empty cells as is, but we can specifynato fill empty cells. -
skip:Default= 0. Number of rows to skip from start before reading a file.
Return value
This method returns Excel file data as a data frame.
Code
Example 1
In this example, we read an Excel file, Employee.xlsx. Below, we have the expected output of this code.
# installing librarylibrary(readxl) # Optional Stepdata <- read_excel ("Employee.xlsx")print(data)
Technique 2: Use read.xlsx() from xlsx
The read.xlsx() function is imported from the xlsx library, which is used to read an Excel file in R language.
Syntax
read.xlsx(
path,
sheet,
startRow = 1,
colNames = TRUE,
rowNames = FALSE,
detectDates = FALSE,
skipEmptyRows = TRUE,
skipEmptyCols = TRUE,
rows = NULL,
cols = NULL,
check.names = FALSE,
sep.names = ".",
namedRegion = NULL,
na.strings = "NA",
fillMergedCells = FALSE
)
Parameters
-
path: Directory of file including extension. -
sheet: Index or name of the sheet to read data. -
startRow*:Default= 1`. The first row to begin and check for data. -
colNames:Default= TRUE. IfTrue, the first row will be selected as column names. -
rowNames:Default= FALSE. IfTrue, the first column will be selected as row names. -
detectDates:Default= FALSE. IfTrue, this method will try to recognize Dates and perform conversions if needed. -
skipEmptyRows:Default= TRUE. IfTrue, empty rows will be skipped during loading. After the first row containing data, other empty rows will be returned asNAs. -
skipEmptyCols:Default= TRUE. IfTrue, empty columns will be skipped during loading. -
rows:Default= NULL.-
NULL: Ifrowsare set toNULL, then all rows are read. -
Vector: Ifrowsare set to a vector of numeric values, then specified rows will be read.
-
-
cols:Default= NULL.-
NULL: Ifcolsis set toNULL, then all columns are read. -
Vector: Ifcolsis set tovectorof numeric values, then specified columns will be read.
-
-
check.names:Default= FALSE. IfTrue, then it will check if each variable is syntactically valid or not. -
sep.names:Default= ".". Separates each variable in column names. -
namedRegion:Default= NULL.-
NULL: Values ofrows,cols, andstartRowparameters will be considered within a workbook. -
Not NULL: Values ofrows,cols, andstartRowparameters are ignored.
-
-
na.strings:Default= "NA". Returns blank cells asNAs. -
fillMergedCells:Default= FALSE. IfTrue, then the value in a merged cell (Parent) is given to all cells within the merge (Child).
Return value
This method returns Excel file data as a data frame.
Example 2
# Installing required libraryinstall.packages("xlsx") # Optional step# using read.xlsx method to read excel filesdata <-read.xlsx('Employee.xlsx')print(data)