Loading External Data
Learn how to pull data from a CSV file into our R environment, which lays the foundation for more complex database pulls.
Without the ability to load external data sources, we would need to hardcode our data into R using statements like the ones below:
#Store some survey data in a data frame object
VAR_DataFrame <- data.frame(
  Q1_Ans = c(1,4,3,5,1,2),
  Q2_Ans = c(5,3,2,2,5,1),
  Q3_Ans = c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE))
VAR_DataFrame #Print the resulting data frame
- Lines 2–5: We create a hardcoded data frame, using the data.frame function and explicitly coding values into that data frame.
While snippets like these are helpful in particular circumstances, we’ll usually deal with much larger, possibly dynamic, datasets, and hardcoding them into our scripts would be slow and error-prone. In data science, we typically pull these larger datasets from other sources, such as CSV files, databases, and websites. Fortunately, R is well-suited to the task of dynamically loading data.
The read.csv function
In base R, the primary function for reading data from a CSV file is read.csv(). Say we have a CSV file called MySurveyData.csv, which we can examine in the code window below.
This data would be tedious to hardcode into our scripts. And if we re-ran the survey or added new questions to the study, updating our script by hand would be just as frustrating. Luckily, we can read the csv ...
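As a minimal sketch of how read.csv() might be used, the snippet below writes a small stand-in file and reads it back. The real contents of MySurveyData.csv are not shown here, so the values are an assumption that simply mirrors the hardcoded data frame from earlier; the names csv_path and VAR_SurveyData are hypothetical and chosen only for illustration.

```r
# Assumption: the real MySurveyData.csv is not available here, so we write
# a stand-in file containing the same values as the hardcoded data frame.
csv_path <- file.path(tempdir(), "MySurveyData.csv")  # hypothetical location
writeLines(c(
  "Q1_Ans,Q2_Ans,Q3_Ans",
  "1,5,TRUE",
  "4,3,TRUE",
  "3,2,FALSE",
  "5,2,TRUE",
  "1,5,FALSE",
  "2,1,FALSE"
), csv_path)

# read.csv() treats the first row as column names by default (header = TRUE)
VAR_SurveyData <- read.csv(csv_path)

str(VAR_SurveyData)  # Inspect the column names and the inferred types
VAR_SurveyData       # Print the resulting data frame
```

With a real file, only the path passed to read.csv() would change. If the survey were re-run or new questions were added, the script would pick up the new rows and columns from the file without further edits.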