Loading External Data
Explore how to import external data sources into R efficiently using the read.csv function. Learn to set parameters such as header, row.names, and stringsAsFactors to handle various data formats. Discover best practices like using head and summary functions to check data integrity after loading. This lesson equips you with essential skills to work with dynamic datasets beyond hardcoded values, preparing you for real-world data science tasks in R.
Without the ability to load external data sources, we would need to hardcode our data into R using statements like the ones below:
- Lines 2–5: We create a hardcoded data frame using the data.frame function, explicitly coding each value into that data frame.
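As a minimal sketch of this kind of hardcoding, the snippet below builds a small data frame by hand. The name `survey`, the column names, and the values are hypothetical and chosen only for illustration.

```r
# Hypothetical survey responses typed directly into the script
survey <- data.frame(
  respondent = c(1, 2, 3),
  satisfied = c("yes", "no", "yes")
)

# Every change to the data means editing the script itself
print(survey)
```

Even at three rows, each value has to be typed and maintained by hand, which is what makes this approach impractical for larger datasets.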
While snippets like these are helpful in particular circumstances, most of the time we’ll deal with much larger, possibly dynamic, datasets, and hardcoding them into our scripts would be inefficient. In data science, we usually pull these larger datasets from other sources: CSV files, databases, and websites. Fortunately, R is well suited to the task of dynamically loading data.
The read.csv function
In base R, the primary function for reading data from a CSV file is read.csv(). Say we have a CSV file called MySurveyData.csv, which we can examine in the code window below.
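To make the call concrete, here is a minimal, self-contained sketch. It assumes a small stand-in file with made-up columns (`respondent`, `age`, `satisfied`) written to a temporary directory, so the contents are not those of the actual MySurveyData.csv. It uses the `header` and `stringsAsFactors` parameters of read.csv() and then checks the result with head(), summary(), and str().

```r
# Hypothetical stand-in for MySurveyData.csv; the columns and values are
# invented for illustration and written to a temporary directory.
csv_path <- file.path(tempdir(), "MySurveyData.csv")
writeLines(
  c("respondent,age,satisfied",
    "1,34,yes",
    "2,27,no",
    "3,45,yes"),
  csv_path
)

# header = TRUE treats the first row as column names.
# stringsAsFactors = FALSE keeps text columns as character vectors.
survey <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Quick integrity checks after loading
print(head(survey))     # first few rows
print(summary(survey))  # per-column summaries
str(survey)             # column names and types
```

Because the data lives in the file rather than in the script, re-running the same read.csv() call picks up any rows or columns added to the file later.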
This data would be frustrating to hardcode into our scripts, and if we re-ran the survey or added new questions to the study, updating the script by hand would be tedious. Luckily, we can read the CSV ...