The R language has a package named the sqldf to work with the SQL databases. This package allows the SQL operations to be performed within R code, just as they could be performed in a SQL database. While the sqldf package enables us to execute SQL queries, it’s important to note that these queries are solely conducted on the R data frame. We don’t need a database connection for these queries.
The sqldf package can be installed using the following script:
install.packages("sqldf")
We need to import the sqldf module into our code to perform the SQL functions using R. This is done using the following command:
library(sqldf)
sqldf The sqldf primarily focuses on performing SELECT, JOIN, GROUP BY, and other SQL statements. These operations are used to retrieve, filter, transform, or summarize data. These queries work on the data stored in R data frames.
Let’s look at how to perform basic SQL queries in R:
# Import without showing any warning or messagesoptions(warn = -1)suppressMessages(library(sqldf))# Create a sample dataframedata <- data.frame(Name = c("Alice", "Bob", "Charlie"),Age = c(25, 30, 22),City = c("New York", "San Francisco", "Los Angeles"))# Example 1: Select all columns for people older than 25result1 <- sqldf("SELECT * FROM data WHERE Age > 25")# Example 2: Calculate the average ageresult2 <- sqldf("SELECT AVG(Age) AS AverageAge FROM data")# Example 3: Group data by city and calculate the average age for each cityresult3 <- sqldf("SELECT City, AVG(Age) AS AverageAge FROM data GROUP BY City")# Example 4: Sort data by age in descending orderresult4 <- sqldf("SELECT * FROM data ORDER BY Age DESC")# Example 5: Select names that start with 'A'result5 <- sqldf("SELECT * FROM data WHERE Name LIKE 'A%'")# Displaying resultscat("People older than 25 \n")print(result1)cat("\n\n Average age of the group \n" )print(result2)cat("\n\n Average age by city \n")print(result3)cat("\n\n Data sorted by age \n")print(result4)cat("\n\n Names starting with 'A' \n")print(result5)
Lines 2–3: We import the sqldf package while suppressing any warnings or additional messages for a clean output.
Lines 6–10: We define a data frame in R that has attributes the Name, Age and City.
Line 13: We write an SQL query to retrieve data instances where the Age is greater than 25.
Line 16: We write an SQL query to find the average Age from the data frame.
Line 19: We define an SQL query to group data on the City, and find average the Age in each city.
Line 22: We write an SQL query to sort data by Age in descending order.
Line 25: We write an SQL query to retrieve the Name starting from the alphabet “A.”
Lines 28–37: We display the results of all the above queries with a little formatting for better readability.
Free Resources