Hello, Data! Meet SQL
Explore foundational SQL skills needed to query relational databases effectively. Understand how to select specific data, filter rows using conditions, and sort or limit results for clearer analysis. This lesson teaches you the core SQL commands used to retrieve and shape data, essential for data science tasks and real-world applications.
Spreadsheets are great, CSV files are common, and APIs stream data from countless sources—but when companies need speed, structure, and scale, they often rely on relational databases. Think of them as powerful digital filing cabinets where data lives in neatly organized tables: rows and columns designed for fast retrieval and analysis.
But tables alone aren’t enough. To explore them, ask questions, and extract insights, we need a universal language: SQL (Structured Query Language).
SQL is not a programming language—it’s a declarative language used to request data from relational databases. It helps you filter, organize, and shape data for analysis, reporting, or machine learning pipelines.
How SQL communicates with databases
SQL is how we talk to relational databases. It’s how we ask for the data we want, filter the noise, and shape results into meaningful answers. Whether you're analyzing employee salaries, tracking customer orders, or prepping data for machine learning models—SQL is a tool you'll reach for again and again.
Fun fact: The first version of SQL was developed at IBM in the 1970s. It originally stood for "Structured English Query Language" (SEQUEL)—which is why some people still pronounce it “sequel” today!
Here’s a simple diagram to show how SQL communicates with a database behind the scenes:
This lesson focuses on core SQL building blocks: SELECT, FROM, WHERE, ORDER BY, and LIMIT.
Sample data: Employees table
Let’s start with a simple example table called employees to understand how SQL works. It stores information about employees in a company, including their ID, name, department, hire date, and salary.
EmployeeID | Name | Department | HireDate | Salary |
1 | Alice Johnson | Engineering | 2020-03-15 | 85,000.00 |
2 | Bob Smith | Marketing | 2019-07-22 | 65,000.00 |
3 | Carol Martinez | Sales | 2021-01-08 | 72,000.00 |
4 | David Liu | Engineering | 2018-11-02 | 95,000.00 |
5 | Eva Gómez | HR | 2022-05-16 | 58,000.00 |
6 | Frank O’Connor | Finance | 2017-09-30 | 78,000.00 |
7 | Grace Patel | Engineering | 2023-02-12 | 80,000.00 |
8 | Hiro Tanaka | Support | 2020-12-01 | 60,000.00 |
9 | Isabella Rossi | Marketing | 2021-06-25 | 67,000.00 |
10 | Arya Stark | Sales | 2019-04-10 | 71,000.00 |
Selecting data: The SELECT and FROM clauses
Two SQL clauses, SELECT and FROM, are essential when querying a database. They work together to tell the database what data we want and where to find it.
SELECT is how we specify the exact columns—or pieces of information—we want to see in our results. For example, if we only care about employee names and salaries, we tell SQL to select only those columns.
But data doesn’t float in space—it lives inside tables. That’s where FROM comes in. It tells SQL which table to pull the data from. As databases often contain many tables, we must specify the source.
Together, SELECT and FROM answer two key questions:
What data do we want? (
SELECT)Where do we get it from? (
FROM)
Basic syntax
The following is the syntax to use the SELECT and FROM clauses:
SELECT column1, column2, ...FROM table_name;
column1, column2, ...: The columns we want to retrievetable_name: The table where the data resides
Lines 1–3: Give us the
NameandSalarycolumns.Line 4: Get that data from the
employeestable.
When we run this query, the database returns a list of employee names alongside their salaries.
Note: If we want to select all columns from a table, we can use
*instead of listing each column name individually. For example,SELECT * FROM employees;retrieves every column in theemployeestable.
Filtering rows: The WHERE clause
When we're exploring data, we rarely want to look at every row in the table. Instead, we usually have a specific question in mind: Which employees work in engineering? Who earns more than $80,000? Who was hired before 2020? That’s where the WHERE clause becomes essential.
The WHERE clause lets us filter rows so we only get the ones that match certain conditions—giving us control to focus on the most relevant slice of data.
Syntax
The following is the syntax to use WHERE clause:
SELECT column1, column2, ...FROM table_nameWHERE condition;
This is the basic structure. Here’s what each part does:
SELECT: Choose the columns we want to view.FROM: Point to the table we're querying.WHERE: Add a condition that filters the rows.
Example 1: Filter by department
Assume we’re interested in analyzing the Engineering team. Here’s how we’d narrow it down:
This query returns only employees from the Engineering department. It’s a great way to isolate a team before doing deeper analysis—like comparing average salaries or evaluating hiring patterns.
Example 2: Filter by salary
Now let’s say we want to look at employees earning over $80,000—maybe for a compensation analysis.
This helps us find high earners and begin understanding salary distributions across the organization—essential for tasks like pay equity reviews or anomaly detection.
Example 3: Filter by hire date
If we’re studying employee retention or tenure, we might want to find those who were hired before a certain year:
This focuses our query on longer-serving employees—useful for cohort analysis or calculating average tenure.
The WHERE clause is our filtering lens. It lets us zoom in on the data that matters most to our analysis—whether we're exploring trends, building models, or preparing dashboards. It’s one of the most frequently used tools in any data scientist’s SQL toolkit.
Next, we’ll see how to combine multiple conditions to ask more nuanced questions.
Combine and conquer: Filtering with multiple conditions
As data scientists, we often want to ask layered questions. Maybe we want to find engineers hired after a certain date and earning above a certain salary. To do this, we can combine multiple conditions using logical operators like AND, OR, and parentheses ().
This approach helps us narrow down rows even further and add nuance to our data exploration.
Syntax
The following is the syntax to use AND, OR, and parentheses ().
SELECT column1, column2, ...FROM table_nameWHERE condition1 AND condition2;-- ORSELECT column1, column2, ...FROM table_nameWHERE condition1 OR condition2;-- Use parentheses for claritySELECT ...FROM ...WHERE (condition1 OR condition2) AND condition3;
Example 1: Use AND for precise filters
Let’s say we want employees in the Engineering department who also earn more than $80,000:
This filters the dataset down to employees who satisfy both conditions. This is helpful when we’re segmenting specific roles for performance or budget analysis.
Example 2: Use OR for flexible filters
Now imagine we’re interested in employees from either Marketing or Sales—perhaps to compare how customer-facing teams are compensated.
The OR keyword expands our filter to include multiple groups—ideal for combining related departments, regions, or project teams.
Logical conditions give us a powerful way to sharpen our filters and ask more complex, data-driven questions. Whether we're profiling talent, analyzing churn risk, or preparing data for modeling, understanding how to layer conditions is a must-have skill.
Shape the results: Sorting and limiting rows
Once we’ve filtered our data, the next step is to present it in a way that’s clear and useful. As data scientists, we often need to answer questions like:
Who are the top earners in the company?
Which departments hired most recently?
What’s the latest record in our dataset?
SQL gives us two simple tools to shape our result set: ORDER BY for sorting, and LIMIT for restricting how many rows we see.
Sorting with ORDER BY
The ORDER BY clause lets us sort rows by one or more columns. By default, it sorts in ascending order, but we can use DESC for descending.
This query gives us the highest-paid employees first. Sorting is especially helpful when identifying trends, ranking metrics, or preparing data for visualization.
We can also sort by multiple columns:
Here, we group by department and show the most recently hired employees within each one.
Limiting rows with LIMIT
Sometimes we don’t need every record—just the top few. That’s where LIMIT comes in handy:
This returns only the top three earners. It’s perfect for dashboards, summaries, or performance snapshots.
Combining ORDER BY and LIMIT helps us make results more digestible and relevant—whether we’re scanning for outliers, surfacing highlights, or building fast queries for large datasets.
Conclusion
As data scientists, understanding SQL is essential for retrieving the right information from relational databases. In this lesson, we explored how to use SELECT, FROM, WHERE, ORDER BY, and LIMIT—the core building blocks for querying structured data. These tools help us pull, filter, and organize data so we can turn it into actionable insight. Mastering these basics is the first step toward writing more advanced and powerful queries.
Let’s wrap things up with a quick quiz to check your understanding of the key concepts we’ve covered.
Which SQL keyword is used to specify the table to retrieve data?
GET
SELECT
FROM
TABLE