Search⌘ K
AI Features

Hello, Data! Meet SQL

Explore foundational SQL skills needed to query relational databases effectively. Understand how to select specific data, filter rows using conditions, and sort or limit results for clearer analysis. This lesson teaches you the core SQL commands used to retrieve and shape data, essential for data science tasks and real-world applications.

Spreadsheets are great, CSV files are common, and APIs stream data from countless sources—but when companies need speed, structure, and scale, they often rely on relational databases. Think of them as powerful digital filing cabinets where data lives in neatly organized tables: rows and columns designed for fast retrieval and analysis.

But tables alone aren’t enough. To explore them, ask questions, and extract insights, we need a universal language: SQL (Structured Query Language).

SQL is not a programming language—it’s a declarative language used to request data from relational databases. It helps you filter, organize, and shape data for analysis, reporting, or machine learning pipelines.

How SQL communicates with databases

SQL is how we talk to relational databases. It’s how we ask for the data we want, filter the noise, and shape results into meaningful answers. Whether you're analyzing employee salaries, tracking customer orders, or prepping data for machine learning models—SQL is a tool you'll reach for again and again.

Fun fact: The first version of SQL was developed at IBM in the 1970s. It originally stood for "Structured English Query Language" (SEQUEL)—which is why some people still pronounce it “sequel” today!

Here’s a simple diagram to show how SQL communicates with a database behind the scenes:

SQL query interaction flow
SQL query interaction flow

This lesson focuses on core SQL building blocks: SELECT, FROM, WHERE, ORDER BY, and LIMIT.

Sample data: Employees table

Let’s start with a simple example table called employees to understand how SQL works. It stores information about employees in a company, including their ID, name, department, hire date, and salary.

EmployeeID

Name

Department

HireDate

Salary

1

Alice Johnson

Engineering

2020-03-15

85,000.00

2

Bob Smith

Marketing

2019-07-22

65,000.00

3

Carol Martinez

Sales

2021-01-08

72,000.00

4

David Liu

Engineering

2018-11-02

95,000.00

5

Eva Gómez

HR

2022-05-16

58,000.00

6

Frank O’Connor

Finance

2017-09-30

78,000.00

7

Grace Patel

Engineering

2023-02-12

80,000.00

8

Hiro Tanaka

Support

2020-12-01

60,000.00

9

Isabella Rossi

Marketing

2021-06-25

67,000.00

10

Arya Stark

Sales

2019-04-10

71,000.00

Selecting data: The SELECT and FROM clauses

Two SQL clauses, SELECT and FROM, are essential when querying a database. They work together to tell the database what data we want and where to find it.

SELECT is how we specify the exact columns—or pieces of information—we want to see in our results. For example, if we only care about employee names and salaries, we tell SQL to select only those columns.

But data doesn’t float in space—it lives inside tables. That’s where FROM comes in. It tells SQL which table to pull the data from. As databases often contain many tables, we must specify the source.

Together, SELECT and FROM answer two key questions:

  • What data do we want? (SELECT)

  • Where do we get it from? (FROM)

Basic syntax

The following is the syntax to use the SELECT and FROM clauses:

SELECT column1, column2, ...
FROM table_name;
  • column1, column2, ...: The columns we want to retrieve

  • table_name: The table where the data resides

SQL
SELECT
Name,
Salary
FROM employees;
  • Lines 1–3: Give us the Name and Salary columns.

  • Line 4: Get that data from the employees table.

When we run this query, the database returns a list of employee names alongside their salaries.

Note: If we want to select all columns from a table, we can use * instead of listing each column name individually. For example, SELECT * FROM employees; retrieves every column in the employees table.

Filtering rows: The WHERE clause

When we're exploring data, we rarely want to look at every row in the table. Instead, we usually have a specific question in mind: Which employees work in engineering? Who earns more than $80,000? Who was hired before 2020? That’s where the WHERE clause becomes essential.

The WHERE clause lets us filter rows so we only get the ones that match certain conditions—giving us control to focus on the most relevant slice of data.

Syntax

The following is the syntax to use WHERE clause:

SELECT column1, column2, ...
FROM table_name
WHERE condition;

This is the basic structure. Here’s what each part does:

  • SELECT: Choose the columns we want to view.

  • FROM: Point to the table we're querying.

  • WHERE: Add a condition that filters the rows.

Example 1: Filter by department

Assume we’re interested in analyzing the Engineering team. Here’s how we’d narrow it down:

SQL
SELECT Name, Department, Salary
FROM employees
WHERE Department = 'Engineering';

This query returns only employees from the Engineering department. It’s a great way to isolate a team before doing deeper analysis—like comparing average salaries or evaluating hiring patterns.

Example 2: Filter by salary

Now let’s say we want to look at employees earning over $80,000—maybe for a compensation analysis.

SQL
SELECT Name, Salary
FROM employees
WHERE Salary > 80000;

This helps us find high earners and begin understanding salary distributions across the organization—essential for tasks like pay equity reviews or anomaly detection.

Example 3: Filter by hire date

If we’re studying employee retention or tenure, we might want to find those who were hired before a certain year:

SQL
SELECT Name, HireDate
FROM employees
WHERE HireDate < '2020-01-01';

This focuses our query on longer-serving employees—useful for cohort analysis or calculating average tenure.

The WHERE clause is our filtering lens. It lets us zoom in on the data that matters most to our analysis—whether we're exploring trends, building models, or preparing dashboards. It’s one of the most frequently used tools in any data scientist’s SQL toolkit.

Next, we’ll see how to combine multiple conditions to ask more nuanced questions.

Combine and conquer: Filtering with multiple conditions

As data scientists, we often want to ask layered questions. Maybe we want to find engineers hired after a certain date and earning above a certain salary. To do this, we can combine multiple conditions using logical operators like AND, OR, and parentheses ().

This approach helps us narrow down rows even further and add nuance to our data exploration.

Syntax

The following is the syntax to use AND, OR, and parentheses ().

SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2;
-- OR
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2;
-- Use parentheses for clarity
SELECT ...
FROM ...
WHERE (condition1 OR condition2) AND condition3;
Syntax for using operator to combine conditions

Example 1: Use AND for precise filters

Let’s say we want employees in the Engineering department who also earn more than $80,000:

SQL
SELECT Name, Department, Salary
FROM employees
WHERE Department = 'Engineering' AND Salary > 80000;

This filters the dataset down to employees who satisfy both conditions. This is helpful when we’re segmenting specific roles for performance or budget analysis.

Example 2: Use OR for flexible filters

Now imagine we’re interested in employees from either Marketing or Sales—perhaps to compare how customer-facing teams are compensated.

SQL
SELECT Name, Department, Salary
FROM employees
WHERE Department = 'Marketing' OR Department = 'Sales';

The OR keyword expands our filter to include multiple groups—ideal for combining related departments, regions, or project teams.

Logical conditions give us a powerful way to sharpen our filters and ask more complex, data-driven questions. Whether we're profiling talent, analyzing churn risk, or preparing data for modeling, understanding how to layer conditions is a must-have skill.

Shape the results: Sorting and limiting rows

Once we’ve filtered our data, the next step is to present it in a way that’s clear and useful. As data scientists, we often need to answer questions like:

  • Who are the top earners in the company?

  • Which departments hired most recently?

  • What’s the latest record in our dataset?

SQL gives us two simple tools to shape our result set: ORDER BY for sorting, and LIMIT for restricting how many rows we see.

Sorting with ORDER BY

The ORDER BY clause lets us sort rows by one or more columns. By default, it sorts in ascending order, but we can use DESC for descending.

SQL
SELECT Name, Salary
FROM employees
ORDER BY Salary DESC;

This query gives us the highest-paid employees first. Sorting is especially helpful when identifying trends, ranking metrics, or preparing data for visualization.

We can also sort by multiple columns:

SQL
SELECT Name, Department, HireDate
FROM employees
ORDER BY Department ASC, HireDate DESC;

Here, we group by department and show the most recently hired employees within each one.

Limiting rows with LIMIT

Sometimes we don’t need every record—just the top few. That’s where LIMIT comes in handy:

SQL
SELECT Name, Salary
FROM employees
ORDER BY Salary DESC
LIMIT 3;

This returns only the top three earners. It’s perfect for dashboards, summaries, or performance snapshots.

Combining ORDER BY and LIMIT helps us make results more digestible and relevant—whether we’re scanning for outliers, surfacing highlights, or building fast queries for large datasets.

Conclusion

As data scientists, understanding SQL is essential for retrieving the right information from relational databases. In this lesson, we explored how to use SELECT, FROM, WHERE, ORDER BY, and LIMIT—the core building blocks for querying structured data. These tools help us pull, filter, and organize data so we can turn it into actionable insight. Mastering these basics is the first step toward writing more advanced and powerful queries.

Let’s wrap things up with a quick quiz to check your understanding of the key concepts we’ve covered.

Technical Quiz
1.

Which SQL keyword is used to specify the table to retrieve data?

A.

GET

B.

SELECT

C.

FROM

D.

TABLE


1 / 5