Search⌘ K
AI Features

Mix and Match Data

Explore how to fetch and combine related data from multiple tables using SQL joins. Understand primary and foreign keys to maintain data integrity and build reliable queries that reveal meaningful insights from scattered datasets.

In the real world, data rarely lives in one neat table. Users might be stored in one place, orders in another, and products somewhere else entirely. Individually, these tables tell partial stories. But when combined correctly, they reveal how users behave, what’s selling, and where systems might be failing.

As a data engineer, your job is to connect these pieces safely and correctly. That means understanding how tables relate, and using SQL joins to turn scattered datasets into reliable, production-ready data.

Before we jump into joins, we need to understand how tables are structured and connected, starting with primary keys and foreign keys.

Creating tables in SQL

In SQL, a table is like a dataset: rows represent records, and columns represent fields (attributes). As data engineers, you’ll often create or manage these tables when building pipelines or staging raw inputs.

Here’s a simple syntax to define a table:

CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
);
Define a table

For example, a table of customers:

CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
Email VARCHAR(100)
);
Example table

In this example, CustomerID is marked as a PRIMARY KEY, meaning it will uniquely identify each customer—this is essential when linking this table to others.

Tables are often created by data pipelines, not by hand. But it’s important to know how they work so you can read or debug them when needed.

Understanding primary keys

A primary key uniquely identifies each row in a table. For example, no two customers should share the same CustomerID. The following are key characteristics:

  • Unique: No duplicates allowed.

  • Not null: Every row must have a value.

  • Stable: Should not change frequently.

In data engineering workflows, primary keys help ensure data integrity during joins, filters, and feature engineering.

Fun fact: Think of a primary key like a student roll number in school. Even if two kids have the same name, their roll number keeps things sorted out!

Understanding foreign keys

Now, let’s talk about the foreign Key. If the primary key is like a unique ID for each book in a library, then a foreign key is like a reference to a specific book in the account of a user who has borrowed it. It establishes a connection between two tables by referring to the primary key in another table.

A foreign key is a column (or a set of columns) in one table that refers to the primary key in another table. It creates a relationship between the two tables and ensures referential integrity by enforcing that the value in the foreign key column matches an existing primary key in the related table.

The following are key properties of the foreign keys:

  • Referential integrity: The value of a foreign key in one table must match a primary key in another table or be NULL.

  • Establishes relationships: Foreign keys are used to define relationships between tables, like one-to-many or many-to-many relationships.

CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderDate DATE,
CustomerID INT,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
Foreign key example

In this case, CustomerID in the Orders table is a foreign key that references the CustomerID in the Customers table. This relationship ensures that every order is linked to an existing customer.

Fun fact: A foreign key is like a “friend tag” on social media—it points to someone else’s profile (or in this case, another table).

Now that we understand how tables can be related through primary and foreign keys, let's dive into how we can retrieve and combine data from these tables using SQL joins.

Using SQL joins

Joins allow us to combine data from multiple tables into one result set. Think of them as a way to “merge” datasets based on shared keys.

We’ll use the following example tables:

Customer table

Orders table

CustomerID

Name

OrderID

CustomerID

OrderDate

101

Alice

1

101

2025-04-01

102

Bob

2

102

2025-04-05

103

Carol

3

101

2025-04-10

Let’s break down the most commonly used types of SQL joins.

1. INNER JOIN

The INNER JOIN is one of the most popular join types. It returns only the rows where there is a match between the two tables. If there is no match, the row is excluded from the results.

Venn diagram of inner join
Venn diagram of inner join

Example

Let’s say we want to retrieve a list of customers along with their orders. We will use an INNER JOIN between the Customers and Orders tables, matching them on the CustomerID.

MySQL
SELECT Customers.Name, Orders.OrderID
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Here, the Customers table is used as the main source and joined with the Orders table, keeping only the rows where the CustomerID matches in both tables.

Fun fact: An INNER JOIN is like a group project—only students who show up in both the attendance sheet and the team list get credit!

2. LEFT JOIN

The LEFT JOIN (or LEFT OUTER JOIN) returns all the rows from the left table, along with the matching rows from the right table. If there is no match, the result will contain NULL values for columns from the right table.

Venn diagram of left join
Venn diagram of left join

Example

Let’s say we want a list of all customers and their orders, but this time, we also want to include customers who have not placed any orders. We use the LEFT JOIN:

MySQL
SELECT Customers.Name, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Here, all customers will appear in the result, even if they haven’t placed any orders. For those customers without orders, the OrderID will be NULL.

Fun fact: A LEFT JOIN is like inviting your whole contact list to a party. If someone doesn’t RSVP (no match), you still keep their name on the list!

3. RIGHT JOIN

The RIGHT JOIN (or RIGHT OUTER JOIN) is the mirror image of the LEFT JOIN. It returns all the rows from the right table and the matching rows from the left table. If there is no match, the result will contain NULL values for columns from the left table.

Venn diagram of right join
Venn diagram of right join

Example

In some cases, we may want to list all orders, including orders without associated customers. The RIGHT JOIN would be useful here:

MySQL
SELECT Customers.Name, Orders.OrderID
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

This query returns all records from the right table, along with any matching records from the left table. If there’s no match, the left table’s columns will contain NULL values.

Fun fact: Think of RIGHT JOIN as checking who wanted to attend your event—even if they weren’t actually on your invite list.

4. CROSS JOIN

The CROSS JOIN produces a cartesian product—it returns every possible combination of rows from the two tables. If the first table has N rows and the second table has M rows, the result will contain N * M rows.

Venn diagram of cross join
Venn diagram of cross join

Example

In this example, a cross join pairs every customer with every order, producing all possible customer–order combinations.

MySQL
SELECT
c.CustomerID,
c.Name,
o.OrderID,
o.OrderDate
FROM
Customers AS c
CROSS JOIN Orders AS o

This query selects four columns: the customer’s ID and name from the Customers table, and the order ID and date from the Orders table. It uses a CROSS JOIN, which pairs every row from Customers with every row from Orders, creating all possible combinations of customers and orders regardless of whether they’re related.

Fun fact: CROSS JOIN is like making a seating chart where every guest gets paired with every dish. Tasty? Yes. But it gets crowded fast!

5. SELF JOIN

A SELF JOIN is a join where a table is joined to itself. This is useful when we need to compare rows within the same table.

Venn diagram of self join
Venn diagram of self join

Example

In this example, we use a self-join on the Orders table to list every pair of orders placed by the same customer, ensuring the first order ID is always less than the second.

MySQL
SELECT
o1.CustomerID,
c.Name,
o1.OrderID AS FirstOrder,
o2.OrderID AS SecondOrder
FROM
Orders AS o1
JOIN Orders AS o2
ON o1.CustomerID = o2.CustomerID
AND o1.OrderID < o2.OrderID
JOIN Customers AS c
ON c.CustomerID = o1.CustomerID

The self-join will pair up each customer’s orders where the first order ID is less than the second. In our data, only customer 101 (Alice) has more than one order (OrderIDs 1 and 3), so we get a single row. Joining with Customers lets us replace the raw CustomerID with actual customer details (e.g., Name) and ensures only pair orders for valid customers.

Fun fact: A SELF JOIN is like a mirror—you’re comparing a table with itself to find interesting pairs or patterns.

Before we wrap up joins, it’s worth seeing how a SELF JOIN stacks up against other common join types:

Comparing SQL Join types

Join type

What it returns

When to use it

INNER JOIN

Only matching rows in both tables

When you need records that exist in both tables

LEFT JOIN

All rows from the left table + matching rows from the right

When you want everything from the left, even if no match

RIGHT JOIN

All rows from the right table + matching rows from the left

When you want everything from the right, even if no match

CROSS JOIN

Every possible combination of rows

When you need all combinations (careful—can get huge!)

SELF JOIN

A table joined to itself

When comparing rows within the same table

Now that you’ve seen how different joins work, let’s look at some common mistakes developers make when using them.

Common mistakes:

1. Forgetting the ON condition
Without it, your JOIN becomes a CROSS JOIN and can explode your result set. Always double-check your join condition.

2. Using INNER JOIN instead of LEFT JOIN
An INNER JOIN will exclude unmatched rows—fine for clean data, risky for pipelines expecting full data coverage.

3. Mixing up aliases
Using short aliases like c and o makes code cleaner—but mix them up, and you’ll get confusing errors or wrong results.

Summary

SQL Joins are powerful tools that help us combine data from multiple tables, making it possible to retrieve and analyze related data efficiently. Understanding how primary keys and foreign keys form relationships between tables is essential for writing effective queries. With this knowledge, we can build complex queries that draw data from different tables to solve real-world problems.

To reinforce these concepts, test your understanding with a quick quiz.

Technical Quiz
1.

In an SQL query, which SQL keyword is used to specify the column names to be used in a join?

A.

WHERE

B.

HAVING

C.

GROUP BY

D.

ON


1 / 5