Mix and Match Data

Explore how to use SQL joins to combine multiple tables by leveraging primary and foreign keys. Learn key join types like INNER, LEFT, RIGHT, CROSS, and SELF joins, and understand common mistakes to avoid. This lesson helps you connect data from different sources to draw comprehensive insights.

In the real world, data rarely lives in one neat table. Users might be stored in one place, orders in another, and products somewhere else entirely. Individually, these tables tell partial stories. But when combined correctly, they reveal how users behave, what’s selling, and where systems might be failing.

As a data scientist, your job is to connect these pieces safely and correctly. That means understanding how tables relate, and using SQL joins to turn scattered datasets into reliable, production-ready data.

Before we jump into joins, we need to understand how tables are structured and connected, starting with primary keys and foreign keys.

Creating tables in SQL

In SQL, a table is like a dataset: rows represent records, and columns represent fields (attributes). As data scientists, we may not always create these tables ourselves, but understanding how they’re built helps us query them effectively.

Here’s a simple syntax to define a table:

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    ...
);
Define a table

For example, a table of customers:

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name VARCHAR(100),
    Email VARCHAR(100)
);
Example table

In this example, CustomerID is marked as a PRIMARY KEY, meaning it uniquely identifies each customer. This is essential when linking this table to others.
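To see the constraint in action, here is a minimal sketch that adds rows to the Customers table defined above. The names and email addresses are hypothetical sample values, not data from this lesson, and the exact error message for the second statement varies by database.

```sql
-- Hypothetical sample rows, for illustration only.
INSERT INTO Customers (CustomerID, Name, Email)
VALUES (1, 'Ada Example', 'ada@example.com'),
       (2, 'Grace Example', 'grace@example.com');

-- This insert should be rejected: CustomerID 1 already exists,
-- so it would violate the PRIMARY KEY constraint.
INSERT INTO Customers (CustomerID, Name, Email)
VALUES (1, 'Duplicate Example', 'dup@example.com');
```

After the first statement the table holds two customers. The second statement fails rather than silently creating a duplicate, which is the guarantee that makes CustomerID safe to join on.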

Tables are often created by data pipelines, not by hand. But it’s important to know how they work so you can read or debug them when needed.

Understanding primary keys

A primary key uniquely identifies each row in a table. For example, no two customers should share the same CustomerID. The following are key characteristics:

  • Unique: No duplicates allowed.

  • Not null: Every row must have a value.

  • Stable: Should not change frequently.
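The first two characteristics can be checked directly with a query. The sketch below assumes the Customers table from the earlier example. It is most useful when a table was loaded by a pipeline and you are not sure the key constraint was declared or enforced. The column aliases row_count and missing_ids are illustrative names.

```sql
-- Uniqueness check: lists any CustomerID that appears more than once.
-- An empty result means the column is unique.
SELECT CustomerID, COUNT(*) AS row_count
FROM Customers
GROUP BY CustomerID
HAVING COUNT(*) > 1;

-- Not-null check: counts rows where CustomerID is missing.
-- A result of 0 means every row has a value.
SELECT COUNT(*) AS missing_ids
FROM Customers
WHERE CustomerID IS NULL;
```

Running both checks before a join is a cheap way to catch duplicated or missing keys, which would otherwise multiply or drop rows in the joined result.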

In data science workflows, primary keys help ensure data integrity during joins, filters, and feature engineering. ...