Join Variants

Learn about advanced join techniques, such as self-joins, anti-joins, and non-equi joins, to handle complex data relationships and comparisons in SQL.

In the world of data analysis, we often need to compare records, link them together in creative ways, or identify things that do not have a match.

Imagine we are building a product alert system: we want to notify managers if any products have not been restocked for over a month and have no suppliers listed. A simple join will not do; we need more nuanced ways to connect and contrast data.

That is where Join variants come in. In this lesson, we’ll explore advanced join patterns such as self-joins, anti-joins, and non-equi joins.

These techniques expand our problem-solving toolkit when working with complex data relationships.

By the end of this lesson, we’ll be able to:

  • Identify situations where advanced joins are necessary.

  • Use self-joins to compare rows within the same table.

  • Use anti-joins to find non-matching rows between tables.

  • Apply non-equi joins to compare rows using inequality conditions.

Pattern overview

Category:

  • Comparison Patterns

Intent:

To connect rows in non-standard ways by comparing values within the same table, excluding matches, or using inequality conditions.

Motivation:

Standard INNER JOIN and LEFT JOIN operations are powerful, but they don’t always get the job done. When we need to find unmatched records, relate rows within the same table, or compare ranges, we turn to join variants.

Also known as:

  • Self join: Reflexive join

  • Anti join: Exclusion join

  • Non-equi join: Inequality join

Structure

1. Self join: Used to relate a table to itself. This is helpful when rows are connected by a parent-child, referral, or comparison relationship.

Given the following structure of the Customers table:

Field

Type

CustomerID

int

CustomerName

varchar(50)

Email

varchar(50)

Phone

varchar(15)

Address

varchar(100)

LastLogin

date

CreatedAt

timestamp

CustomerTier

enum('New','Regular','VIP')

ReferralID

int

LoyaltyPoints

int

LastPurchaseDate

date

IsChurnRisk

tinyint(1)

Where CustomerID is the primary key and ReferralID is self-referencing to CustomerID. The table contains information about customers. Write an SQL query to show all referral relationships. Return the CustomerName of the referrer and the CustomerName of the referred customer.

Press + to interact
SELECT c1.CustomerName AS Referrer,
c2.CustomerName AS Referred
FROM Customers c1
JOIN Customers c2 ON c1.CustomerID = c2.ReferralID;

2. Anti join: Used to find records in one table not matched by another. Typically implemented using LEFT JOIN ... WHERE other_table.column IS NULL or NOT EXISTS.

Given the following structure of the Products table:

Field

Type

ProductID

int

ProductName

varchar(50)

CategoryID

int

Price

decimal(10,2)

Stock

int

LastRestockDate

date

MonthlySales

int

InventoryTurnoverRate

decimal(5,2)

Where ProductID is the primary key and CategoryID is the foreign key from the Categories table. The table contains information about products.

Also, the following structure of the Product_Suppliers table:

Field

Type

ProductID

int

SupplierID

int

Where ProductID is the foreign key from Products table and SupplierID is the foreign key from the Suppliers table. The table contains information about suppliers and their products.

Write an SQL query to find the names of all products that do not have any suppliers assigned.

Return the ProductName of each product.

Press + to interact
SELECT p.ProductName
FROM Products p
LEFT JOIN Product_Suppliers ps ON p.ProductID = ps.ProductID
WHERE ps.SupplierID IS NULL;

3. Non-Equi Join: Used when the join condition involves inequality (<, >, <=, >=) instead of =. Often applied in range-based joins.

Given the following structure of the Orders table:

Field

Type

OrderID

int

CustomerID

int

OrderDate

date

TotalAmount

decimal(10,2)

ShippedDate

date

DeliveryStatus

enum('Pending','Shipped','Delivered','Cancelled')

EmployeeID

int

CreatedAt

timestamp

ExpectedDeliveryDate

date

ActualDeliveryDate

date

LateDelivery

tinyint(1)

PaymentMethod

enum('Credit Card','PayPal','Cash on Delivery')

ReturnCount

int

FraudRisk

tinyint(1)

Where OrderID is the primary key and CustomerID is the foreign key from the Customers table referring to the customer who placed the order. The table contains information about the orders placed.

Also the following structure of the Suppliers table:

Field

Type

SupplierID

int

SupplierName

varchar(50)

Email

varchar(50)

Phone

varchar(15)

Address

varchar(100)

OnTimeDeliveryRate

decimal(5,2)

AvgLeadTime

int

Where SupplierID is the primary key. The table contains information about suppliers of products.

Write an SQL query to list the OrderID, OrderDate, and SupplierName for each order-product-supplier combination where the average supplier lead time allows the product to be delivered on or before the expected delivery date.

Press + to interact
SELECT o.OrderID, o.OrderDate, s.SupplierName
FROM Orders o
JOIN Suppliers s ON o.ExpectedDeliveryDate - s.AvgLeadTime <= o.OrderDate;

Keywords

SELF JOIN, LEFT JOIN, NOT EXISTS, IS NULL, !=, <, >, ON, aliasing (table1.column ...