Patch Gaps

Learn about the Patch gaps pattern to handle missing or NULL data.

When managing real-world data, we often run into incomplete information, missing values, skipped dates, or NULL fields that break our reports. Imagine we’re building a daily sales report. What happens if no orders were placed on a given day?

The date disappears from our chart, making trends hard to read. Or let’s say we’re tracking customer activity, but some users are missing their LastLogin or LastPurchaseDate. These gaps can mislead our analysis.

That's where the Patch gaps pattern comes in. In this lesson, we will learn how to handle missing or NULL data by filling in the blanks or generating the missing records. We will explore different types of gaps: missing rows, missing dates, and NULL fields, and learn strategies to patch them effectively.

By the end of this lesson, we’ll be able to:

  • Detect and handle NULL values using standard SQL functions.

  • Generate sequences to fill in missing time intervals or identifiers.

  • Join data with generated ranges to ensure we never miss a data point.

  • Use this pattern to build reliable reports and consistent datasets.

Pattern overview

Category:

  • Filtering Patterns

Intent:

To fill in missing values or generate absent records so that the data is complete for accurate analysis, reporting, or visualization.

Motivation:

Real-world data often contains gaps, NULL values, missing dates, or absent records. If left unpatched, these gaps can distort analytics and lead to incorrect conclusions. For instance, a sales chart might suggest a dip in performance when a date is simply missing from the dataset.

This pattern helps us detect and fill those holes, making our data analysis more truthful and actionable.

Also known as:

  • Data imputation

  • Gap filling

  • NULL handling

  • Time series completion

Structure

There are two main structures used in this pattern:

  1. Patch NULLs using COALESCE or CASE

SELECT COALESCE(<column>, <default_value>) AS filled_value
FROM <table>;
Syntax to patch NULLs
  1. Generate missing rows using a date/calendar table or CTE

WITH DateRange AS (...)
SELECT ...
FROM DateRange
LEFT JOIN ActualData ...
Syntax to generate missing rows

Keywords

COALESCE, IFNULL, CASE, IS NULL, LEFT JOIN, CTE, ROW_NUMBER, SEQUENCE, DATE, INTERVAL, Calendar Table

Problem structure

We use the Patch Gaps pattern when:

  • We need to identify and fill in gaps in our data, such as NULL values or missing rows.

  • We first identify where the gaps are (e.g., NULL values or missing records).

  • Then, we choose an appropriate strategy to patch the gaps:

    • Use COALESCE or CASE to handle NULLs.

    • Generate a sequence (e.g., a date range) to fill in missing records.

  • Finally, we join the generated data with the actual data to fill or replace the gaps.

  • Look for keywords like: “fill missing,” “patch gaps,” “replace NULLs,” “handle missing data,” or “complete records” to identify when this pattern applies.

Example use cases

1. Filling NULL LastLogin Dates

Given the following structure of the Customers table:

Field

Type

CustomerID

int

CustomerName

varchar(50)

Email

varchar(50)

Phone

varchar(15)

Address

varchar(100)

LastLogin

date

CreatedAt

timestamp

CustomerTier

enum('New','Regular','VIP')

ReferralID

int

LoyaltyPoints

int

LastPurchaseDate

date

IsChurnRisk

tinyint(1)

Where CustomerID is the primary key and ReferralID is self-referencing to CustomerID. The table contains information about customers. In the table, some customers have never logged in and therefore have a NULL value in the LastLogin column.

Write an SQL query to return the following:

  • CustomerID

  • CustomerName

  • LoginDate: The date of the last login, or '2025-01-01' if the value is NULL

Press + to interact
SELECT CustomerID,
CustomerName,
COALESCE(LastLogin, '2025-01-01') AS LoginDate
FROM Customers;

2. Generate a daily sales report (even on days with no orders).

Given the following structure of the Orders table:

Field

Type

OrderID

int

CustomerID

int

OrderDate

date

TotalAmount

decimal(10,2)

ShippedDate

date

DeliveryStatus

enum('Pending','Shipped','Delivered','Cancelled')

EmployeeID

int

CreatedAt

timestamp

ExpectedDeliveryDate

date

ActualDeliveryDate

date

LateDelivery

tinyint(1)

PaymentMethod

enum('Credit Card','PayPal','Cash on Delivery')

ReturnCount

int

FraudRisk

tinyint(1)

Where OrderID is the primary key and ...