Comparing Missing Values
Understand how to properly compare and handle NULL values in SQL for effective data cleaning. Learn why the equality operator does not work with NULLs and how to use IS, IS DISTINCT FROM, and IS NOT DISTINCT FROM to compare missing values safely. This lesson helps you prepare your data accurately for analysis by mastering missing value comparisons.
We'll cover the following...
Overview
Before we start analyzing data, we usually need to clean it first. Data cleaning is an important part of this process, and handling missing values is a large part of that.
Missing data in SQL
Missing data in SQL is represented by the special value NULL. We already know that some functions, such as aggregate functions, handle NULL values differently. For example:
As you can see, the expression COUNT(value) ignored the NULL value in the column value, and the result was 1.
Comparing NULL values
We can't use the equality comparison operator, =, to check if a certain value is NULL. To check if a value is NULL, we need to use the IS operator instead:
When we use the equality operator in the expression NULL = NULL, the result is not True, it is NULL. When we use the IS operator, the result of the expression NULL IS NULL is True.
Using the IS DISTINCT FROM operator
Comparing nullable values and columns can be tricky. If we use the equality operator we may get NULL as a result. To safely compare NULL values for equality, SQL offers the IS DISTINCT FROM operator:
The query illustrates the differences between the equality operator and IS DISTINCT FROM. When a and b are not NULL, the equality and IS DISTINCT FROM are returning the same result. However, when one or both values are NULL, the equality operator evaluates to NULL, but IS DISTINCT FROM returns the result as if we used IS.
Note: When comparing nullable columns, it's safer to use
IS DISTINCT FROMorIS NOT DISTINCT FROM.
The IS DISTINCT FROM operator has an opposite operator IS NOT DISTINCT FROM:
The IS NOT DISTINCT FROM operator is the opposite of IS DISTINCT FROM, or != for non-nullable values.