Search⌘ K
AI Features

Describing a Categorical Series

Explore techniques to describe categorical data series using SQL. Understand how to count distinct values, identify the most common categories using the MODE function, and manage missing values represented as NULLs. This lesson helps you accurately analyze categorical data by applying aggregate functions and handling potential data inconsistencies.

Overview

Categorical values are taken from a limited set of known values. Common examples are user, product, category, and so on.

To demonstrate, we are going to use the following list of values:

PostgreSQL
SELECT * FROM (VALUES ('orange'), ('apple'), ('banana'), ('apple')) AS t(fruit);

Counting distinct values

To count the number of rows, we can use the COUNT function as we did before:

PostgreSQL
SELECT
COUNT(*)
FROM (
VALUES ('orange'), ('apple'), ('banana'), ('apple')
) AS t(fruit);

Using COUNT(*), we found that the series has 4 rows. However, our series can have duplicate values. If we want to know how many unique values we have in the series, we can use the DISTINCT ...