What is CUME_DIST() in PostgreSQL?

A database is a collection of structured data stored in an organized manner. PostgreSQL is a relational database management system that employs SQL for fundamental operations such as creating tables, inserting data into tables, modifying the records, and deleting the existing data from the tables. Understanding SQL keywords is crucial for being skilled in SQL. In this context, an important keyword,These calculate values for each row in a query result set based on a specified window of rows. CUME_DIST(), is a window functionIt calculates values for each row in a query result set based on a specified window of rows. that returns the cumulative distribution of a value in a group of data values (dataset).

The CUME_DIST() function

The CUME_DIST() computes a cumulative value based on the rows in a dataset. It also works with datasets in the partitions. Its return values range from greater than 0 and less than or equal to 1. The CUME_DIST() function is particularly used in scenarios where you need to understand the relative standing of a value within a dataset, such as finding percentilesA percentile is a value on a scale of 100 that compares the current row's value to the other rows' values in the same dataset. Percentiles in decimal form range from 0 to 1.. Suppose we have student scores in different subjects: CUME_DIST() shows the percentage of students who scored less than or equal to a given score, which is useful for understanding cumulative standing.

Syntax

Following is the syntax for CUME_DIST():

CUME_DIST ()
OVER ( [PARTITION BY clause]
[ORDER BY clause] )
  • The CUME_DIST() represents the window function.

  • We create partitions by defining the column/condition in the PARTITION BY clause. This clause becomes optional if there’s no need to create partitions. In such cases, the window function will treat the dataset as a single window.

  • The ORDER BY clause specifies the order within each partition.

Coding examples

Let’s understand the CUME_DIST() window function through coding examples.

Example table

We have some data on the student’s exam results in the Exam_result table that we can view using the SELECT query.

SELECT * FROM Exam_result;

Note: We have modified the records in each example to understand the concept clearly.

CUME_DIST() of a single dataset

The CUME_DIST() returns a proportion value based on the preceding rows in datasets. It also works with partitions of a dataset. The return value ranges between 0 and 1.

A value of 0.2 for a row in ascending order means that the current row’s value is less than or equal to 20% of the values in the dataset or partition (if applicable). In other words, 20% of the rows have a value less than or equal to the current row’s value. Conversely, in descending order, 0.2 means that the current row's value is greater than or equal to 20% of the values in the dataset or partition (if applicable). Thus, 20% of the rows have a value greater than or equal to the current row’s value.

SELECT std_id,
std_name,
marks,
subject_name,
CUME_DIST() OVER(ORDER BY marks ASC)
AS MarksCumeDist
FROM Exam_result;

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code. Also, the values are trimmed for this example.

CUME_DIST() of a partitioned dataset

As the return value of the function CUME_DIST() ranges from 0 to 1, it generates the values within the limit for each window of the data.

Let’s partition the data based on the subject names. We’ll have three partitions according to the available data. Click the “Run” button in the following playground to view the output:

SELECT std_id,
std_name,
marks,
subject_name,
CUME_DIST() OVER(PARTITION BY subject_name
ORDER BY marks ASC) AS MarksCumeDist
FROM Exam_result;

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code.

CUME_DIST() for duplicate values

Let’s discuss how CUME_DIST() deals with the duplicate values. If we get more than one occurrence of the highest value in a dataset in ascending order, the function assigns 1 to all the rows. Let’s see this in the following example:

SELECT std_id,
std_name,
marks,
subject_name,
CUME_DIST() OVER(PARTITION BY subject_name
ORDER BY marks ASC) AS MarksCumeDist
FROM Exam_result;

You may see that the students with the Science subject have the same value, 1, when there is a tie for the lowest score. Also, this function assigns 0.4 to all the records if there is a tie on the highest score, as we can see for students with the English subject.

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code.

CUME_DIST() for NULL values

The CUME_DIST() function considers NULL values as well. It assigns the lowest value to the record with NULL value if we sort the rows in descending order and 1 if we sort the rows in ascending order. Following is the example of the output sorted in ascending order.

SELECT std_id,
std_name,
marks,
subject_name,
CUME_DIST() OVER(PARTITION BY subject_name
ORDER BY marks ASC) AS MarksCumeDist
FROM Exam_result;

Following is the example of the output sorted in descending order.

SELECT std_id,
std_name,
marks,
subject_name,
CUME_DIST() OVER(PARTITION BY subject_name
ORDER BY marks DESC) AS MarksCumeDist
FROM Exam_result;

CUME_DIST() vs. PERCENT_RANK()

In PostgreSQL, CUME_DIST() and PERCENT_RANK() are similar ranking functions: CUME_DIST() shows the percentage of students who scored less than or equal to a given score, which makes it useful for understanding cumulative standing while PERCENT_RANK() shows the rank of a student’s score relative to others, which makes it useful for comparing individual performance. To discuss the difference between CUME_DIST() and PERCENT_RANK(), we must understand the percentiles first. A percentile is a value on a scale of 100 that compares the current row’s value to the other rows’ values in the same dataset. Percentiles in decimal form range from 0 to 1. Let’s look at the distinct behaviors of CUME_DIST() and PERCENT_RANK() functions.

  • In PostgreSQL, CUME_DIST() calculates the cumulative distribution of a value within a dataset, including the current row, while PERCENT_RANK calculates the relative rank within a dataset, excluding the current row.

  • Both functions CUME_DIST() and PERCENT_RANK() have a maximum value of 1. The highest possible value for both functions is 1, which indicates the highest rank or the full distribution.

  • The function PERCENT_RANK() doesn’t assign 1 to any of the values if we have ties. In contrast, CUME_DIST() assigns the value 1 to all the records with maximum values.

  • The range for PERCENT_RANK() starts from 0, which indicates the lowest possible rank. On the other hand, CUME_DIST() does not start from 0; instead, it assigns a minimum value to the record. This minimum value represents the cumulative distribution of the record within the dataset or partition.

Let’s have a look at the following example for more clarification.

SELECT std_id,
std_name,
marks,
subject_name,
PERCENT_RANK() OVER(PARTITION BY subject_name
ORDER BY marks ASC) AS PercentRank,
CUME_DIST() OVER(PARTITION BY subject_name
ORDER BY marks ASC) AS CumeDist
FROM Exam_result;

Conclusion

To understand the CUME_DIST() window function in PostgreSQL, we have seen a few examples that use this function. These examples include single, partitioned datasets with duplicate values and datasets with NULL values. In the end, we have also compared it with PERCENT_RANK(). Through detailed examples showcasing both ascending and descending order scenarios, this Answer serves as a practical guide. It also empowers users to leverage percentiles effectively in SQL for diverse data analysis needs.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved