What is PERCENT_RANK() in PostgreSQL?

A database is a collection of structured data stored in an organized manner. PostgreSQL is a relational database management system that employs SQL for fundamental operations such as creating tables, inserting data into tables, modifying the records, and deleting the existing data from the tables. Understanding SQL keywords is crucial for being skilled in SQL. In this context, an important keyword, PERCENT_RANK(), is a window function that returns the rank of data values as percentiles of the dataset.

The `PERCENT_RANK()` function

The PERCENT_RANK() computes a relative ranking of the value based on the rows in a dataset. It also works with datasets in the partitions. It returns a value ranging from 0 to 1 (inclusive).

Syntax

The following is the syntax for PERCENT_RANK():

The PERCENT_RANK() represents the window function.
We create partitions by defining the column/condition in PARTITION BY clause. This clause becomes optional if there’s no need to create partitions. The window function will treat the dataset as a single window in such cases.
The ORDER BY clause specifies the order within each partition.

You may find more details on the PARTITION BY clause and ORDER BY clause in the linked Answers.

Coding example

Let's understand the PERCENT_RANK() window function through coding examples.

Example table

We have some data on the student's exam results in the Exam_result table that we can view by using the SELECT query.

Note: We have modified the records a bit in each example to understand the concept clearly.

SQL percentile for a single dataset

The PERCENT_RANK() returns a relative value ranking based on the rows in datasets. It also works with partitions of a dataset.

The return value ranges from 0 to 1. It assigns 0 to the lowest ranking value(s) and 1 to the highest ranking value, given that the data is sorted in ascending order. On the other hand, it returns 0 for the highest ranking value and 1 for the lowest ranking value if the data is sorted in descending order.

When a row has the value 0.25, and we're using the ascending order, which means the row contains a value 25% higher than the other rows. For the descending order, it means that the row contains a value 25% lower than the other rows.

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code. Also, note that the values are trimmed for this example.

SQL percentile of a partitioned dataset

As the return value of the function PERCENT_RANK() ranges from 0 to 1, it generates the ranks within the limit for each window of the data.

Let's partition the data based on the subject names. We'll have three partitions according to the available data. Press the "Run" button in the following playground to view the output:

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code.

SQL percentile for duplicate values

Let's talk about how this function deals with the duplicate values. This function assigns 0 to each row if they have more than one occurrence of the lowest ranking values. On the other hand, the highest ranking value has no ties. If we get more than one occurrence of the highest ranking value in a dataset, the function will never assign 1 to any of the rows. Let's see this in the following example:

We can see the output sorted in ascending order in the above output. The entries having the lowest ranking values will be assigned the rank 0.
We may see that the students with the Science subject have the same rank when there is a tie for the lowest score. Also, this function will not assign 1 to any of the records if there is a tie on the highest score, as we can see for students with the English subject.

Note: Check out the output in descending order by replacing ASC with DESC in the ORDER BY clause in the above code.

SQL percentile for NULL values

It returns 0 for the NULL value if we sort the rows in descending order and 1 if we sort the rows in ascending order. Following is the example of the output sorted in ascending order.