A database is a collection of structured data stored in an organized manner. PostgreSQL is a relational database management system that employs SQL for fundamental operations such as creating tables, inserting data into tables, modifying the records, and deleting the existing data from the tables. Understanding SQL keywords is crucial for being skilled in SQL. In this context, an important keyword, PERCENT_RANK()
, is a window function that returns the rank of data values as percentiles of the dataset.
PERCENT_RANK()
functionThe PERCENT_RANK()
computes a relative ranking of the value based on the rows in a dataset. It also works with datasets in the partitions. It returns a value ranging from 0
to 1
(inclusive).
The following is the syntax for PERCENT_RANK()
:
PERCENT_RANK ()OVER ( [PARTITION BY clause][ORDER BY clause] )
The PERCENT_RANK()
represents the window function.
We create partitions by defining the column/condition in PARTITION BY clause
. This clause becomes optional if there’s no need to create partitions. The window function will treat the dataset as a single window in such cases.
The ORDER BY clause
specifies the order within each partition.
You may find more details on the PARTITION BY
clause and ORDER BY
clause in the linked Answers.
Let's understand the PERCENT_RANK()
window function through coding examples.
We have some data on the student's exam results in the Exam_result
table that we can view by using the SELECT
query.
SELECT * FROM Exam_result;
Note: We have modified the records a bit in each example to understand the concept clearly.
The PERCENT_RANK()
returns a relative value ranking based on the rows in datasets. It also works with partitions of a dataset.
The return value ranges from 0
to 1
. It assigns 0
to the lowest ranking value(s) and 1
to the highest ranking value, given that the data is sorted in ascending order. On the other hand, it returns 0
for the highest ranking value and 1
for the lowest ranking value if the data is sorted in descending order.
When a row has the value 0.25
, and we're using the ascending order, which means the row contains a value 25% higher than the other rows. For the descending order, it means that the row contains a value 25% lower than the other rows.
SELECT std_id,std_name,marks,subject_name,PERCENT_RANK() OVER(ORDER BY marks ASC)AS MarksPercentRankFROM Exam_result;
Note: Check out the output in descending order by replacing
ASC
withDESC
in theORDER BY
clause in the above code. Also, note that the values are trimmed for this example.
As the return value of the function PERCENT_RANK()
ranges from 0
to 1
, it generates the ranks within the limit for each window of the data.
Let's partition the data based on the subject names. We'll have three partitions according to the available data. Press the "Run" button in the following playground to view the output:
SELECT std_id,std_name,marks,subject_name,PERCENT_RANK() OVER(PARTITION BY subject_nameORDER BY marks ASC) AS MarksPercentRankFROM Exam_result;
Note: Check out the output in descending order by replacing
ASC
withDESC
in theORDER BY
clause in the above code.
Let's talk about how this function deals with the duplicate values. This function assigns 0
to each row if they have more than one occurrence of the lowest ranking values. On the other hand, the highest ranking value has no ties. If we get more than one occurrence of the highest ranking value in a dataset, the function will never assign 1
to any of the rows. Let's see this in the following example:
SELECT std_id,std_name,marks,subject_name,PERCENT_RANK() OVER(PARTITION BY subject_nameORDER BY marks ASC) AS MarksPercentRankFROM Exam_result;
We can see the output sorted in ascending order in the above output. The entries having the lowest ranking values will be assigned the rank 0
.
We may see that the students with the Science
subject have the same rank when there is a tie for the lowest score. Also, this function will not assign 1
to any of the records if there is a tie on the highest score, as we can see for students with the English
subject.
Note: Check out the output in descending order by replacing
ASC
withDESC
in theORDER BY
clause in the above code.
It returns 0
for the NULL
value if we sort the rows in descending order and 1
if we sort the rows in ascending order. Following is the example of the output sorted in ascending order.
SELECT std_id,std_name,marks,subject_name,PERCENT_RANK() OVER(PARTITION BY subject_nameORDER BY marks ASC) AS MarksPercentRankFROM Exam_result;
Following is the example of the output sorted in descending order.
SELECT std_id,std_name,marks,subject_name,PERCENT_RANK() OVER(PARTITION BY subject_nameORDER BY marks DESC) AS MarksPercentRankFROM Exam_result;
For understanding of PERCENT_RANK()
window function in PostgreSQL, we have seen a few examples that use this function. These examples include single datasets, partitioned datasets, datasets with duplicate values, and datasets with NULL values. Through detailed examples showcasing both ascending and descending order scenarios, this answer serves as a practical guide. It also empowers users to effectively leverage percentiles in SQL for diverse data analysis needs.
Free Resources