What is the PARTITION BY clause in PostgreSQL?
A database is a collection of structured data stored in an organized manner. Using PostgreSQL, we can perform multiple operations, such as creating tables, inserting data into tables, modifying records, and deleting the existing data from tables. In this Answer, we’ll learn the PARTITION BY clause in PostgreSQL.
The PARTITION BY clause
A PARTITION BY clause is used in window functions, which allows us to divide the result set of a query into partitions to perform calculations on each partition separately. It is specifically used in conjunction with window functions and requires the OVER() clause to define the window or frame over which the window function operates. The syntax for the PARTITION BY clause is as follows:
SELECT column1,column2,window_function(expression) OVER (PARTITION BY partition_expressionORDER BY sort_expressionframe_specification) AS result_columnFROM your_table;
Explanation
Line 1: The
SELECTclause specifies the columns we want to retrieve in the result set.Line 3: The
window_function(expression)represents the specific window function we’re using (e.g.,SUM,AVG), along with the expression we want to calculate.Line 4: The
PARTITION BY partition_expressiondefines the column by which we want to create partitions.Line 5: The
ORDER BY sort_expressionis optional and specifies the order within each partition.Line 6: The
frame_specificationis optional and allows us to further refine the window frame.Line 7: The query uses the alias
result_columnusing theASkeyword.Line 8: The query will take data from the table written in place of
your_table.
Let’s have a look at the following section, where the concept is illustrated through example.
Example
We’ll cover two topics in this section, including:
Set up the database
Try it yourself
Set up the database
We use a simple database named Employees having four columns, emp_id, emp_name, dept, and date_of_joining.
--Creating Employees tableCREATE TABLE IF NOT EXISTS Employees (emp_id SERIAL PRIMARY KEY,emp_name TEXT,dept TEXT,date_of_joining DATE);--Inserting data into Employees tableINSERT INTO Employees (emp_name, dept, date_of_joining)VALUES('Usama Khan', 'HR', '2022-01-01'),('Fatima Ahmed', 'Sales', '2022-02-15'),('Hassan Ali', 'IT', '2022-09-20'),('Saira Khan', 'IT', '2022-06-15'),('Ahmad Malik', 'HR', '2022-07-10'),('Bilal Ahmed', 'HR', '2023-01-20'),('Noor Fatima', 'Sales', '2023-02-15'),('Zainab Khan', 'Sales', '2022-08-25'),('Sana Khan', 'HR', '2022-10-15'),('Aamir Malik', 'Sales', '2022-11-30'),('Amna Raza', 'IT', '2022-12-25'),('Ali Hassan', 'IT', '2022-03-10'),('Ayesha Siddiqui', 'HR', '2022-04-05'),('Usman Ahmed', 'Sales', '2022-05-20'),('Imran Malik', 'IT', '2023-03-10');SELECT * FROM Employees;
Code explanation
Lines 1–7: We create a table named
Employeeswith four columns: theemp_id( ,serialSERIAL is a keyword used to create an auto-incrementing integer column. When you define a column as SERIAL, PostgreSQL automatically generates a unique integer value for each new row inserted into the table. This is commonly used for primary key columns to ensure that each row has a unique identifier. primary key),emp_name,dept, anddate_of_joining.Lines 9–26: We insert a few records into the
Employeestable using theINSERT INTOcommand.Line 28: We display the data using the
SELECTstatement.
Try it yourself
To rank employees within each department based on their joining date, we can use ROW_NUMBER()PARTITION BY clause.
SELECT emp_name,dept,date_of_joining,ROW_NUMBER() OVER (PARTITION BY deptORDER BY date_of_joining) AS rank_within_deptFROM Employees;
Code explanation
Lines 1–3: The
SELECTstatement readsemp_name,dept, anddate_of_joining.Line 4: The
ROW_NUMBER()function assigns a unique integer value to each row within each partition, starting from 1 and incrementing by 1 for each subsequent row.Lines 5–6: The
PARTITION BYclause is used to create partitions or groups within the result set based on the values in thedeptcolumn. TheORDER BYclause orders the rows within each department based on thedate_of_joiningcolumn in ascending order.Line 7: The result of this calculation is named
rank_within_dept.Line 8: The table being used in terms of data is
Employees.
Conclusion
The PARTITION BY clause is a key component of PostgreSQL that enables vital operations like performing calculations on each partition separately after the result set has been divided by using any window function. We can also sort the data by using the ORDER BY clause with the PARTITION BY clause within the same window function. It’s an essential skill as it unlocks advanced capabilities for querying and analyzing data.
Free Resources