What are analytic window functions in SQL?
A window function lets you perform calculations across any number of table rows that are linked to the current row. Window functions in SQL can be broadly categorized into the following three categories:
Aggregate window functions
Ranking window functions
Analytic window functions
In SQL, analytic window functions assign values to each row in the result set based on the values in the other rows within the same partition. The result set is the output table generated by a query. Analytic window functions, also known as value window functions, include LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE().
Let's explore window functions in more detail. In SQL, a window refers to the set of rows on which the window function operates.
Within a window, a window frame specifies the range of rows to consider for our calculations. Partitions are used to group rows together before applying the window function. A partition can be defined using the PARTITION BY clause.
The window functions can be predefined or user-defined.
Syntax of window functions
Following is the syntax for a window function:
window_function ()OVER ( [PARTITION BY clause][ORDER BY clause][window_frame clause])
The
window_function()represents the window function. We replace it with the actual window function to perform the operation.We create partitions by defining the column/condition in
PARTITION BY clause. This clause becomes optional if there’s no need to create partitions. The window function will treat the dataset as a single window in such cases.The
ORDER BY clausespecifies the order within each partition.The
window_frame clauseis used to define the range of rows considered for the calculations.
Note: The syntax for window functions is consistent across all types, including aggregate, analytic, and ranking window functions.
Understanding analytic window functions
This section explores the core concepts and functionality of analytic window functions in SQL, providing insights into how these functions assign values and help in data analysis.
Example database
We have some data on the student’s exam results in the Exam_result table that we can view by using the SELECT query.
SELECT * FROM Exam_result;
The LAG() function
The LAG() function assigns the value from the previous row within a result set until it reaches the beginning of the dataset. There are two optional parameters available with the LAG() function:
OFFSET: Specifies how many rows back to look from the current row. The default value for this parameter is1.DEFAULT: Provides a value to use when theOFFSETgoes beyond the available rows. The default value for this parameter is0.
The syntax of LAG() function is as follows:
SELECT column1,column2,LAG(column_name, OFFSET, DEFAULT)OVER ([PARTITION BY clause][ORDER BY clause][window_frame clause])FROM table_name;
An example of LAG() function is given below. The LAG() function retrieves the marks of the previous student from the result set and assigns them to the current student:
SELECT std_name,marks,subject_name,LAG(marks, 1, 0::NUMERIC)OVER (ORDER BY marks DESC) AS LAG_defaultFROM Exam_result;
Note: In the query, we have used only the records for the students of
Mathematicssubject.
Lines 1–4: We use the
SELECTstatement to read specific columns from theExam_resulttable, includingstd_name,marks, andsubject_name.Lines 5–7: The
LAG()window function is used to assign the previous value to each row of themarkscolumn. In theLAG()function, we have used1asOFFSETand0asDEFAULT. We set the alias of the new column asLAG_default. The function is applied to each row within the partition, and the result will be presented in theLAG_defaultcolumn in descending order as written inORDER BYclause.Line 8: The
FROM clausespecifies the source tableExam_resultfrom which the data is being queried.
The LEAD() function
The LEAD() function assigns the value from the next row within a result set until it reaches the end of the dataset. Similar to LAG() function, there are two optional parameters available with the LEAD() function:
OFFSET: Specifies how many rows ahead to look from the current row. The default value for this parameter is1.DEFAULT: Provides a value to use when theOFFSETgoes beyond the available rows. The default value for this parameter is0.
The syntax of LEAD() function is as follows:
SELECT column1,column2,LEAD(column_name, OFFSET, DEFAULT)OVER ([PARTITION BY clause][ORDER BY clause][window_frame clause])FROM table_name;
An example of LEAD() function is given below. The LEAD() function retrieves the marks of the next student from the result set and assigns them to the current student:
SELECT std_name,marks,subject_name,LEAD(marks, 1, 0::NUMERIC)OVER (ORDER BY marks DESC) AS LEAD_defaultFROM Exam_result;
Note: In the query, we have used only the records for the students of
Mathematicssubject.
Lines 1–4: We use the
SELECTstatement to read specific columns from theExam_resulttable, includingstd_name,marks, andsubject_name.Lines 5–7: The
LEAD()window function is used to assign the next row value to each row of themarkscolumn. In theLEAD()function, we have used1asOFFSETand0asDEFAULT. We set the alias of the new column asLEAD_default. The function is applied to each row within the partition, and the result will be presented in theLEAD_defaultcolumn in descending order as written inORDER BYclause.Line 8: The
FROM clausespecifies the source tableExam_resultfrom which the data is being queried.
Note: The
LAG()andLEAD()functions are window functions that allow us to access data from previous and subsequent rows within a specified window, respectively.
The FIRST_VALUE() function
The window function FIRST_VALUE() returns the first value in the dataset. It returns the value from the first row in the window frame based on the specified order.
Following is the syntax for FIRST_VALUE():
FIRST_VALUE ( [expression] )OVER ( [PARTITION BY clause][ORDER BY clause][window_frame clause])
Following is an example of FIRST_VALUE() to retrieve the lowest marks for each subject:
SELECT std_id,std_name,marks,subject_name,FIRST_VALUE(marks)OVER(PARTITION BY subject_nameORDER BY marks ASC)FROM Exam_result;
Lines 1–4: We use the
SELECTstatement to read specific columns from theExam_resulttable, includingstd_id,std_name,marks, andsubject_name.Lines 5–9: The
FIRST_VALUE()window function is used to calculate the first value of themarkscolumn in each partition defined by thesubject_namecolumn. The function is applied to each row within the partition, and the result will be presented in theFIRST_VALUE(marks)column in ascending order as written inORDER BYclause.Line 10: The
FROM clausespecifies the source tableExam_resultfrom which the data is being queried.
The LAST_VALUE() function
The window function LAST_VALUE() returns the last value in the dataset. It returns the value in the last row in the window frame based on the specified order.
Following is the syntax for LAST_VALUE():
LAST_VALUE ( [expression] )OVER ( [PARTITION BY clause][ORDER BY clause][window_frame clause])
Following is an example of LAST_VALUE() to retrieve the highest marks for each subject:
SELECT std_id,std_name,marks,subject_name,LAST_VALUE(marks)OVER(PARTITION BY subject_nameORDER BY marks ASCROWS BETWEENUNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)FROM Exam_result;
Lines 1–4: We use the
SELECTstatement to read specific columns from theExam_resulttable, includingstd_id,std_name,marks, andsubject_name.Lines 5–12: The
LAST_VALUE()window function is used to calculate the last value of themarkscolumn in each partition defined by thesubject_namecolumn. The function is applied to each row within the partition, and the result is stored in theLAST_VALUE(marks)column. The output will be presented in ascending order ofmarkscolumn. TheROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGclause specifies that the window frame for each row in the partition includes all rows from the first row to the last row of the partition, without regard to the actual values of the rows but based on their order. This means that for each row, the function considers all rows in the partition for calculating the result.Lines 13: The
FROM clausespecifies the source tableExam_resultfrom which the data is being queried.
The NTH_VALUE() function
The window function NTH_VALUE() returns the Nth value in the dataset. It returns the value in the nth row in the window frame based on the specified order.
Following is the syntax for NTH_VALUE():
NTH_VALUE ( [expression] )OVER ( [PARTITION BY clause][ORDER BY clause][window_frame clause])
Following is an example of NTH_VALUE():
SELECT std_id,std_name,marks,subject_name,NTH_VALUE(marks, 2)OVER(PARTITION BY subject_nameORDER BY marks ASCROWS BETWEENUNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)FROM Exam_result;
Lines 1–4: We use the
SELECTstatement to read specific columns from theExam_resulttable, includingstd_id,std_name,marks, andsubject_name.Lines 5–12: The
NTH_VALUE()window function is used to calculate the value of themarkscolumn in each partition defined by thesubject_namecolumn. The function is applied to each row within the partition, and the result will be presented in theNTH_VALUE(marks)column. The output will be presented in descending order ofmarkscolumn. TheROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clausespecifies that the window frame for each row in the partition includes all rows from the first row to the last row of the partition. The output changes the value ofNULLrecord. You may try removing this clause to see how the output looks without this clause.Lines 13: The
FROM clausespecifies the source tableExam_resultfrom which the data is being queried.
Conclusion
In conclusion, analytic window functions in SQL offer powerful tools for analyzing and processing data. These functions provide various ways to find specific values in a result set. We saw each function with the coding examples. We learned how these functions work within partitions and how understanding and utilizing window functions can greatly enhance our data analysis capabilities in SQL.
Free Resources