Aggregate functions vs. window functions
In SQL, aggregate and window functions serve distinct purposes, catering to different analytical needs. Let’s look into the definitions and differences of these powerful tools, accompanied by concise yet comprehensive examples.
Aggregate functions
Aggregate functions operate on a set of values and return a single value summarizing that set. They often provide insights into the overall characteristics of a dataset. Common examples include:
COUNT()AVG()SUM()MIN()MAX().
Imagine we have the following dummy data for an orders table:
-- Creating the 'orders' table with dummy dataCREATE TABLE orders (order_id INT,order_category VARCHAR(50),order_amount DECIMAL(10, 2),order_date DATE);INSERT INTO orders (order_id, order_category, order_amount, order_date)VALUES(1, 'Electronics', 1200.00, '2023-01-01'),(2, 'Clothing', 450.50, '2023-01-02'),(3, 'Electronics', 800.00, '2023-01-03'),(4, 'Clothing', 600.75, '2023-01-04'),(5, 'Books', 300.25, '2023-01-05'),(6, 'Books', 250.50, '2023-01-06');
Example 1: Calculating the total number of orders
SELECT COUNT(order_id) AS total_ordersFROM orders;
Lines 1–2: We use the
COUNT()aggregate function to count the number oforder_identries in theorderstable, labeling the result astotal_orders.
Example 2: Calculating the average order amount
SELECT AVG(order_amount) AS avg_order_amountFROM orders;
Lines 1–2: We use the
AVG()aggregate function to calculate the averageorder_amountin theorderstable, labeling the result asavg_order_amount.
Example 3: Calculating the total order amount
SELECT SUM(order_amount) AS total_order_amountFROM orders;
Lines 1–2: We use the
SUM()aggregate function to calculate the totalorder_amountin theorderstable, labeling the result astotal_order_amount.
Example 4: Finding the minimum order amount
SELECT MIN(order_amount) AS min_order_amountFROM orders;
Lines 1–2: Here, we use the
MIN()aggregate function to find the smallestorder_amountin theorderstable, labeling the result asmin_order_amount.
Example 5: Finding the maximum order amount
SELECT MAX(order_amount) AS max_order_amountFROM orders;
Lines 1≠2: Similarly, we use the
MAX()aggregate function to find the largestorder_amountin theorderstable, labeling the result asmax_order_amount.
Aggregate functions are particularly useful to derive insights that provide a high-level dataset overview, such as counts, averages, or other summary statistics.
Window functions
In contrast, window functions work with a specific set of rows that are related to the current row in the query result set. They allow for more granular calculations, introducing the concept of a “window” or a subset of the data for computation. Common window functions include:
ROW_NUMBER()RANK()LEAD()LAG()SUM()AVG()
Example 1: Assigning a unique row number to each order in a specific category
SELECT order_id, order_category,ROW_NUMBER() OVER (PARTITION BY order_category ORDER BY order_date)AS category_row_numberFROM orders;
Lines 1–4: We use the
ROW_NUMBER()window function to assign a unique row number within eachorder_categorypartition, ordered byorder_date.
Example 2: Ranking orders based on order amount
SELECT order_id, order_amount, RANK() OVER (ORDER BY order_amount) AS order_rankFROM orders;
Lines 1–2: Here, we use the
RANK()window function to assign a rank to each order based on theorder_amount, ordered in ascending order.
Example 3: Showing the previous order amount for each order
SELECT order_id, order_amount, LAG(order_amount) OVER (ORDER BY order_date) AS previous_order_amountFROM orders;
Lines 1–2: This query utilizes the
LAG()window function to display theorder_amountfrom the previous order, ordered byorder_date.
Example 4: Showing the next order amount for each order
SELECT order_id, order_amount, LEAD(order_amount) OVER (ORDER BY order_date) AS next_order_amountFROM orders;
Lines 1–2: Similarly, we use the
LEAD()window function to display theorder_amountfrom the next order, ordered byorder_date.
Example 5: Calculating the running total of order amounts
SELECT order_id, order_amount, SUM(order_amount) OVER (ORDER BY order_date) AS running_totalFROM orders;
Lines 1–2: We use the
SUM()window function to calculate the running total of order amounts, ordered byorder_date.
Example 6: Calculating the average order amount considering previous orders
SELECT order_id, order_amount, AVG(order_amount) OVER (ORDER BY order_date) AS avg_order_amountFROM orders;
Lines 1–2: In this example, we use the
AVG()window function to calculate the averageorder_amountup to the current order, ordered byorder_date.
Window functions shine when you need to perform calculations over a specific range or partition of the dataset, such as computing running totals, ranking, or identifying distinct partitions within the data.
Comparison between the two
We use aggregate functions when we need a quick overview or summary of our data, such as counting occurrences or computing averages. We utilize window functions when we require detailed computations within specific subsets or partitions of our data, like calculating running totals or assigning row numbers within groups. Below is a feature by feature comparison of both function types:
Feature | Aggregate Functions | Window Functions |
Operating Principle | Summarize values across the entire dataset. | Operate on a specific range of rows related to the current row. |
Scope | Provide summary statistics for the entire dataset. | Enable calculations within specific subsets or partitions of data. |
Example |
|
|
Common Functions |
|
|
Usage | Best for providing high-level summaries. | Ideal for detailed calculations within specific partitions or windows of data. |
Both aggregate and window functions have SUM() and AVG().
In aggregate functions, they compute totals and averages across the entire dataset.
In window functions, they enable running totals and moving averages within specific windows of data.
Conclusion
In summary, while aggregate functions provide overarching insights into the entire dataset, window functions offer a more nuanced and granular approach, allowing for detailed computations over specific windows or partitions. Knowing when to leverage each is crucial for constructing effective SQL queries tailored to your analytical requirements.
Free Resources