The first windowing capabilities appeared in SQL Server 2005 with the introduction of the OVER clause and a set of ranking functions (ROW_NUMBER, RANK, DENSE_RANK, and NTILE). The term “window,” as it is used here, refers to the scope of visibility from one row in a result set relative to neighboring rows in the same result set. By default, OVER produces a single window over the entire result set, but its associated PARTITION BY clause lets you divide the result set up into distinct windows—one per partition. Furthermore, its associated ORDER BY clause enables cumulative calculations within each window.
In addition to the four ranking functions, the OVER clause can be used with the traditional aggregate functions (SUM, COUNT, MIN, MAX, AVG). This is extremely useful, because it allows you to calculate aggregations without being forced to summarize all the detail rows with a GROUP BY clause. However, prior to SQL Server 2012, running and sliding calculations with an associated ORDER BY clause was supported only for the ranking functions. Using ORDER BY with OVER for any of the aggregate functions was not allowed. This prevents running and sliding aggregations, severely limited the windowing capability of OVER since its introduction in SQL Server 2005.
Fortunately, SQL Server 2012 finally addresses this shortcoming. In this blog post, the first in a two-part article, I’ll show you how to use OVER/ORDER BY with all the traditional aggregate functions in SQL Server 2012 to provide running aggregates within ordered windows and partitions. I’ll also show you how to frame windows using the ROWS or RANGE clause, which adjusts the size and scope of the window and enables sliding aggregations. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. I will cover those new analytic functions in Part 2.
To demonstrate running and sliding aggregates, create a table and populate it with sample financial transactions for several different accounts, as shown below. (Note the use of the DATEFROMPARTS function, also new in SQL Server 2012, which is used to construct a date value from year, month, and day parameters.)
CREATE TABLE TxnData (AcctId int, TxnDate date, Amount decimal) GO INSERT INTO TxnData (AcctId, TxnDate, Amount) VALUES (1, DATEFROMPARTS(2011, 8, 10), 500), -- 5 transactions for acct 1 (1, DATEFROMPARTS(2011, 8, 22), 250), (1, DATEFROMPARTS(2011, 8, 24), 75), (1, DATEFROMPARTS(2011, 8, 26), 125), (1, DATEFROMPARTS(2011, 8, 28), 175), (2, DATEFROMPARTS(2011, 8, 11), 500), -- 8 transactions for acct 2 (2, DATEFROMPARTS(2011, 8, 15), 50), (2, DATEFROMPARTS(2011, 8, 22), 5000), (2, DATEFROMPARTS(2011, 8, 25), 550), (2, DATEFROMPARTS(2011, 8, 27), 105), (2, DATEFROMPARTS(2011, 8, 27), 95), (2, DATEFROMPARTS(2011, 8, 29), 100), (2, DATEFROMPARTS(2011, 8, 30), 2500), (3, DATEFROMPARTS(2011, 8, 14), 500), -- 4 transactions for acct 3 (3, DATEFROMPARTS(2011, 8, 15), 600), (3, DATEFROMPARTS(2011, 8, 22), 25), (3, DATEFROMPARTS(2011, 8, 23), 125)
Running Aggregations
Used by itself, the OVER clause operates over a window that encompasses the entire result set of a query. Windows can be partitioned in your queries using OVER with PARTITION BY, enabling partition-level aggregations to be calculated for each window. And with SQL Server 2012, an ORDER BY clause can also be specified with OVER to achieve row-level running aggregations within each window. The following code demonstrates the use of OVER with ORDER BY to produce running aggregations:
SELECT AcctId, TxnDate, Amount, RAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate), RCnt = COUNT(*) OVER (PARTITION BY AcctId ORDER BY TxnDate), RMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate), RMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate), RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate) FROM TxnData ORDER BY AcctId, TxnDate AcctId TxnDate Amount RAvg RCnt RMin RMax RSum ------ ---------- ------ ----------- ---- ---- ---- ---- 1 2011-08-10 500 500.000000 1 500 500 500 1 2011-08-22 250 375.000000 2 250 500 750 1 2011-08-24 75 275.000000 3 75 500 825 1 2011-08-26 125 237.500000 4 75 500 950 1 2011-08-28 175 225.000000 5 75 500 1125 2 2011-08-11 500 500.000000 1 500 500 500 2 2011-08-15 50 275.000000 2 50 500 550 2 2011-08-22 5000 1850.000000 3 50 5000 5550 :
The results of this query are partitioned (windowed) by account. Within each window, the account’s running averages, counts, minimum/maximum values, and sums are ordered by transaction date, showing the chronologically accumulated values for each account. No ROWS clause or RANGE clause is specified, so ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is assumed by default. This yields a window frame size that spans from the beginning of the partition (the first row of each account) through the current row. When the account number changes, the previous window is “closed” and new calculations start running for a new window over the next account number.
Sliding Aggregations
You can narrow each account’s window by framing it with a ROWS BETWEEN n PRECEDING AND CURRENT ROW clause within the OVER clause. This enables sliding calculations, as demonstrated by this slightly modified version of the previous query:
SELECT AcctId, TxnDate, Amount, SAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), SCnt = COUNT(*) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING), SMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING), SMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING), SSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING) FROM TxnData ORDER BY AcctId, TxnDate AcctId TxnDate Amount SAvg SCnt SMin SMax SSum ------ ---------- ------ ----------- ---- ---- ---- ---- 1 2011-08-10 500 500.000000 1 500 500 500 1 2011-08-22 250 375.000000 2 250 500 750 1 2011-08-24 75 275.000000 3 75 500 825 1 2011-08-26 125 150.000000 3 75 250 450 1 2011-08-28 175 125.000000 3 75 175 375 2 2011-08-11 500 500.000000 1 500 500 500 2 2011-08-15 50 275.000000 2 50 500 550 2 2011-08-22 5000 1850.000000 3 50 5000 5550 :
This query specifies ROWS BETWEEN 2 PRECEDING AND CURRENT ROW in the OVER clause for the RAvg column, overriding the default window size. Specifically, it frames the window within each account’s partition to a maximum of three rows: the current row, the row before it, and one more row before that one. Once the window expands to three rows, it stops growing and starts sliding down the subsequent rows until a new partition (the next account) is encountered. The BETWEEN…AND CURRENT ROW keywords that specify the upper bound of the window are assumed default, so to reduce code clutter, the other column definitions specify just the lower bound of the window with the shorter variation ROWS 2 PRECEDING.
Notice how the window “slides” within each account. For example, the sliding maximum for account 1 drops from 500 to 250 in the fourth row, because 250 is the largest value in the window of three rows that begins two rows earlier—and the 500 from the very first row is no longer visible in that window. Similarly, the sliding sum for each account is based on the defined window. Thus, the sliding sum of 375 on the last row of account 1 is the total sum of that row (175) plus the two preceding rows (75 + 125) only—not the total sum for all transactions in the entire account, as the running sum had calculated.
Using RANGE versus ROWS
Finally, RANGE can be used instead of ROWS to handle “ties” within a window. While ROWS treats each row in the window distinctly, RANGE will merge rows containing duplicate ORDER BY values, as demonstrated by the following query:
SELECT AcctId, TxnDate, Amount, SumByRows = SUM(Amount) OVER (ORDER BY TxnDate ROWS UNBOUNDED PRECEDING), SumByRange = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING) FROM TxnData WHERE AcctId = 2 ORDER BY TxnDate AcctId TxnDate Amount SumByRows SumByRange ------ ---------- ------ --------- ---------- 2 2011-08-11 500 500 500 2 2011-08-15 50 550 550 2 2011-08-22 5000 5550 5550 2 2011-08-25 550 6100 6100 2 2011-08-27 105 6205 6300 2 2011-08-27 95 6300 6300 2 2011-08-29 100 6400 6400 2 2011-08-30 2500 8900 8900
In this result set, ROWS and RANGE both return the same values, with the exception of the fifth row. Because the fifth and sixth rows are both tied for the same date (8/27/2011), RANGE returns the combined running sum for both rows. The seventh row (for 8/29/2011) breaks the tie, and ROWS “catches up” with RANGE to return running totals for the rest of the window.
Conclusion
Windowing functions using the OVER clause have been greatly enhanced in SQL Server 2012. In addition to the 4 ranking functions, running and sliding calculations with OVER/ORDER BY is now supported for all the traditional aggregate functions as well. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. Stay tuned for Part 2, which will show you how to use these new analytic windowing functions.