SQL Server 2012 Windowing Functions Part 1 of 2: Running and Sliding Aggregates

The first windowing capabilities appeared in SQL Server 2005 with the introduction of the OVER clause and a set of ranking functions (ROW_NUMBER, RANK, DENSE_RANK, and NTILE). The term “window,” as it is used here, refers to the scope of visibility from one row in a result set relative to neighboring rows in the same result set. By default, OVER produces a single window over the entire result set, but its associated PARTITION BY clause lets you divide the result set up into distinct windows—one per partition. Furthermore, its associated ORDER BY clause enables cumulative calculations within each window.

In addition to the four ranking functions, the OVER clause can be used with the traditional aggregate functions (SUM, COUNT, MIN, MAX, AVG). This is extremely useful, because it allows you to calculate aggregations without being forced to summarize all the detail rows with a GROUP BY clause. However, prior to SQL Server 2012, running and sliding calculations with an associated ORDER BY clause was supported only for the ranking functions. Using ORDER BY with OVER for any of the aggregate functions was not allowed. This prevents running and sliding aggregations, severely limited the windowing capability of OVER since its introduction in SQL Server 2005.

Fortunately, SQL Server 2012 finally addresses this shortcoming. In this blog post, the first in a two-part article, I’ll show you how to use OVER/ORDER BY with all the traditional aggregate functions in SQL Server 2012 to provide running aggregates within ordered windows and partitions. I’ll also show you how to frame windows using the ROWS or RANGE clause, which adjusts the size and scope of the window and enables sliding aggregations. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. I will cover those new analytic functions in Part 2.

To demonstrate running and sliding aggregates, create a table and populate it with sample financial transactions for several different accounts, as shown below. (Note the use of the DATEFROMPARTS function, also new in SQL Server 2012, which is used to construct a date value from year, month, and day parameters.)

CREATE TABLE TxnData (AcctId int, TxnDate date, Amount decimal)
GO

INSERT INTO TxnData (AcctId, TxnDate, Amount) VALUES
  (1, DATEFROMPARTS(2011, 8, 10), 500),  -- 5 transactions for acct 1
  (1, DATEFROMPARTS(2011, 8, 22), 250),
  (1, DATEFROMPARTS(2011, 8, 24), 75),
  (1, DATEFROMPARTS(2011, 8, 26), 125),
  (1, DATEFROMPARTS(2011, 8, 28), 175),
  (2, DATEFROMPARTS(2011, 8, 11), 500),  -- 8 transactions for acct 2
  (2, DATEFROMPARTS(2011, 8, 15), 50),
  (2, DATEFROMPARTS(2011, 8, 22), 5000),
  (2, DATEFROMPARTS(2011, 8, 25), 550),
  (2, DATEFROMPARTS(2011, 8, 27), 105),
  (2, DATEFROMPARTS(2011, 8, 27), 95),
  (2, DATEFROMPARTS(2011, 8, 29), 100),
  (2, DATEFROMPARTS(2011, 8, 30), 2500),
  (3, DATEFROMPARTS(2011, 8, 14), 500),  -- 4 transactions for acct 3
  (3, DATEFROMPARTS(2011, 8, 15), 600),
  (3, DATEFROMPARTS(2011, 8, 22), 25),
  (3, DATEFROMPARTS(2011, 8, 23), 125)

Running Aggregations

Used by itself, the OVER clause operates over a window that encompasses the entire result set of a query. Windows can be partitioned in your queries using OVER with PARTITION BY, enabling partition-level aggregations to be calculated for each window. And with SQL Server 2012, an ORDER BY clause can also be specified with OVER to achieve row-level running aggregations within each window. The following code demonstrates the use of OVER with ORDER BY to produce running aggregations:

SELECT AcctId, TxnDate, Amount,
  RAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount RAvg        RCnt RMin RMax RSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2011-08-10 500    500.000000  1    500  500  500
1      2011-08-22 250    375.000000  2    250  500  750
1      2011-08-24 75     275.000000  3    75   500  825
1      2011-08-26 125    237.500000  4    75   500  950
1      2011-08-28 175    225.000000  5    75   500  1125
2      2011-08-11 500    500.000000  1    500  500  500
2      2011-08-15 50     275.000000  2    50   500  550
2      2011-08-22 5000   1850.000000 3    50   5000 5550
 :

The results of this query are partitioned (windowed) by account. Within each window, the account’s running averages, counts, minimum/maximum values, and sums are ordered by transaction date, showing the chronologically accumulated values for each account. No ROWS clause or RANGE clause is specified, so ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is assumed by default. This yields a window frame size that spans from the beginning of the partition (the first row of each account) through the current row. When the account number changes, the previous window is “closed” and new calculations start running for a new window over the next account number.

Sliding Aggregations

You can narrow each account’s window by framing it with a ROWS BETWEEN n PRECEDING AND CURRENT ROW clause within the OVER clause. This enables sliding calculations, as demonstrated by this slightly modified version of the previous query:

SELECT AcctId, TxnDate, Amount,
  SAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
  SCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount SAvg        SCnt SMin SMax SSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2011-08-10 500    500.000000  1    500  500  500
1      2011-08-22 250    375.000000  2    250  500  750
1      2011-08-24 75     275.000000  3    75   500  825
1      2011-08-26 125    150.000000  3    75   250  450
1      2011-08-28 175    125.000000  3    75   175  375
2      2011-08-11 500    500.000000  1    500  500  500
2      2011-08-15 50     275.000000  2    50   500  550
2      2011-08-22 5000   1850.000000 3    50   5000 5550
 :

This query specifies ROWS BETWEEN 2 PRECEDING AND CURRENT ROW in the OVER clause for the RAvg column, overriding the default window size. Specifically, it frames the window within each account’s partition to a maximum of three rows: the current row, the row before it, and one more row before that one. Once the window expands to three rows, it stops growing and starts sliding down the subsequent rows until a new partition (the next account) is encountered. The BETWEEN…AND CURRENT ROW keywords that specify the upper bound of the window are assumed default, so to reduce code clutter, the other column definitions specify just the lower bound of the window with the shorter variation ROWS 2 PRECEDING.

Notice how the window “slides” within each account. For example, the sliding maximum for account 1 drops from 500 to 250 in the fourth row, because 250 is the largest value in the window of three rows that begins two rows earlier—and the 500 from the very first row is no longer visible in that window. Similarly, the sliding sum for each account is based on the defined window. Thus, the sliding sum of 375 on the last row of account 1 is the total sum of that row (175) plus the two preceding rows (75 + 125) only—not the total sum for all transactions in the entire account, as the running sum had calculated.

Using RANGE versus ROWS

Finally, RANGE can be used instead of ROWS to handle “ties” within a window. While ROWS treats each row in the window distinctly, RANGE will merge rows containing duplicate ORDER BY values, as demonstrated by the following query:

SELECT AcctId, TxnDate, Amount,
 SumByRows = SUM(Amount) OVER (ORDER BY TxnDate ROWS UNBOUNDED PRECEDING),
 SumByRange = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING)
 FROM TxnData
 WHERE AcctId = 2
 ORDER BY TxnDate

AcctId TxnDate    Amount SumByRows SumByRange
------ ---------- ------ --------- ----------
2      2011-08-11 500    500       500
2      2011-08-15 50     550       550
2      2011-08-22 5000   5550      5550
2      2011-08-25 550    6100      6100
2      2011-08-27 105    6205      6300
2      2011-08-27 95     6300      6300
2      2011-08-29 100    6400      6400
2      2011-08-30 2500   8900      8900

In this result set, ROWS and RANGE both return the same values, with the exception of the fifth row. Because the fifth and sixth rows are both tied for the same date (8/27/2011), RANGE returns the combined running sum for both rows. The seventh row (for 8/29/2011) breaks the tie, and ROWS “catches up” with RANGE to return running totals for the rest of the window.

Conclusion

Windowing functions using the OVER clause have been greatly enhanced in SQL Server 2012. In addition to the 4 ranking functions, running and sliding calculations with OVER/ORDER BY is now supported for all the traditional aggregate functions as well. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. Stay tuned for Part 2, which will show you how to use these new analytic windowing functions.

Download VSLive Orlando SQL Server 2012 Workshop Materials

I just got back from another great VSLive! A special thanks to all the attendees who were at my SQL Server workshop on Monday. You guys had great questions and were a lot of fun. As promised, I’ve posted the very latest version of my slides and code for you to grab right here:

http://bit.ly/VSL2011SQLOrlando

If you missed VSLive! in Orlando, catch us in Vegas next March. In addition to the full-day SQL 2012 workshop, I’ll be giving breakout sessions on SQL Server Data Tools (SSDT) and .NET 4.0 Data Access. Enjoy!