New Metadata Discovery Features in SQL Server 2012

It has always been possible to interrogate SQL Server for metadata (schema) information. You can easily discover all the objects in a database (tables, views, stored procedures, and so on) and their types by directly querying system tables (not recommended, as they can change from one version of SQL Server to another) or information schema views (which are consistent in each SQL Server version). It is significantly more challenging, however, to discover the result set schema for T-SQL statements or stored procedures that contain conditional logic. Using SET FMTONLY ON/OFF has been the common technique in the past for discovering the schema of a query’s result set without actually executing the query itself. For example, consider the following code:

USE AdventureWorks2012
GO

SET FMTONLY ON
SELECT * FROM HumanResources.Employee;
SET FMTONLY OFF

This SELECT statement, which would normally return all the rows from the HumanResources.Employee table, returns no rows at all. It just reveals the columns. The SET FMTONLY ON statement prevents queries from returning rows of data so that their schemas can be discovered, and this behavior remains in effect until SET FMTONLY OFF is encountered. SQL Server 2012 introduces several new system stored procedures and table-valued functions (TVFs) that provide significantly richer metadata discovery than what can be discerned using the relatively inelegant (and now deprecated) SET FMTONLY ON/OFF approach. These new procedures and functions are:

  • sys.sp_describe_first_result_set
  • sys.dm_exec_describe_first_result_set
  • sys.dm_exec_describe_first_result_set_for_object
  • sys.sp_describe_undeclared_parameters

In this blog post, I’ll explain how to use these new objects to discover schema information in SQL Server 2012.

sys.sp_describe_first_result_set

The sys.sp_describe_first_result_set stored procedure accepts a T-SQL statement and produces a highly detailed schema description of the first possible result set returned by that statement. The following code retrieves schema information for the same SELECT statement you used earlier to get information on all the columns in the HumanResources.Employee table:

EXEC sp_describe_first_result_set
 @tsql = N'SELECT * FROM HumanResources.Employee'

The following screenshot shows the wealth of information that SQL Server returns about each column in the result set returned by the sp_describe_first_result_set call:

sys.dm_exec_describe_first_result_set

There is also a dynamic management function named sys.dm_exec_describe_first_result_set that works very much like sys.sp_describe_first_result_set. But because it is implemented as a table-valued function (TVF), it is easy to query against it and limit the metadata returned. For example, the following query examines the same T-SQL statement, but returns just the name and data type of nullable columns:

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set(
  'SELECT * FROM HumanResources.Employee', NULL, 1)
 WHERE is_nullable = 1

Here is the output:

name               system_type_name
-----------------  ----------------
OrganizationNode   hierarchyid
OrganizationLevel  smallint

Parameterized queries are also supported, if you supply an appropriate parameter signature after the T-SQL. The T-SQL in the previous example had no parameters, so it passed NULL for the “parameters parameter.” The following example discovers the schema of a parameterized query.

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set('
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID',
  '@OrderID int', 1)

Here is the output:

name             system_type_name  is_hidden
---------------  ----------------  ---------
OrderDate        datetime          0
TotalDue         money             0
SalesOrderID     int               1

You might wonder why the SalesOrderID column is returned for a SELECT statement that returns only OrderDate and TotalDue. The answer lies in the last parameter passed to the dynamic management function. A bit value of 1 (for true) tells SQL Server to return the identifying SalesOrderID column, because it can be used to “browse” the result set. Notice that it is marked true (1) for is_hidden. This informs the client that the SalesOrderID column is not actually revealed by the query, but can be used to uniquely identify each row in the query’s result set.

What if multiple result sets are possible? There’s no problem with this as long as they all have the same schema. In fact, SQL Server will even try to forgive cases where multiple possible schemas are not exactly identical. For example, if the same column is nullable in one result set and non-nullable in the other, schema discovery will succeed and indicate the column as nullable. It will even tolerate cases where the same column has a different name (but same type) between two possible result sets, and indicate NULL for the column name, rather than arbitrarily choosing one of the possible column names or failing altogether.

The following code demonstrates this with a T-SQL statement that has two possible result sets, depending on the value passed in for the @SortOrder parameter. Because both result sets have compatible schemas, the dynamic management function succeeds in returning schema information.

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set('
    IF @SortOrder = 1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID ASC
    ELSE IF @SortOrder = -1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID DESC',
   '@SortOrder AS tinyint', 0) 

Here is the output:

name         system_type_name
-----------  ----------------
OrderDate    datetime
TotalDue     money
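
The example above used two result sets with identical schemas. To see the name tolerance described earlier, you can feed the dynamic management function a statement whose branches alias the same column differently. The @Detailed parameter and the WhenOrdered alias below are made up purely for illustration; if the behavior works as described above, the first column’s name should come back as NULL while its datetime type is still reported:

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set(N'
   IF @Detailed = 1
     SELECT OrderDate AS WhenOrdered, TotalDue
      FROM Sales.SalesOrderHeader
   ELSE
     SELECT OrderDate, TotalDue
      FROM Sales.SalesOrderHeader',
  N'@Detailed bit', 0)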

Discovery won’t succeed if SQL Server detects incompatible schemas. In this next example, the call to the system stored procedure specifies a T-SQL statement with two possible result sets, but one returns three columns while the other returns only two columns.

EXEC sys.sp_describe_first_result_set
  @tsql = N'
    IF @IncludeCurrencyRate = 1
      SELECT OrderDate, TotalDue, CurrencyRateID
       FROM Sales.SalesOrderHeader
    ELSE
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader'

In this case, the system stored procedure raises an error that clearly explains the problem:

Msg 11509, Level 16, State 1, Procedure sp_describe_first_result_set, Line 53

The metadata could not be determined because the statement 'SELECT OrderDate, TotalDue, CurrencyRateID FROM Sales.SalesOrderHeader' is not compatible with the statement 'SELECT OrderDate, TotalDue FROM Sales.SalesOrderHeader'.

It is worth noting that the dynamic management function copes with this scenario much more passively. Given conflicting result set schemas, it simply returns NULL for the metadata and does not raise an error.
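
Here is a sketch that passes the same incompatible statement to the dynamic management function. The error_message column selected here is one of the function’s error-reporting columns; treat the exact column list as an assumption and adjust it to your build:

SELECT name, system_type_name, error_message
 FROM sys.dm_exec_describe_first_result_set(N'
   IF @IncludeCurrencyRate = 1
     SELECT OrderDate, TotalDue, CurrencyRateID
      FROM Sales.SalesOrderHeader
   ELSE
     SELECT OrderDate, TotalDue
      FROM Sales.SalesOrderHeader',
  N'@IncludeCurrencyRate bit', 0)

-- Rather than raising an error, the function returns NULL metadata,
-- with the error columns describing the incompatibility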

sys.dm_exec_describe_first_result_set_for_object

The dynamic management function sys.dm_exec_describe_first_result_set_for_object can be used to achieve the same discovery against any object in the database. It accepts just an object ID and the Boolean “browse” flag that specifies whether hidden identifying columns should be returned. You can use the OBJECT_ID function to obtain the ID of the desired object. The following code demonstrates this by returning schema information for the stored procedure GetOrderInfo.

CREATE PROCEDURE GetOrderInfo(@OrderID AS int) AS
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID
GO

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set_for_object(OBJECT_ID('GetOrderInfo'), 1)

Here is the output:

name             system_type_name   is_hidden
---------------  -----------------  ---------
OrderDate        datetime           0
TotalDue         money              0
SalesOrderID     int                1

sys.sp_describe_undeclared_parameters

Finally, the sys.sp_describe_undeclared_parameters stored procedure parses a T-SQL statement to discover type information about the parameters expected by the statement, as the following code demonstrates:

EXEC sys.sp_describe_undeclared_parameters
 N'IF @IsFlag = 1 SELECT 1 ELSE SELECT 0'

Here is the output:

parameter_ordinal name    suggested_system_type_id suggested_system_type_name ...
----------------- ------- ------------------------ -------------------------- -------
1                 @IsFlag 56                       int                        ... 

In this example, SQL Server detects the @IsFlag parameter, and suggests the int data type based on the usage in the T-SQL statement it was given to parse.
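
Because the suggestions are driven by how each parameter is used, comparing a parameter against a table column yields that column’s type. The following sketch (the @OrderID and @MinDue names are arbitrary) should suggest int and money, since those are the types of SalesOrderID and TotalDue in Sales.SalesOrderHeader:

EXEC sys.sp_describe_undeclared_parameters
 N'SELECT OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderID = @OrderID AND TotalDue > @MinDue'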

Enhance Portability with Partially Contained Databases in SQL Server 2012

The dependency of database-specific users upon server-based logins poses a challenge when you need to move or restore a database to another server. Although the users move with the database, their associated logins do not, and thus the relocated database will not function properly until you also set up and map the necessary logins on the target server. To resolve these types of dependency problems and help make databases more easily portable, SQL Server 2012 introduces “partially contained” databases.

The term “partially contained” is based on the fact that SQL Server itself merely enables containment—it does not enforce it. It’s still your job to actually implement true containment. From a security perspective, this means that partially contained databases allow you to create a special type of user called a contained user. The contained user’s password is stored right inside the contained database, rather than being associated with a login defined at the server instance level and stored in the master database. Then, unlike the standard SQL Server authentication model, contained users are authenticated directly against the credentials in the contained database without ever authenticating against the server instance. Naturally, for this to work, a connection string with a contained user’s credentials must include the Initial Catalog keyword that specifies the contained database name.

Creating a Partially Contained Database

To create a partially contained database, first enable the contained database authentication setting by calling sp_configure, and then issue a CREATE DATABASE statement with the new CONTAINMENT = PARTIAL clause, as the following code demonstrates.

-- Enable database containment

USE master
GO

EXEC sp_configure 'contained database authentication', 1
RECONFIGURE

-- Delete database if it already exists
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO

-- Create a partially contained database
CREATE DATABASE MyDB CONTAINMENT = PARTIAL
GO

USE MyDB
GO

To reiterate, SQL Server doesn’t enforce containment. You can still break containment by creating ordinary database users for server-based logins. For this reason, it’s easy to convert an ordinary (uncontained) database to a partially contained database; simply issue an ALTER DATABASE statement and specify SET CONTAINMENT=PARTIAL. You’ll then be able to migrate the server-based logins to contained logins and achieve server independence.
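
Here is a minimal sketch of that conversion, assuming a database named MyDB with a user mapped to a hypothetical server login named AppLogin. The sys.sp_migrate_user_to_contained procedure converts such a user into a contained user with a password; verify the parameter values against your environment before relying on them:

-- Convert an existing database to partial containment
USE master
GO
ALTER DATABASE MyDB SET CONTAINMENT = PARTIAL
GO

-- Migrate the database user for the (hypothetical) login AppLogin
-- into a contained user whose password is stored inside MyDB
USE MyDB
GO
EXEC sys.sp_migrate_user_to_contained
 @username = N'AppLogin',
 @rename = N'keep_name',
 @disablelogin = N'disable_login'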

Creating a Contained User

Once you have a contained database, you can create a contained user for it by issuing a CREATE USER statement and specifying WITH PASSWORD, as shown here:

CREATE USER UserWithPw
 WITH PASSWORD = N'password$1234'

This syntax is valid only for contained databases; SQL Server will raise an error if you attempt to create a contained user in the context of an uncontained database.

That’s all there is to creating partially contained databases with contained users. The only remaining point that’s worth calling out is that an Initial Catalog clause pointing to a partially contained database must be specified explicitly in a connection string that also specifies the credentials of a contained user in that database. If just the credentials are specified without the database, SQL Server will not scan the partially contained databases hosted on the instance for one that has a user with matching credentials. Instead, it will consider the credentials to be those of an ordinary SQL Server login, and will not authenticate against the contained database.

Other Partially Contained Database Features

Aside from server-based logins, there are many other dependencies that a database might have on its hosted instance. These include linked servers, SQL CLR, database mail, service broker objects, endpoints, replication, SQL Server Agent jobs, and tempdb collation. All these objects are considered to be uncontained entities since they all exist outside the database.

Uncontained entities threaten a database’s portability. Since these objects are all defined at the server instance level, behavior can vary unpredictably when databases are shuffled around from one instance to another. Let’s examine features to help you achieve the level of containment and stability that your circumstances require.

Uncontained Entities View

SQL Server provides a new dynamic management view (DMV) called sys.dm_db_uncontained_entities that you can query to discover potential threats to database portability. This DMV not only highlights dependent objects, it even reports the exact location of all uncontained entity references inside stored procedures, views, functions, and triggers.

The following code creates a few stored procedures, and then joins sys.dm_db_uncontained_entities with sys.objects to report the name of all objects having uncontained entity references in them.

-- Create a procedure that references a database-level object
CREATE PROCEDURE GetTables AS
BEGIN
  SELECT * FROM sys.tables
END
GO

-- Create a procedure that references an instance-level object
CREATE PROCEDURE GetEndpoints AS
BEGIN
  SELECT * FROM sys.endpoints
END
GO

-- Identify objects that break containment
SELECT
  UncType = ue.feature_type_name,
  UncName = ue.feature_name,
  RefType = o.type_desc,
  RefName = o.name,
  Stmt = ue.statement_type,
  Line = ue.statement_line_number,
  StartPos = ue.statement_offset_begin,
  EndPos = ue.statement_offset_end
 FROM
  sys.dm_db_uncontained_entities AS ue
  INNER JOIN sys.objects AS o ON o.object_id = ue.major_id

Here is the result of the query:

UncType      UncName    RefType               RefName       Stmt    Line  StartPos  EndPos
-----------  ---------  --------------------  ------------  ------  ----  --------  ---
System View  endpoints  SQL_STORED_PROCEDURE  GetEndpoints  SELECT  5     218       274

The DMV identifies the stored procedure GetEndpoints as an object with an uncontained entity reference. Specifically, the output reveals that a stored procedure references the sys.endpoints view in a SELECT statement on line 5 at position 218. This alerts you to a database dependency on endpoints configured at the server instance level that could potentially pose an issue for portability. The GetTables stored procedure does not have any uncontained entity references (sys.tables is contained), and is therefore not reported by the DMV.

Collations and tempdb

Ordinarily, all databases hosted on the same SQL Server instance share the same tempdb database for storing temporary tables, and tempdb uses the instance’s collation setting (collation controls a string’s character set, case sensitivity, and accent sensitivity), which can differ from the collation of your user database. When joining regular database tables with temporary tables, the collations of the joined columns must be compatible. This, again, represents an instance-level dependency, because the tempdb collation can vary from one server instance to another. Thus, problems arise when moving databases between servers whose instance (and therefore tempdb) collations differ. The code below demonstrates the problem, and how to avoid it by using a contained database.

-- Create an uncontained database with custom collation
USE master
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO
CREATE DATABASE MyDB COLLATE Chinese_Simplified_Pinyin_100_CI_AS
GO

USE MyDB
GO

-- Create a table in MyDB (uses Chinese_Simplified_Pinyin_100_CI_AS collation)
CREATE TABLE TestTable (TextValue nvarchar(max))

-- Create a temp table in tempdb (uses SQL_Latin1_General_CP1_CI_AS collation)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Fails, because MyDB and tempdb use different collations
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Convert to a partially contained database
DROP TABLE #TempTable
USE master

ALTER DATABASE MyDB SET CONTAINMENT=PARTIAL
GO

USE MyDB
GO

-- Create a temp table in MyDB (uses Chinese_Simplified_Pinyin_100_CI_AS collation)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Succeeds, because the table in tempdb now uses the same collation as MyDB
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Cleanup
DROP TABLE #TempTable
USE master
DROP DATABASE MyDB
GO

This code first creates an uncontained database that uses Chinese_Simplified_Pinyin_100_CI_AS collation on a server instance that uses (the default) SQL_Latin1_General_CP1_CI_AS collation. The code then creates a temporary table and attempts to join an ordinary database table against it. The attempt fails because the two tables have different collations (that is, they each reside in databases that use different collations), and SQL Server issues the following error message:

Msg 468, Level 16, State 9, Line 81
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
"Chinese_Simplified_Pinyin_100_CI_AS" in the equal to operation.

Then the code issues an ALTER DATABASE…SET CONTAINMENT=PARTIAL statement to convert the database to a partially contained database. As a result, SQL Server resolves the conflict by collating the temporary table in tempdb in the same collation as the contained database, and the second join attempt succeeds.

Summary

Partially contained databases in SQL Server 2012 go a long way toward improving the portability of databases across servers and instances. In this blog post, I demonstrated how to create a partially contained database with contained users, how to deal with collation issues, and how to use the new dynamic management view to discover threats to containment and identify external dependencies. These capabilities are welcome news for SQL Server DBAs everywhere. Enjoy!

New Spatial Features in SQL Server 2012

SQL Server 2012 adds many significant improvements to the spatial support that was first introduced with SQL Server 2008. Among the more notable enhancements is support for curves (arcs), where SQL Server 2008 only supported straight lines, or polygons composed of straight lines. Microsoft also provides methods that test for non-2012-compatible (curved) shapes, and convert circular data to line data for backward compatibility with SQL Server 2008 (as well as other mapping platforms that don’t support curves).

New Spatial Data Classes

The three new spatial data classes in SQL Server 2012 are:

  • Circular strings
  • Compound curves
  • Curve polygons

All three of these shapes are supported in WKT, WKB, and GML by both the geometry and geography data types, and all of the existing methods work on all of the new circular shapes. My previous post, Geospatial Support for Circular Data in SQL Server 2012, covers these new spatial classes in detail, and shows you how to use them to create circular data. This post focuses on additional spatial features that are new in SQL Server 2012.

New Spatial Methods

Let’s explore a few of the new spatial methods. Some of these new methods complement the new curved shapes, while others add new spatial features that work with all shapes.

The STNumCurves and STCurveN Methods

These two methods can be invoked on any geometry or geography instance. They can be used together to discover information about the curves contained within the spatial instance. The STNumCurves method returns the total number of curves in the instance. You can then pass any number between 1 and what STNumCurves returns to extract each individual curve, and thus iterate all the curves in the instance.

For example, the WKT string CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4) defines a perfect circle composed of two connected segments: 0 4, 4 0, 8 4 and 8 4, 4 8, 0 4 (the third coordinate, 8 4, serves as both the ending point of the first arc and the starting point of the second arc). The following code demonstrates how to obtain curve information from this circular string using the STNumCurves and STCurveN methods.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Get the curve count (2) and the 2nd curve (top semi-circle)
SELECT
  CurveCount = @C.STNumCurves(),
  SecondCurve = @C.STCurveN(2),
  SecondCurveWKT = @C.STCurveN(2).ToString()

This query produces the following output:

CurveCount SecondCurve                                     SecondCurveWKT
---------- ----------------------------------------------- -------------------------------
2          0x000000000204030000000000000000002040000000... CIRCULARSTRING (8 4, 4 8, 0 4)

You can see that STNumCurves indicates there are two curves, and that STCurveN(2) returns the second curve. If you view the results in the spatial viewer, you’ll see just the top half of the circle. This is the semi-circle defined by the second curve, which is converted back to WKT as CIRCULARSTRING (8 4, 4 8, 0 4). Notice that this represents the second segment of the full circle.
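
Since STCurveN accepts any index from 1 up to the value returned by STNumCurves, a simple loop can enumerate every curve in an instance. Here is a minimal sketch (variable names are arbitrary):

-- Walk each curve in the instance and return its WKT
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'
DECLARE @i int = 1

WHILE @i <= @C.STNumCurves()
BEGIN
  SELECT CurveNumber = @i, CurveWKT = @C.STCurveN(@i).ToString()
  SET @i += 1
END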

The BufferWithCurves Method

SQL Server 2008 introduced the STBuffer method, which “pads” a line, effectively converting it into a polygon. If you look closely at the resulting polygon shapes in the spatial viewer, it appears that the points of each line string (including the mid points) are transformed into rounded edges in the polygon. However, the rounded edge look is actually produced by plotting many short straight lines that are clustered very closely together, presenting the illusion of a curve. This approach was necessary because curves were not supported before SQL Server 2012 (but the STBuffer method was).

Clearly, using native curve definitions in a curve polygon is more efficient than clustering a multitude of straight lines in an ordinary polygon. For backward compatibility, STBuffer continues to return the (inefficient) polygon as before. So SQL Server 2012 introduces a new method, BufferWithCurves, for this purpose. The following code uses BufferWithCurves to pad lines using true curves, and compares the result with its straight-line cousin, STBuffer.

DECLARE @streets geometry = '
 GEOMETRYCOLLECTION(
  LINESTRING (100 -100, 20 -180, 180 -180),
  LINESTRING (300 -300, 300 -150, 50 -50)
 )'
SELECT @streets.BufferWithCurves(10)

SELECT
  AsWKT = @streets.ToString(),
  Bytes = DATALENGTH(@streets),
  Points = @streets.STNumPoints()
 UNION ALL
 SELECT
  @streets.STBuffer(10).ToString(),
  DATALENGTH(@streets.STBuffer(10)),
  @streets.STBuffer(10).STNumPoints()
 UNION ALL
 SELECT
  @streets.BufferWithCurves(10).ToString(),
  DATALENGTH(@streets.BufferWithCurves(10)),
  @streets.BufferWithCurves(10).STNumPoints()

Here is the resulting shape returned by the first SELECT statement (the collection of padded line shapes generated by BufferWithCurves):

As with STBuffer, the new shapes have rounded edges around the points of the original line strings. However, BufferWithCurves generates actual curves, and thus, produces a significantly smaller and simpler polygon. The second SELECT statement demonstrates by comparing the three shapes—the original line string collection, the polygon returned by STBuffer, and the curve polygon returned by BufferWithCurves. Here are the results:

AsWKT                                                                       Bytes  Points
--------------------------------------------------------------------------  -----  ------
GEOMETRYCOLLECTION (LINESTRING (100 -100, 20 -180, 180 -180), LINESTRIN...  151    6
MULTIPOLYGON (((20.000000000000796 -189.99999999999858, 179.99999999999...  5207   322
GEOMETRYCOLLECTION (CURVEPOLYGON (COMPOUNDCURVE ((20.000000000000796 -1...  693    38

The first shape is the original geometry collection of line strings used for input, which requires only 151 bytes of storage and has only 6 points. For the second shape, STBuffer pads the line strings to produce a multi-polygon (a set of polygons) that consumes 5,207 bytes and has a total of 322 points—a whopping 3,348 percent increase over the original line strings. In the third shape, BufferWithCurves is used to produce the equivalent padding using a collection of curve polygons composed of compound curves, so it consumes only 693 bytes and has only 38 points—a (relatively) mere 359 percent increase over the original line strings.

The ShortestLineTo Method

This new method examines any two shapes and figures out the shortest line between them. The following code demonstrates:

DECLARE @Shape1 geometry = 'POLYGON ((-20 -30, -3 -26, 14 -28, 20 -40, -20 -30))'
DECLARE @Shape2 geometry = 'POLYGON ((-18 -20, 0 -10, 4 -12, 10 -20, 2 -22, -18 -20))'

SELECT @Shape1
UNION ALL
SELECT @Shape2
UNION ALL
SELECT @Shape1.ShortestLineTo(@Shape2).STBuffer(.25)

This code defines two polygons and then uses ShortestLineTo to determine, generate, and return the shortest straight line that connects them. STBuffer is also used to pad the line string so that it is more clearly visible in the spatial viewer:

The MinDbCompatibilityLevel Method

With the added support for curves in SQL Server 2012 comes support for backward compatibility with previous versions of SQL Server (2008 and 2008 R2) that don’t support curves. The new MinDbCompatibilityLevel method accepts any WKT string and returns the minimum version of SQL Server required to support the shape defined by that string. For example, consider the following code:

DECLARE @Shape1 geometry = 'CIRCULARSTRING(0 50, 90 50, 180 50)'
DECLARE @Shape2 geometry = 'LINESTRING (0 50, 90 50, 180 50)'

SELECT
 Shape1MinVersion = @Shape1.MinDbCompatibilityLevel(),
 Shape2MinVersion = @Shape2.MinDbCompatibilityLevel()

The MinDbCompatibilityLevel method returns 110 (referring to version 11.0) for the first WKT string and 100 (version 10.0) for the second one. This is because the first WKT string contains a circular string, which requires SQL Server 2012 (version 11.0), while the line string in the second WKT string is supported by SQL Server 2008 (version 10.0) and higher.

The STCurveToLine and CurveToLineWithTolerance Methods

These are two methods you can use to convert curves to roughly equivalent straight line shapes. Again, this is to provide compatibility with previous versions of SQL Server and other mapping platforms that don’t support curves.

The STCurveToLine method converts a single curve to a line string with a multitude of segments and points that best approximate the original curve. The technique is similar to what we just discussed for STBuffer, where many short straight lines are connected in a cluster of points to simulate a curve. And, as explained in that discussion, the resulting line string requires significantly more storage than the original curve. To offer a compromise between fidelity and storage, the CurveToLineWithTolerance method accepts “tolerance” parameters to produce line strings that consume less storage space than those produced by STCurveToLine. The following code demonstrates by using both methods to convert the same circle shape from the previous STNumCurves and STCurveN example into line strings.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Render as curved shape
SELECT
  Shape = @C,
  ShapeWKT = @C.ToString(),
  ShapeLen = DATALENGTH(@C),
  Points = @C.STNumPoints()

-- Convert to lines (much larger, many more points)
SELECT
  Shape = @C.STCurveToLine(),
  ShapeWKT = @C.STCurveToLine().ToString(),
  ShapeLen = DATALENGTH(@C.STCurveToLine()),
  Points = @C.STCurveToLine().STNumPoints()

-- Convert to lines with tolerance (not as much larger, not as many more points)
SELECT
  Shape = @C.CurveToLineWithTolerance(0.1, 0),
  ShapeWKT = @C.CurveToLineWithTolerance(0.1, 0).ToString(),
  ShapeLen = DATALENGTH(@C.CurveToLineWithTolerance(0.1, 0)),
  Points = @C.CurveToLineWithTolerance(0.1, 0).STNumPoints()

The query results show that the original circle consumes only 112 bytes and has 5 points. Invoking STCurveToLine on the circle converts it into a line string that consumes 1,072 bytes and has 65 points. That’s a big increase, but the resulting line string represents the original circle in high fidelity; you will not see a perceptible difference in the two when viewing them using the spatial viewer. However, the line string produced by CurveToLineWithTolerance consumes only 304 bytes and has only 17 points; a significantly smaller footprint, paid for with a noticeable loss in fidelity. As shown by the spatial viewer results below, using CurveToLineWithTolerance produces a circle made up of visibly straight line segments:

The STIsValid, IsValidDetailed and MakeValid Methods

Spatial instance validation has improved greatly in SQL Server 2012. The STIsValid method evaluates a spatial instance and returns a 1 (for true) or 0 (for false) indicating if the instance represents a valid shape (or shapes). If the instance is invalid, the new IsValidDetailed method will return a string explaining the reason why. The following code demonstrates.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

SELECT
 IsValid = @line.STIsValid(),
 Details = @line.IsValidDetailed()

This line string is invalid because the same point (2 2) is repeated, which results in “overlapping edges,” as revealed by the output from IsValidDetailed:

IsValid  Details
-------  -------------------------------------------------------------------
0        24413: Not valid because of two overlapping edges in curve (1).

SQL Server 2012 is more tolerant of invalid spatial instances than previous versions. For example, you can now perform metric operations (such as STLength) on invalid instances, although you still won’t be able to perform other operations (such as STBuffer) on them.
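
Here is a quick sketch of this tolerance, using the same invalid line string shown above; the STLength call should succeed, while STBuffer (left commented out) is expected to raise an error on an invalid instance:

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

-- Metric operations such as STLength work even though the instance is invalid
SELECT LengthOfInvalid = @line.STLength()

-- Other operations are still expected to fail on an invalid instance:
-- SELECT @line.STBuffer(1)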

The new MakeValid method can “fix” an invalid spatial instance and make it valid. Of course, the shape will shift slightly, and there are no guarantees on the accuracy or precision of the changes made. The following code uses MakeValid to remove overlapping parts (which can be caused by anomalies such as inaccurate GPS traces), effectively converting the invalid line string into a valid spatial instance.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'
SELECT @line.MakeValid().ToString() AS Fixed

The WKT string returned by the SELECT statement shows the “fixed” line string:

Fixed
-------------------------------------------------------------------
LINESTRING (3 2, 2 2, 1.0000000000000071 1.0000000000000036)

Other Enhancements

The remainder of this post gives brief mention to several other noteworthy spatial enhancements added in SQL Server 2012. These include better geography support, and precision and optimization improvements.

Support for geography Instances Exceeding a Logical Hemisphere

Previous versions of SQL Server supported geography objects as large as (slightly less than) a logical hemisphere (half the globe). This limitation has been removed in SQL Server 2012, which now supports geography instances of any size (even the entire planet).

When you define a geography polygon, the order in which you specify the ring’s latitude and longitude coordinates (known as vertex order) is significant (unlike geometry, where vertex order is insignificant). The coordinate points are always defined according to the left-foot inside rule; when you “walk” the boundary of the polygon, your left foot is on the inside. Thus, vertex order determines whether you are defining the small piece of the globe enclosed by the ring, or everything on the globe outside of it (that is, the rest of the globe).

Since previous versions of SQL Server were limited to half the globe, it was impossible to specify the points of a polygon in the “wrong order,” simply because doing so resulted in too large a shape (and thus, raised an error). That error potential no longer exists in SQL Server 2012, so it’s even more critical to make sure your vertex order is correct, or you’ll be unwittingly working with the exact “opposite” shape.

If you have a geography instance that is known to have the wrong vertex order, you can repair it using the new ReorientObject method. This method operates only on polygons (it has no effect on points, line strings, or curves), and can be used to correct the ring orientation (vertex order) of the polygon. The following code demonstrates.

-- Small (less than a logical hemisphere) polygon
SELECT geography::Parse('POLYGON((-10 -10, 10 -10, 10 10, -10 10, -10 -10))')

-- Reorder in the opposite direction for "rest of the globe"
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))')

-- Reorient back to the small polygon
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))').ReorientObject()

Three geography polygon instances are defined in this code. The first geography instance defines a very small polygon. The second instance uses the exact same coordinates, but because the vertex order is reversed, it defines an enormous polygon whose area represents the entire globe except for the small polygon. As explained, such a definition would cause an error in previous versions of SQL Server, but is now accommodated without a problem by SQL Server 2012. The third instance reverses the vertex order on the same shape as the second instance, thereby producing the same small polygon as the first instance.

Full Globe Support

Along with the aforementioned support for geography instances to exceed a single logical hemisphere comes a new spatial data class called FULLGLOBE. As you may have guessed, this is a shape that represents the entire planet. If you’ve ever wondered how many square meters there are in the entire world, the following query gives you the answer (which is 510,065,621,710,996 square meters, so you can stop wondering).

-- Construct a new FullGlobe object (a WGS84 ellipsoid)
DECLARE @Earth geography = 'FULLGLOBE'

-- Calculate the area of the earth
SELECT PlanetArea = @Earth.STArea()

All of the common spatial methods work as expected on a full globe object. So you could, for example, “cut away” at the globe by invoking the STDifference and STSymDifference method against it using other polygons as cookie-cutter shapes.
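
For example, the following sketch reuses the small polygon from the previous example (treat the exact coordinates as arbitrary) to cut a hole out of the full globe; the remaining area should come back slightly smaller than the full globe’s area:

-- Cut a polygon-shaped hole out of the globe
DECLARE @Earth geography = 'FULLGLOBE'
DECLARE @Hole geography = 'POLYGON((-10 -10, 10 -10, 10 10, -10 10, -10 -10))'

-- The remaining area should be slightly less than the full globe's
SELECT RemainingArea = @Earth.STDifference(@Hole).STArea()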

New “Unit Sphere” Spatial Reference ID

The default spatial reference ID (SRID) in SQL Server 2012 is 4326, which uses the metric system as its unit of measurement. This SRID also represents the true ellipsoidal shape of the earth. While this representation is the most accurate, it is also computationally expensive to perform precise ellipsoidal mathematics. SQL Server 2012 offers a compromise between speed and accuracy by adding a new SRID, 104001, which uses a sphere of radius 1 to represent a perfectly round earth.

You can create geography instances with SRID 104001 when you don’t require the greatest accuracy. The STDistance, STLength, and ShortestLineTo methods are optimized to run faster on the unit sphere, since it takes a relatively simple formula to compute measures against a perfectly round sphere (compared to an ellipsoidal sphere).
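
Here is a sketch of what working with the unit sphere might look like; it assumes that SRID 104001 is available on your instance and that distances on a radius-1 sphere come back in multiples of the radius, so the quarter of the equator between longitudes 0 and 90 should measure roughly pi/2:

-- Two points on the unit-sphere SRID (assumed to be 104001)
DECLARE @p1 geography = geography::STGeomFromText('POINT(0 0)', 104001)
DECLARE @p2 geography = geography::STGeomFromText('POINT(90 0)', 104001)

-- Expect a value of roughly 1.5708 (pi / 2) on a sphere of radius 1
SELECT UnitSphereDistance = @p1.STDistance(@p2)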

Better Precision

Internal spatial calculations in SQL Server 2012 are now performed with 48 bits of precision, compared to the 27 bits used in SQL Server 2008 and SQL Server 2008 R2. This reduces the error introduced when internal computations round the floating-point coordinates of the original vertex points.

Summary

This blog post introduced you to some of the powerful new spatial capabilities added to SQL Server 2012. You saw how to use STNumCurves and STCurveN to obtain curve information from circular data, the BufferWithCurves method to produce more efficient padded line shapes than STBuffer, and the ShortestLineTo method to find the shortest line between two shapes. Then you saw how to use the new MinDbCompatibilityLevel, STCurveToLine, and CurveToLineWithTolerance methods for supporting backward compatibility with SQL Server 2008. You also learned how SQL Server 2012 is much better at handling invalid spatial data, using the STIsValid, IsValidDetailed, and MakeValid methods. Finally, you learned about the new full globe support, unit sphere SRID, and improved precision.

You can learn much more about spatial functionality in my new book Programming Microsoft SQL Server 2012, which has an entire chapter dedicated to the topic. I hope you get to enjoy these powerful new spatial capabilities in SQL Server 2012!

How Significant is the SQL Server 2012 Release?

SQL Server, particularly its relational database engine, matured quite some time ago. So the “significance” of every new release over recent years can be viewed—in some ways—as relatively nominal. The last watershed release of the product was actually SQL Server 2005, which was when the relational engine (that, for years, defined SQL Server) stopped occupying “center stage,” and instead took its position alongside a set of services that now, collectively, define the product. These of course include the Business Intelligence (BI) components Reporting Services, Analysis Services, and Integration Services—features that began appearing as early as 1999 but, prior to SQL Server 2005, were integrated sporadically as a patchwork of loosely coupled add-ons, wizards, and management consoles. SQL Server 2005 changed all that with a complete overhaul. For the first time, SQL Server delivered a broader, richer, and more consolidated set of features and services that are built into—rather than bolted onto—the platform. None of the product versions that have been released since that time—SQL Server 2008, 2008 R2, and now 2012—have changed the underlying architecture this radically.

That said, each SQL Server release continues to advance itself in vitally significant ways. SQL Server 2008 (released August 6, 2008) added a host of new features to the relational engine—T-SQL enhancements, Change Data Capture (CDC), Transparent Data Encryption (TDE), SQL Audit, FILESTREAM—plus powerful BI capabilities with Excel PivotTables, charts, and CUBE formulas. SQL Server 2008 R2 (released April 21, 2010) was dubbed the “BI Refresh,” adding PowerPivot for Excel, Master Data Services, and StreamInsight, but offering nothing more than minor tweaks and fixes in the relational engine.

The newest release—SQL Server 2012—is set to officially launch on March 7, 2012, and like every new release, this version improves on all of the key “abilities” (availability, scalability, manageability, programmability, and so on). The chief reliability improvement is the new High Availability Disaster Recovery (HADR) alternative to database mirroring. HADR (also commonly known as “Always On”) utilizes multiple secondary servers in an “availability group” for scale-out read-only operations (rather than forcing them to sit idle, just waiting for a failover to occur). Multi-subnet failover clustering is another notable new manageability feature.

SQL Server 2012 adds many new features to the relational engine, many of which I have blogged about, and most of which I cover in my new book (soon to be published). There are powerful T-SQL extensions, most notably the windowing enhancements, 22 new T-SQL functions, improved error handling, server-side paging, sequence generators, and rich metadata discovery techniques. There are also remarkable improvements for unstructured data, such as the FileTable abstraction over FILESTREAM and the Windows file system API, full-text property searching, and Statistical Semantic Search. Spatial support gets a big boost as well, with support for circular data, full-globe support, increased performance, and greater parity between the geometry and geography data types. Contained databases simplify portability, and new “columnstore” technology drastically increases performance of huge OLAP cubes (VertiPaq for PowerPivot and Analysis Services) and data warehouses (a similar implementation in the relational engine).

The aforementioned features are impressive, but still don’t amount to much more than “additives” over an already established database platform. A new release needs more than just extra icing on the cake for customers to perceive an upgrade as compelling. To that end, Microsoft has invested heavily in BI with SQL Server 2012, and the effort shows. The BI portion of the stack has been expanded greatly, delivering key advances in “pervasive insight.” This includes major updates to the product’s analytics, data visualization (such as self-service reporting with Power View), and master data management capabilities, as well as Data Quality Services (DQS), a brand new data quality engine. There is also a new Business Intelligence edition of the product that includes all of these capabilities without requiring a full Enterprise edition license. Finally, SQL Server Data Tools (SSDT) brings brand new database tooling inside Visual Studio 2010. SSDT provides a declarative, model-based design-time experience for developing databases while connected, offline, on-premise, or in the cloud.

Of course there are more new features; I’ve only mentioned the most notable ones in this post. Although the relational engine has been rock-solid for several releases, it continues to enjoy powerful enhancements, while the overall product continues to reinvent itself by expanding its stack of services, particularly in the Business Intelligence space.

Geospatial Support for Circular Data in SQL Server 2012

SQL Server 2012 adds many significant improvements to the spatial support that was first introduced with SQL Server 2008. In this blog post, I’ll explore one of the more notable enhancements: support for curves and arcs (circular data). SQL Server 2008 only supported straight lines, or polygons composed of straight lines. The three new shapes in SQL Server 2012 are circular strings, compound curves, and curve polygons. All three are supported in Well-Known Text (WKT), Well-Known Binary (WKB), and Geometry Markup Language (GML) by both the geometry (planar, or “flat-earth” model) and geography (ellipsoidal sphere, or geodetic) data types, and all of the existing methods work on the new shapes.

Circular Strings

A circular string defines a basic curved line, similar to how a line string defines a straight line. It takes a minimum of three coordinates to define a circular string; the first and third coordinates define the end points of the line, and the second coordinate (the “anchor” point, which lies somewhere between the end points) determines the arc of the line. Here is the shape represented by CIRCULARSTRING(0 1, .25 0, 0 -1):

The following code produces four circular strings. All of them have the same start and end points, but different anchor points. The lines are buffered slightly to make them easier to see in the spatial viewer.

-- Create a "straight" circular line
SELECT geometry::Parse('CIRCULARSTRING(0 8, 4 0, 8 -8)').STBuffer(.1)
UNION ALL  -- Curve it
SELECT geometry::Parse('CIRCULARSTRING(0 8, 4 4, 8 -8)').STBuffer(.1)
UNION ALL  -- Curve it some more
SELECT geometry::Parse('CIRCULARSTRING(0 8, 4 6, 8 -8)').STBuffer(.1)
UNION ALL  -- Curve it in the other direction
SELECT geometry::Parse('CIRCULARSTRING(0 8, 4 -6, 8 -8)').STBuffer(.1)

The spatial viewer in SQL Server Management Studio shows the generated shapes:

The first shape is a “straight circular” line, because the anchor point is positioned directly between the start and end points. The next two shapes use the same end points with the anchor out to the right (4), one a bit further than the other (6). The last shape also uses the same end points, but specifies an anchor point that curves the line to the left rather than the right (-6).

You can extend circular strings with as many curve segments as you want. Do this by defining another two coordinates for each additional segment. The last point of the previous curve serves as the first end point of the next curve segment, so the two additional coordinates respectively specify the next segment’s anchor and second end point. Thus, valid circular strings will always have an odd number of points. You can extend a circular string indefinitely to form curves and arcs of any kind.
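
For example, here is a sketch of a single circular string with three connected arc segments (seven points, an odd count as expected); the coordinates are arbitrary and simply trace a wave-like shape:

-- Three arc segments in one circular string (buffered slightly for visibility)
SELECT geometry::Parse('CIRCULARSTRING(0 0, 2 2, 4 0, 6 -2, 8 0, 10 2, 12 0)').STBuffer(.1)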

It’s easy to form a perfect circle by connecting two semi-circle segments. For example, the following illustration shows the circle produced by CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4).

This particular example connects the end of the second segment to the beginning of the first segment to form a closed shape. Note that this is certainly not required of circular strings (or line strings), and that closing the shape by connecting the last segment to the first still does not result in a polygon, which is a two dimensional shape that has area. Despite being closed, this circle is still considered a one-dimensional shape with no area. As you’ll soon see, the curve polygon can be used to convert closed line shapes into true polygons.

Compound Curves

A compound curve is a set of circular strings, or circular strings combined with line strings, that form a desired curved shape. The end point of each element in the collection must match the starting point of the following element, so that compound curves are defined in a “connect-the-dots” fashion. The following code produces a compound curve and compares it with the equivalent geometry collection shape.

-- Compound curve
 DECLARE @CC geometry = '
  COMPOUNDCURVE(
   (4 4, 4 8),
   CIRCULARSTRING(4 8, 6 10, 8 8),
   (8 8, 8 4),
   CIRCULARSTRING(8 4, 2 3, 4 4)
  )'

-- Equivalent geometry collection
 DECLARE @GC geometry = '
  GEOMETRYCOLLECTION(
   LINESTRING(4 4, 4 8),
   CIRCULARSTRING(4 8, 6 10, 8 8),
   LINESTRING(8 8, 8 4),
   CIRCULARSTRING(8 4, 2 3, 4 4)
  )'

-- They both render the same shape in the spatial viewer
 SELECT @CC.STBuffer(.5)
 UNION ALL
 SELECT @GC.STBuffer(1.5)

This code creates a keyhole shape using a compound curve, and also creates an identical shape as a geometry collection (though notice that the LINESTRING keyword is not, and cannot be, specified when defining a compound curve). It then buffers both of them with different padding, so that the spatial viewer clearly shows the two identical shapes on top of one another, as shown:

Both the compound curve and the geometry collection yield identical shapes. In fact, the expression @CC.STEquals(@GC) which compares the two instances for equality returns 1 (for true). The STEquals method tests for “spatial equality,” meaning it returns true if two instances produce the same shape even if they are being rendered using different spatial data classes. Furthermore, recall that segments of a circular string can be made perfectly straight by positioning the anchor directly between the end points, meaning that the circular string offers yet a third option for producing the very same shape. So which one should you use? Comparing these spatial data classes will help you determine which one is best to use in different scenarios.

A geometry collection (which was already supported in SQL Server 2008) is the most accommodating, but carries the most storage overhead. Geometry collections can hold instances of any spatial data class, and the instances don’t need to be connected to (or intersected with) each other in any way. The collection simply holds a bunch of different shapes as a set, which in this example just happens to be several line strings and circular strings connected at their start and end points.

In contrast, the new compound curve class in SQL Server 2012 has the most constraints but is the most lightweight in terms of storage. It can only contain line strings or circular strings, and each segment’s start point must be connected to the previous segment’s end point (although it is most certainly not necessary to connect the first and last segments to form a closed shape as in this example). The DATALENGTH function shows the difference in storage requirements; in our current example, DATALENGTH(@CC) returns 152 and DATALENGTH(@GC) returns 243. This means that the same shape requires 38% less storage space by using a compound curve instead of a geometry collection. A compound curve is also more storage-efficient than a multi-segment circular line string when straight lines are involved. This is because there is overhead for the mere potential of a curve, since the anchor point requires storage even when its position produces straight lines, whereas compound curves are optimized specifically to connect circular strings and (always straight) line strings.
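
The following sketch re-declares the two shapes from the earlier listing so you can verify both claims in a single query; the values in the comments are the ones reported above:

DECLARE @CC geometry = '
 COMPOUNDCURVE(
  (4 4, 4 8),
  CIRCULARSTRING(4 8, 6 10, 8 8),
  (8 8, 8 4),
  CIRCULARSTRING(8 4, 2 3, 4 4)
 )'
DECLARE @GC geometry = '
 GEOMETRYCOLLECTION(
  LINESTRING(4 4, 4 8),
  CIRCULARSTRING(4 8, 6 10, 8 8),
  LINESTRING(8 8, 8 4),
  CIRCULARSTRING(8 4, 2 3, 4 4)
 )'

SELECT
  SpatiallyEqual = @CC.STEquals(@GC),        -- 1 (same shape)
  CompoundCurveBytes = DATALENGTH(@CC),      -- 152
  GeometryCollectionBytes = DATALENGTH(@GC)  -- 243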

Curve Polygons

A curve polygon is very similar to an ordinary polygon; like an ordinary polygon, a curve polygon specifies a “ring” that defines a closed shape, and can also specify additional inner rings to define “holes” inside the shape. The only fundamental difference between a polygon and a curve polygon is that the rings of a curve polygon can include circular shapes, whereas an ordinary polygon is composed exclusively with straight lines. Specifically, each ring in a curve polygon can consist of any combination of line strings, circular strings, and compound curves that collectively define the closed shape. For example, the following code produces a curve polygon with the same keyhole outline that I just demonstrated for the compound curve.

-- Curve polygon
 SELECT geometry::Parse('
  CURVEPOLYGON(
   COMPOUNDCURVE(
    (4 4, 4 8),
    CIRCULARSTRING(4 8, 6 10, 8 8),
    (8 8, 8 4),
    CIRCULARSTRING(8 4, 2 3, 4 4)
   )
  )')

This code has simply specified the same compound curve as the closed shape of a curve polygon. Although the shape is the same, the curve polygon is a two-dimensional object, whereas the compound curve version of the same shape is a one-dimensional object. This can be seen visually by the spatial viewer results, which shades the interior of the curve polygon as shown here:
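
A small sketch makes the dimensional difference concrete; STDimension should report 1 for the compound curve and 2 for the curve polygon, and only the polygon should have a non-zero area:

DECLARE @Curve geometry = '
 COMPOUNDCURVE(
  (4 4, 4 8),
  CIRCULARSTRING(4 8, 6 10, 8 8),
  (8 8, 8 4),
  CIRCULARSTRING(8 4, 2 3, 4 4)
 )'
DECLARE @Poly geometry = '
 CURVEPOLYGON(
  COMPOUNDCURVE(
   (4 4, 4 8),
   CIRCULARSTRING(4 8, 6 10, 8 8),
   (8 8, 8 4),
   CIRCULARSTRING(8 4, 2 3, 4 4)
  )
 )'

SELECT
  CurveDimension = @Curve.STDimension(),   -- 1 (a line has no area)
  CurveArea = @Curve.STArea(),             -- 0
  PolygonDimension = @Poly.STDimension(),  -- 2
  PolygonArea = @Poly.STArea()             -- greater than 0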

Conclusion

Circular data support is an important new capability added to the spatial support in SQL Server 2012. In this blog post, I demonstrated the three new spatial data classes for curves and arcs: circular strings, compound curves, and curve polygons. Stay tuned for my next post, coming soon, which covers more of the powerful and fun spatial enhancements in SQL Server 2012!

SQL Server 2012 Windowing Functions Part 2 of 2: New Analytic Functions

This is the second half of my two-part article on windowing functions in SQL Server 2012. In Part 1, I explained the new running and sliding aggregation capabilities added to the OVER clause in SQL Server 2012. In this post, I’ll explain the new T-SQL analytic windowing functions. All of these functions operate using the windowing principles I explained in Part 1.

Eight New Analytic Functions

There are eight new analytic functions that have been added to T-SQL. All of them work in conjunction with an ordered window defined with an associated ORDER BY clause that can be optionally partitioned with a PARTITION BY clause and framed with a ROWS BETWEEN clause. The new functions are:

 • FIRST_VALUE
 • LAST_VALUE
 • LAG
 • LEAD
 • PERCENT_RANK
 • CUME_DIST
 • PERCENTILE_CONT
 • PERCENTILE_DISC

In the following code listing, the FIRST_VALUE, LAST_VALUE, LAG, and LEAD functions are used to analyze a set of orders at the product level.

DECLARE @Orders AS table(OrderDate date, ProductID int, Quantity int)
INSERT INTO @Orders VALUES
 ('2011-03-18', 142, 74),
 ('2011-04-11', 123, 95),
 ('2011-04-12', 101, 38),
 ('2011-05-21', 130, 12),
 ('2011-05-30', 101, 28),
 ('2011-07-25', 123, 57),
 ('2011-07-28', 101, 12)

SELECT
  OrderDate,
  ProductID,
  Quantity,
  WorstOn = FIRST_VALUE(OrderDate) OVER(PARTITION BY ProductID ORDER BY Quantity),
  BestOn = LAST_VALUE(OrderDate) OVER(PARTITION BY ProductID ORDER BY Quantity
                          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
  PrevOn = LAG(OrderDate, 1) OVER(PARTITION BY ProductID ORDER BY OrderDate),
  NextOn = LEAD(OrderDate, 1) OVER(PARTITION BY ProductID ORDER BY OrderDate)
 FROM @Orders
 ORDER BY OrderDate

OrderDate   ProductID  Quantity  WorstOn     BestOn      PrevOn      NextOn
----------  ---------  --------  ----------  ----------  ----------  ----------
2011-03-18  142        74        2011-03-18  2011-03-18  NULL        NULL
2011-04-11  123        95        2011-07-25  2011-04-11  NULL        2011-07-25
2011-04-12  101        38        2011-07-28  2011-04-12  NULL        2011-05-30
2011-05-21  130        12        2011-05-21  2011-05-21  NULL        NULL
2011-05-30  101        28        2011-07-28  2011-04-12  2011-04-12  2011-07-28
2011-07-25  123        57        2011-07-25  2011-04-11  2011-04-11  NULL
2011-07-28  101        12        2011-07-28  2011-04-12  2011-05-30  NULL

In this query, four analytic functions specify an OVER clause that partitions the result set by ProductID. The product partitions defined for FIRST_VALUE and LAST_VALUE are sorted by Quantity, while the product partitions for LAG and LEAD are sorted by OrderDate. The full result set is sorted by OrderDate, so you need to visualize the sorted partition for each of the four functions to understand the output—the result set sequence is not the same as the row sequence used in the windowing functions.

FIRST_VALUE and LAST_VALUE

The WorstOn and BestOn columns use FIRST_VALUE and LAST_VALUE respectively to return the “worst” and “best” dates for the product in each partition. Performance is measured by quantity, so sorting each product’s partition by quantity will position the worst order at the first row in the partition and the best order at the last row in the partition. FIRST_VALUE and LAST_VALUE can return the value of any column (OrderDate, in this case), not just the aggregate column itself. For LAST_VALUE, it is also necessary to explicitly define a window over the entire partition with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Otherwise, as explained in my coverage of OVER clause enhancements in Part 1, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which constrains the window to end at the current row and does not consider the remaining rows in the partition that are needed to obtain the highest quantity for LAST_VALUE.

In the output, notice that OrderDate, WorstOn, and BestOn for the first order (product 142) are all the same value (3/18). This is because product 142 was only ordered once, so FIRST_VALUE and LAST_VALUE operate over a partition that has only this one row in it, with an OrderDate value of 3/18. The second row is for product 123, quantity 95, ordered on 4/11. Four rows ahead in the result set (not the partition) there is another order for product 123, quantity 57, placed on 7/25. This means that, for this product, FIRST_VALUE and LAST_VALUE operate over a partition that has these two rows in it, sorted by quantity. This positions the 7/25 order (quantity 57) first and the 4/11 order (quantity 95) last within the partition. As a result, rows for product 123 report 7/25 for WorstOn and 4/11 for BestOn. The next order (product 101) appears two more times in the result set, creating a partition of three rows. Again, based on the Quantity sort of the partition, each row in the partition reports the product’s worst and best dates (which are 7/28 and 4/12, respectively).

LAG and LEAD

The PrevOn and NextOn columns use LAG and LEAD to return the previous and next date that each product was ordered. They specify an OVER clause that partitions by ProductId as before, but the rows in these partitions are sorted by OrderDate. Thus, the LAG and LEAD functions examine each product’s orders in chronological sequence, regardless of quantity. For each row in each partition, LAG is able to access previous (lagging) rows within the same partition. Similarly, LEAD can access subsequent (leading) rows within the same partition. The first parameter to LAG and LEAD specifies the column value to be returned from a lagging or leading row, respectively. The second parameter specifies the number of rows back or forward LAG and LEAD should seek within each partition, relative to the current row. The query passes OrderDate and 1 as parameters to LAG and LEAD, using product partitions that are ordered by date. Thus, the query returns the most recent past date, and nearest future date, that each product was ordered.

Because the first order’s product (142) was only ordered once, its single-row partition has no lagging or leading rows, and so LAG and LEAD both return NULL for PrevOn and NextOn. The second order (on 4/11) is for product 123, which was ordered again on 7/25, creating a partition with two rows sorted by OrderDate, with the 4/11 order positioned first and the 7/25 order positioned last within the partition. The first row in a multi-row window has no lagging rows, but at least one leading row. Similarly, the last order in a multi-row window has at least one lagging row, but no leading rows. As a result, the first order (4/11) reports NULL and 7/25 for PrevOn and NextOn (respectively), and the second order (7/25) reports 4/11 and NULL for PrevOn and NextOn (respectively). Product 101 was ordered three times, which creates a partition of three rows. In this partition, the second row has both a lagging row and a leading row. Thus, the three orders report PrevOn and NextOn values for product 101, respectively indicating NULL-5/30 for the first (4/12) order, 4/12-7/28 for the second (5/30) order, and 5/30-NULL for the third and last order.

The last functions to examine are PERCENT_RANK (rank distribution), CUME_DIST (cumulative distribution, or percentile), PERCENTILE_CONT (continuous percentile lookup), and PERCENTILE_DISC (discrete percentile lookup). The following queries demonstrate these closely related functions by querying sales figures across each quarter of two years.

DECLARE @Sales table(Yr int, Qtr int, Amount money)
INSERT INTO @Sales VALUES
  (2010, 1, 5000), (2010, 2, 6000), (2010, 3, 7000), (2010, 4, 2000),
  (2011, 1, 1000), (2011, 2, 2000), (2011, 3, 3000), (2011, 4, 4000)

-- Distributed across all 8 quarters
SELECT
  Yr, Qtr, Amount,
  R = RANK() OVER(ORDER BY Amount),
  PR = PERCENT_RANK() OVER(ORDER BY Amount),
  CD = CUME_DIST() OVER(ORDER BY Amount)
 FROM @Sales
 ORDER BY Amount

-- Distributed (partitioned) by year with percentile lookups
SELECT
  Yr, Qtr, Amount,
  R = RANK() OVER(PARTITION BY Yr ORDER BY Amount),
  PR = PERCENT_RANK() OVER(PARTITION BY Yr ORDER BY Amount),
  CD = CUME_DIST() OVER(PARTITION BY Yr ORDER BY Amount),
  PD5 = PERCENTILE_DISC(.5) WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PD6 = PERCENTILE_DISC(.6) WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PC5 = PERCENTILE_CONT(.5) WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr),
  PC6 = PERCENTILE_CONT(.6) WITHIN GROUP (ORDER BY Amount) OVER(PARTITION BY Yr)
 FROM @Sales
 ORDER BY Yr, Amount

Yr    Qtr  Amount   R  PR                 CD
----  ---  -------  -  -----------------  -----
2011  1    1000.00  1  0                  0.125
2011  2    2000.00  2  0.142857142857143  0.375
2010  4    2000.00  2  0.142857142857143  0.375
2011  3    3000.00  4  0.428571428571429  0.5
2011  4    4000.00  5  0.571428571428571  0.625
2010  1    5000.00  6  0.714285714285714  0.75
2010  2    6000.00  7  0.857142857142857  0.875
2010  3    7000.00  8  1                  1

Yr    Qtr  Amount   R  PR                 CD    PD5      PD6      PC5   PC6
----  ---  -------  -  -----------------  ----  -------  -------  ----  ----
2010  4    2000.00  1  0                  0.25  5000.00  6000.00  5500  5800
2010  1    5000.00  2  0.333333333333333  0.5   5000.00  6000.00  5500  5800
2010  2    6000.00  3  0.666666666666667  0.75  5000.00  6000.00  5500  5800
2010  3    7000.00  4  1                  1     5000.00  6000.00  5500  5800
2011  1    1000.00  1  0                  0.25  2000.00  3000.00  2500  2800
2011  2    2000.00  2  0.333333333333333  0.5   2000.00  3000.00  2500  2800
2011  3    3000.00  3  0.666666666666667  0.75  2000.00  3000.00  2500  2800
2011  4    4000.00  4  1                  1     2000.00  3000.00  2500  2800

These new functions are all based on the RANK function introduced in SQL Server 2005, so both queries also report RANK, which will aid in both explaining and understanding each of the new functions.

PERCENT_RANK and CUME_DIST

In the first query, PERCENT_RANK and CUME_DIST (aliased as PR and CD, respectively) rank quarterly sales across the entire two-year period. Look at the value returned by RANK (aliased as R). It ranks each row in the unpartitioned window (all eight quarters) by dollar amount. Both 2011Q2 and 2010Q4 are tied for $2,000 in sales, so RANK assigns them the same value (2). The next row breaks the tie, so RANK continues with 4, which accounts for the “empty slot” created by the two previous rows that were tied.

Now examine the values returned by PERCENT_RANK and CUME_DIST. Notice how they express the same information as RANK using decimal values between 0 and 1. The difference between the two lies in their formulas. PERCENT_RANK returns (RANK – 1) / (N – 1) for each row, where N is the total number of rows in the window; this always returns 0 for the first (or only) row in the window. CUME_DIST returns the number of rows with a value less than or equal to the current row’s value, divided by N (which simplifies to RANK / N when there are no ties); this always returns a value greater than 0 for the first row in the window (and returns 1 if there is only one row). For windows with two or more rows, both functions return 1 for the last row in the window, with decimal values distributed among all the other rows.
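
For example, consider the two rows tied at $2,000 in the first query, where RANK = 2 and N = 8: PERCENT_RANK = (2 – 1) / (8 – 1) = 1/7 ≈ 0.142857, while CUME_DIST = 3 / 8 = 0.375, because three of the eight rows have amounts of $2,000 or less.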

The second query examines the same sales figures, only this time the result set is partitioned by year. There are no ties within each year, so RANK assigns the sequential numbers 1 through 4 to the quarters of 2010 and 2011 by dollar amount. You can see that PERCENT_RANK and CUME_DIST perform the same calculations as explained for the first query, only partitioned by year this time.

PERCENTILE_DISC and PERCENTILE_CONT

This query also demonstrates PERCENTILE_DISC and PERCENTILE_CONT. These very similar functions each accept a percentile parameter (the desired CUME_DIST value) and “reach in” to the window for the value at or near that percentile. The code demonstrates by calling both functions twice, once with a percentile parameter of .5 and once with .6, returning columns aliased as PD5, PD6, PC5, and PC6. The difference between them is that PERCENTILE_DISC returns a precise (discrete) value taken from the first row in the window whose CUME_DIST is greater than or equal to the specified percentile, while PERCENTILE_CONT treats the window as a continuous distribution and interpolates a value between the two rows that surround the specified percentile position (so its result may not match any actual row). This explains the values they return in this query.

Notice that these functions define their window ordering using ORDER BY in a WITHIN GROUP clause rather than in the OVER clause. Thus, you do not (and cannot) specify ORDER BY in the OVER clause. The OVER clause is still required, however, so OVER (with empty parentheses) must be specified even if you don’t want to partition using PARTITION BY.
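
For example, continuing with the @Sales table variable populated above (and run in the same batch), the following statement computes the interpolated median amount across all eight quarters with no partitioning at all; for this data, the result is 3500:

SELECT DISTINCT
  MedianAmount = PERCENTILE_CONT(.5) WITHIN GROUP (ORDER BY Amount) OVER()  -- OVER is still required, even when empty
 FROM @Sales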

For the year 2010, the .5 percentile (CUME_DIST value) is located exactly on quarter 1, which had $5,000 in sales. Thus PERCENTILE_DISC(.5) returns 5000. There is no row in the window with a percentile of .6, so PERCENTILE_DISC(.6) matches up against the first row with a percentile greater than or equal to .6, which is the row for quarter 2 with $6,000 in sales, and thus returns 6000. In both cases, PERCENTILE_DISC returns a discrete value from a row in the window at or above the specified percentile. The same calculations are performed for 2011, returning 2000 for PERCENTILE_DISC(.5) and 3000 for PERCENTILE_DISC(.6), corresponding to the $2,000 in sales for quarter 2 (percentile .5) and the $3,000 in sales for quarter 3 (percentile .75).

As I stated, PERCENTILE_CONT is very similar. It takes the same percentile parameter, but rather than picking the value from a single row as PERCENTILE_DISC does, it computes the position in the ordered window that corresponds to the specified percentile and linearly interpolates between the values on either side of that position. For a window of N rows and a percentile P, that position is 1 + P * (N – 1). For 2010 (amounts 2000, 5000, 6000, 7000), PERCENTILE_CONT(.5) targets position 1 + .5 * 3 = 2.5, halfway between the second value (5000) and the third (6000), and so returns 5500. PERCENTILE_CONT(.6) targets position 1 + .6 * 3 = 2.8, which is 80 percent of the way from 5000 to 6000, and so returns 5800; slightly higher than the 5500 returned for PERCENTILE_CONT(.5).

Conclusion

This post explained the eight new analytic functions added to T-SQL in SQL Server 2012. These new functions, plus the running and sliding aggregation capabilities covered in Part 1, greatly expand the windowing capabilities of the OVER clause available since SQL Server 2005.

SQL Server 2012 Windowing Functions Part 1 of 2: Running and Sliding Aggregates

The first windowing capabilities appeared in SQL Server 2005 with the introduction of the OVER clause and a set of ranking functions (ROW_NUMBER, RANK, DENSE_RANK, and NTILE). The term “window,” as it is used here, refers to the scope of visibility from one row in a result set relative to neighboring rows in the same result set. By default, OVER produces a single window over the entire result set, but its associated PARTITION BY clause lets you divide the result set up into distinct windows—one per partition. Furthermore, its associated ORDER BY clause enables cumulative calculations within each window.

In addition to the four ranking functions, the OVER clause can be used with the traditional aggregate functions (SUM, COUNT, MIN, MAX, AVG). This is extremely useful, because it allows you to calculate aggregations without being forced to summarize all the detail rows with a GROUP BY clause. However, prior to SQL Server 2012, an ORDER BY clause in the OVER clause was supported only for the ranking functions; using ORDER BY with OVER was not allowed for any of the aggregate functions. This prevented running and sliding aggregations, and it severely limited the windowing capability of OVER since its introduction in SQL Server 2005.

Fortunately, SQL Server 2012 finally addresses this shortcoming. In this blog post, the first in a two-part article, I’ll show you how to use OVER/ORDER BY with all the traditional aggregate functions in SQL Server 2012 to provide running aggregates within ordered windows and partitions. I’ll also show you how to frame windows using the ROWS or RANGE clause, which adjusts the size and scope of the window and enables sliding aggregations. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. I will cover those new analytic functions in Part 2.

To demonstrate running and sliding aggregates, create a table and populate it with sample financial transactions for several different accounts, as shown below. (Note the use of the DATEFROMPARTS function, also new in SQL Server 2012, which is used to construct a date value from year, month, and day parameters.)

CREATE TABLE TxnData (AcctId int, TxnDate date, Amount decimal)
GO

INSERT INTO TxnData (AcctId, TxnDate, Amount) VALUES
  (1, DATEFROMPARTS(2011, 8, 10), 500),  -- 5 transactions for acct 1
  (1, DATEFROMPARTS(2011, 8, 22), 250),
  (1, DATEFROMPARTS(2011, 8, 24), 75),
  (1, DATEFROMPARTS(2011, 8, 26), 125),
  (1, DATEFROMPARTS(2011, 8, 28), 175),
  (2, DATEFROMPARTS(2011, 8, 11), 500),  -- 8 transactions for acct 2
  (2, DATEFROMPARTS(2011, 8, 15), 50),
  (2, DATEFROMPARTS(2011, 8, 22), 5000),
  (2, DATEFROMPARTS(2011, 8, 25), 550),
  (2, DATEFROMPARTS(2011, 8, 27), 105),
  (2, DATEFROMPARTS(2011, 8, 27), 95),
  (2, DATEFROMPARTS(2011, 8, 29), 100),
  (2, DATEFROMPARTS(2011, 8, 30), 2500),
  (3, DATEFROMPARTS(2011, 8, 14), 500),  -- 4 transactions for acct 3
  (3, DATEFROMPARTS(2011, 8, 15), 600),
  (3, DATEFROMPARTS(2011, 8, 22), 25),
  (3, DATEFROMPARTS(2011, 8, 23), 125)

Running Aggregations

Used by itself, the OVER clause operates over a window that encompasses the entire result set of a query. Windows can be partitioned in your queries using OVER with PARTITION BY, enabling partition-level aggregations to be calculated for each window. And with SQL Server 2012, an ORDER BY clause can also be specified with OVER to achieve row-level running aggregations within each window. The following code demonstrates the use of OVER with ORDER BY to produce running aggregations:

SELECT AcctId, TxnDate, Amount,
  RAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
  RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount RAvg        RCnt RMin RMax RSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2011-08-10 500    500.000000  1    500  500  500
1      2011-08-22 250    375.000000  2    250  500  750
1      2011-08-24 75     275.000000  3    75   500  825
1      2011-08-26 125    237.500000  4    75   500  950
1      2011-08-28 175    225.000000  5    75   500  1125
2      2011-08-11 500    500.000000  1    500  500  500
2      2011-08-15 50     275.000000  2    50   500  550
2      2011-08-22 5000   1850.000000 3    50   5000 5550
 :

The results of this query are partitioned (windowed) by account. Within each window, the account’s running averages, counts, minimum/maximum values, and sums are ordered by transaction date, showing the chronologically accumulated values for each account. No ROWS or RANGE clause is specified, so RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is assumed by default. This yields a window frame that spans from the beginning of the partition (the first row of each account) through the current row. When the account number changes, the previous window is “closed” and new calculations start running for a new window over the next account number.
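
To make that default explicit, the RSum column in the previous query could equivalently be written with the frame spelled out, as in this sketch:

SELECT AcctId, TxnDate, Amount,
  RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
 FROM TxnData
 ORDER BY AcctId, TxnDate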

Sliding Aggregations

You can narrow each account’s window by framing it with a ROWS BETWEEN n PRECEDING AND CURRENT ROW clause within the OVER clause. This enables sliding calculations, as demonstrated by this slightly modified version of the previous query:

SELECT AcctId, TxnDate, Amount,
  SAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
  SCnt = COUNT(*)    OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
  SSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING)
 FROM TxnData
 ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount SAvg        SCnt SMin SMax SSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2011-08-10 500    500.000000  1    500  500  500
1      2011-08-22 250    375.000000  2    250  500  750
1      2011-08-24 75     275.000000  3    75   500  825
1      2011-08-26 125    150.000000  3    75   250  450
1      2011-08-28 175    125.000000  3    75   175  375
2      2011-08-11 500    500.000000  1    500  500  500
2      2011-08-15 50     275.000000  2    50   500  550
2      2011-08-22 5000   1850.000000 3    50   5000 5550
 :

This query specifies ROWS BETWEEN 2 PRECEDING AND CURRENT ROW in the OVER clause for the SAvg column, overriding the default window frame. Specifically, it frames the window within each account’s partition to a maximum of three rows: the current row, the row before it, and one more row before that one. Once the window expands to three rows, it stops growing and starts sliding down the subsequent rows until a new partition (the next account) is encountered. The BETWEEN…AND CURRENT ROW keywords that specify the upper bound of the window are assumed by default, so to reduce code clutter, the other column definitions specify just the lower bound of the window with the shorter variation ROWS 2 PRECEDING.

Notice how the window “slides” within each account. For example, the sliding maximum for account 1 drops from 500 to 250 in the fourth row, because 250 is the largest value in the window of three rows that begins two rows earlier—and the 500 from the very first row is no longer visible in that window. Similarly, the sliding sum for each account is based on the defined window. Thus, the sliding sum of 375 on the last row of account 1 is the total sum of that row (175) plus the two preceding rows (75 + 125) only—not the total sum for all transactions in the entire account, as the running sum had calculated.

Using RANGE versus ROWS

Finally, RANGE can be used instead of ROWS to handle “ties” within a window. While ROWS treats each row in the window distinctly, RANGE will merge rows containing duplicate ORDER BY values, as demonstrated by the following query:

SELECT AcctId, TxnDate, Amount,
 SumByRows = SUM(Amount) OVER (ORDER BY TxnDate ROWS UNBOUNDED PRECEDING),
 SumByRange = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING)
 FROM TxnData
 WHERE AcctId = 2
 ORDER BY TxnDate

AcctId TxnDate    Amount SumByRows SumByRange
------ ---------- ------ --------- ----------
2      2011-08-11 500    500       500
2      2011-08-15 50     550       550
2      2011-08-22 5000   5550      5550
2      2011-08-25 550    6100      6100
2      2011-08-27 105    6205      6300
2      2011-08-27 95     6300      6300
2      2011-08-29 100    6400      6400
2      2011-08-30 2500   8900      8900

In this result set, ROWS and RANGE both return the same values, with the exception of the fifth row. Because the fifth and sixth rows are both tied for the same date (8/27/2011), RANGE returns the combined running sum for both rows. The seventh row (for 8/29/2011) breaks the tie, and ROWS “catches up” with RANGE to return running totals for the rest of the window.

Conclusion

Windowing functions using the OVER clause have been greatly enhanced in SQL Server 2012. In addition to the four ranking functions, running and sliding calculations with OVER/ORDER BY are now supported for all the traditional aggregate functions as well. SQL Server 2012 also introduces eight new analytic functions that are designed specifically to work with ordered (and optionally partitioned) windows using the OVER clause. Stay tuned for Part 2, which will show you how to use these new analytic windowing functions.

Download VSLive Orlando SQL Server 2012 Workshop Materials

I just got back from another great VSLive! A special thanks to all the attendees who were at my SQL Server workshop on Monday. You guys had great questions and were a lot of fun. As promised, I’ve posted the very latest version of my slides and code for you to grab right here:

http://bit.ly/VSL2011SQLOrlando

If you missed VSLive! in Orlando, catch us in Vegas next March. In addition to the full-day SQL 2012 workshop, I’ll be giving breakout sessions on SQL Server Data Tools (SSDT) and .NET 4.0 Data Access. Enjoy!

Throwing Errors in SQL Server 2012

Error handling in T-SQL was very difficult to implement properly before SQL Server 2005 introduced the TRY/CATCH construct, a feature loosely based on .NET’s try/catch structured exception handling model. The CATCH block gives you a single place to code error handling logic in the event that a problem occurs anywhere inside the TRY block above it. Before TRY/CATCH, it was necessary to always check for error conditions after every operation by testing the built-in system function @@ERROR. Not only did code become cluttered with the many @@ERROR tests, developers (being humans) would too often forget to test @@ERROR in every needed place, causing many unhandled exceptions to go unnoticed.

In SQL Server 2005, TRY/CATCH represented a vast improvement over constantly testing @@ERROR, but RAISERROR has (until SQL Server 2012) remained as the only mechanism for generating your own errors. In SQL Server 2012, the new THROW statement (again, borrowed from throw in the .NET model) is the recommended alternative way to raise exceptions in your T-SQL code (although RAISERROR does retain several capabilities that THROW lacks, as I’ll explain shortly).

Re-Throwing Exceptions

The new THROW statement can be used in two ways. First, and as I just stated, it can serve as an alternative to RAISERROR, allowing your code to generate errors when it detects an unresolvable condition in processing. Used for this purpose, the THROW statement accepts parameters for the error code, description, and state, and works much like RAISERROR. A more specialized use of THROW takes no parameters, and can appear only inside a CATCH block. In this scenario, an unexpected error occurs in the TRY block above, triggering execution of the CATCH block. Inside the CATCH block, you can perform general error handling (for example, logging the error, or rolling back a transaction), and then issue a THROW statement with no parameters. This will re-throw the original error that occurred—with its code, message, severity, and state intact—back up to the client, so the error can be caught and handled at the application level as well. This is an easy and elegant way for you to implement a segmented exception handling strategy between the database and application layers.

In contrast, RAISERROR always raises a new error. Thus, it can only simulate re-throwing the original error by capturing ERROR_MESSAGE, ERROR_SEVERITY, and ERROR_STATE in the CATCH block and using their values to raise a new error. Using THROW for this purpose is much simpler and more direct, as demonstrated by the following code:

CREATE TABLE ErrorLog(ErrAt datetime2, Severity varchar(max), ErrMsg varchar(max))
GO

BEGIN TRY
  DECLARE @Number int = 5 / 0;
END TRY
BEGIN CATCH
  -- Log the error info, then re-throw it
  INSERT INTO ErrorLog VALUES(SYSDATETIME(), ERROR_SEVERITY(), ERROR_MESSAGE());
  THROW;
END CATCH

In this code’s CATCH block, error information is recorded to the ErrorLog table and then the original error (divide by zero) is re-thrown for the client to catch:

Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.

To confirm that the error was logged by the CATCH block as expected before being re-thrown, query the ErrorLog table:

SELECT * FROM ErrorLog
ErrAt                     Severity ErrMsg
------------------------- -------- ------------------------------------------
2011-10-30 14:14:35.336   16       Divide by zero error encountered.
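
For comparison, here is a sketch of how a CATCH block had to simulate re-throwing an error before SQL Server 2012, by capturing the error details and raising a new error with RAISERROR (note that the original error number, 8134 in this case, is lost to the client):

BEGIN TRY
  DECLARE @Number int = 5 / 0;
END TRY
BEGIN CATCH
  -- Capture the details of the original error...
  DECLARE @ErrMsg nvarchar(2048) = ERROR_MESSAGE();
  DECLARE @ErrSeverity int = ERROR_SEVERITY();
  DECLARE @ErrState int = ERROR_STATE();
  -- ...and raise a brand new error that merely copies those details
  RAISERROR(@ErrMsg, @ErrSeverity, @ErrState);
END CATCH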

Comparing THROW and RAISERROR

The following list summarizes the notable differences between THROW and RAISERROR:

  • THROW can only generate user exceptions (unless re-throwing in a CATCH block); RAISERROR can generate both user (>= 50000) and system (< 50000) exceptions.
  • THROW supplies ad-hoc message text and doesn’t utilize sys.messages; RAISERROR requires user messages defined in sys.messages (except for error code 50000).
  • THROW doesn’t support token substitutions; RAISERROR does.
  • THROW always uses severity level 16 (unless re-throwing in a CATCH block); RAISERROR can set any severity level.
  • THROW can re-throw the original exception caught in the TRY block; RAISERROR always generates a new exception, so the original exception is lost to the client.
  • THROW buffers its error messages, so they don’t appear in real time; RAISERROR supports WITH NOWAIT to immediately flush buffered output.

A user exception is an error with a code of 50000 or higher that you define for your application’s use. System exceptions are defined by SQL Server and have error codes lower than 50000. You can use the new THROW statement to generate and raise user exceptions, but not system exceptions. Only RAISERROR can be used to throw system exceptions. Note, however, that when THROW is used in a CATCH block to re-throw the exception from a TRY block, the actual original exception—even if it’s a system exception—will get thrown (as demonstrated in the previous “divide by zero” example).

When RAISERROR is used without an error code, SQL Server assigns an error code of 50000 and expects you to supply an ad-hoc message to associate with the error. The THROW statement always expects you to supply an ad-hoc message for the error, as well as a user error code of 50000 or higher. Thus, the following two statements are equivalent:

THROW 50000, 'An error occurred querying the table.', 1;
RAISERROR ('An error occurred querying the table.', 16, 1);

Both these statements raise an error with code 50000, severity 16, state 1, and the same message text. Compatibility between the two keywords ends there, however, as varying usages impose different rules (as summarized in the list above). For example, only RAISERROR supports token substitution:

RAISERROR ('An error occurred querying the %s table.', 16, 1, 'Customer');
Msg 50000, Level 16, State 1, Line 22
An error occurred querying the Customer table.

THROW has no similar capability. Also, while RAISERROR lets you specify any severity level, THROW always generates an error with a severity level of 16. This is significant, because severity levels of 11 and higher are treated as errors, while levels of 10 and lower are merely informational messages. For example:

RAISERROR ('An error occurred querying the table.', 10, 1);
An error occurred querying the table.

Because the severity is 10, this error does not echo the error code, level, state, and line number, and is displayed in black rather than the usual red that is used for severity levels higher than 10. In contrast, THROW cannot be used to signal a non-severe error.

The last important difference between the two keywords is RAISERROR’s association with sys.messages. In particular, RAISERROR requires that you call sys.sp_addmessage to define error messages associated with user error codes higher than 50000. As explained, the RAISERROR syntax in the earlier examples uses an error code of 50000, which is the only error code that lets you supply an ad-hoc message instead of utilizing sys.messages.

The following code demonstrates how to define custom user error messages for RAISERROR. First (and only once), a tokenized message for user error code 66666 is added to sys.messages. Thereafter, RAISERROR references the error by its code, and also supplies values for the token replacements that are applied to the message’s text in sys.messages.

EXEC sys.sp_addmessage 66666, 16, 'There is already a %s named %s.';
RAISERROR(66666, 16, 1, 'cat', 'morris');
Msg 66666, Level 16, State 1, Line 34
There is already a cat named morris.

The THROW statement has no such requirement. You supply any ad-hoc message text with THROW, so you don’t need to separately manage sys.messages, but this also means that THROW can’t (directly) leverage centrally managed error messages in sys.messages the way RAISERROR does. Fortunately, the FORMATMESSAGE function provides a workaround if you want to take advantage of the same capability with THROW. You just need to take care that the same error code is specified in the two places that reference it (once for FORMATMESSAGE and once for THROW), as shown here:

DECLARE @Message varchar(max) = FORMATMESSAGE(66666, 'dog', 'snoopy');
THROW 66666, @Message, 1;
Msg 66666, Level 16, State 1, Line 40
There is already a dog named snoopy.

Summary

Starting in SQL Server 2012, the THROW keyword should be used instead of RAISERROR to raise your own errors. THROW can also be used inside CATCH blocks to re-throw the original error that occurred within the TRY block. However, RAISERROR is still supported, and can be used to raise system errors or errors with lower severity levels when necessary. You can start working with THROW by downloading SQL Server 2012 “Denali” CTP3 from http://bit.ly/DenaliCTP3. Try not to have too much fun throwing errors! 😉

Book Announcement! Introducing “Programming Microsoft SQL Server 2012”

I’m very happy to announce that I’ll be authoring a new book on SQL Server 2012, code-named “Denali.” The full title, “Programming Microsoft SQL Server 2012” will technically be published by O’Reilly, yet it will be branded as a Microsoft Press book. O’Reilly recently acquired MS Press, but retained the Microsoft logo and black/red theme, which I’m really glad about. Although O’Reilly is a big name, the MS Press branding still carries more clout. And to be honest, I greatly prefer having a cool tool to a weird animal on the cover.

This book is an update to the SQL Server 2008 edition I wrote three years ago, with many similarities but also some very notable differences. I’m extremely pleased to once again team up with Andrew Brust, who will be writing the chapters on SQL CLR, column stores, BI, and SQL Azure. Andrew was my co-author for the 2008 edition and lead author of the original 2005 edition, so this is actually his third time around on this book. Both previous editions burst at the seams in their zeal to cover the entire SQL Server stack. This time around, in consideration of the many readily available BI-focused books, the editors decided to distill the BI footprint of the new Denali book (which accounted for roughly a whole third of the previous 1,000+ page 2008 edition) down to a single overview-style chapter. This simultaneously brings the page count down a little, while actually opening more space for expanded coverage of the relational database engine. As a result, this new book is more narrowly (and uniquely) focused on programming SQL Server for transactional line-of-business applications built with .NET and Visual Studio.

So busy days lie ahead. I’ve actually spent nearly a year already gearing up on Denali, since attending the Denali SDR in Redmond last October. Here are just some of the new topics planned for the 2012 edition:

  • SQL Server Data Tools
  • SQL Azure
  • Column store indexes
  • Sequences
  • Windowing function (OVER clause) improvements
  • Server-side paging
  • FileTable (FILESTREAM + hierarchyid = logical file system)
  • Metadata discovery
  • Spatial enhancements (curves, FULLGLOBE)
  • Contained Databases
  • Self-Service Reporting (Crescent)
  • Entity Framework & LINQ
  • WCF Data Services & WCF RIA Services

This list is by no means exhaustive, and may still change, but it does give a good idea of what to expect. Also, this is in addition to updated coverage from the previous edition that includes broader treatment of topics such as table-valued parameters, SQL Server Audit, and more.

With luck, the book should be out by Q2 2012. I’m looking forward to the work in store, and hope to produce the best piece of work I can. Along the way, I’ll be blogging more previews of what’s to come. So stay tuned, and thanks for reading.