Book Announcement! Windows Azure SQL Database – Step By Step

Just when I thought it was safe to go back in the water…

I’m very happy to announce that I’ll be authoring a new book on Windows Azure SQL Database. The book, part of the Microsoft Press “Step By Step” series, is designed for readers to quickly get productive with Windows Azure SQL Database — the cloud version of SQL Server.

I’m especially lucky and honored to work with my friend and colleague Brian Reynolds, who is co-authoring the book with me. Brian has great knowledge and experience with the Windows Azure platform, which is sure to shine through in his chapters. Plus, I’m once again delighted to work in partnership with Craig Branning and all the wonderful folks at Tallan, Inc.

So who is this book for? Well, anyone interested in quickly getting up and running with SQL Database on Windows Azure. This includes not only those experienced with SQL Server, but readers having general experience with other database technologies, and even those with little to no experience at all. The Step By Step series follows an inviting format that’s chock full of quick rewards — small bits of conceptual information are presented, and that information is then immediately put to practical use by walking through a relatively short procedure, one step at a time.

I’m also happy that the title will once again be published by O’Reilly Media and branded as an MS Press book. Years back, O’Reilly acquired MS Press, but retained the Microsoft logo with the black and red theme, which I think we all agree looks really, really cool.

So (once more!) busy days lie ahead. We’re still in the early stages, so some of this could easily change, but here’s what we plan to cover so far:

  • Quick-Start, Setup, and Configuration
  • Security in the cloud
  • Reporting Services in the cloud
  • SQL Data Sync
  • Migration and Backup
  • Using the online management portal, and familiar tools like SSMS and SSDT
  • Programming using such tools as the Entity Framework ORM layer
  • Scalability, Federations, and Performance
  • Differences from on-premise SQL Server

With luck, the book should be out by Q3 2013. I’m looking forward to the work in store, and we hope to produce the best piece of work we can. Along the way, I’ll be blogging more previews of what’s to come. So stay tuned, and thanks for reading.

New Metadata Discovery Features in SQL Server 2012

It has always been possible to interrogate SQL Server for metadata (schema) information. You can easily discover all the objects in a database (tables, views, stored procedures, and so on) and their types by directly querying system tables (not recommended, as they can change from one version of SQL Server to another) or information schema views (which remain consistent from one SQL Server version to the next). It is significantly more challenging, however, to discover the result set schema for T-SQL statements or stored procedures that contain conditional logic. In the past, the common technique for discovering the schema of a query’s result set without actually executing the query was to use SET FMTONLY ON/OFF. For example, consider the following code:

USE AdventureWorks2012
GO

SET FMTONLY ON
SELECT * FROM HumanResources.Employee;
SET FMTONLY OFF

This SELECT statement, which would normally return all the rows from the HumanResources.Employee table, returns no rows at all. It just reveals the columns. The SET FMTONLY ON statement prevents queries from returning rows of data so that their schemas can be discovered, and this behavior remains in effect until SET FMTONLY OFF is encountered.

SQL Server 2012 introduces several new system stored procedures and table-valued functions (TVFs) that provide significantly richer metadata discovery than what can be discerned using the relatively inelegant (and now deprecated) SET FMTONLY ON/OFF approach. These new procedures and functions are:

  • sys.sp_describe_first_result_set
  • sys.dm_exec_describe_first_result_set
  • sys.dm_exec_describe_first_result_set_for_object
  • sys.sp_describe_undeclared_parameters

In this blog post, I’ll explain how to use these new objects to discover schema information in SQL Server 2012.

sys.sp_describe_first_result_set

The sys.sp_describe_first_result_set stored procedure accepts a T-SQL statement and produces a highly detailed schema description of the first possible result set returned by that statement. The following code retrieves schema information for the same SELECT statement you used earlier to get information on all the columns in the HumanResources.Employee table:

EXEC sp_describe_first_result_set
 @tsql = N'SELECT * FROM HumanResources.Employee'

The following screenshot shows the wealth of information that SQL Server returns about each column in the result set returned by the sp_describe_first_result_set call:

sys.dm_exec_describe_first_result_set

There is also a dynamic management function named sys.dm_exec_describe_first_result_set that works very similarly to sys.sp_describe_first_result_set. But because it is implemented as a table-valued function (TVF), it is easy to query against it and limit the metadata returned. For example, the following query examines the same T-SQL statement, but returns just the name and data type of nullable columns:

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set(
  'SELECT * FROM HumanResources.Employee', NULL, 1)
 WHERE is_nullable = 1

Here is the output:

name               system_type_name
-----------------  ----------------
OrganizationNode   hierarchyid
OrganizationLevel  smallint

Parameterized queries are also supported, if you supply an appropriate parameter signature after the T-SQL. The T-SQL in the previous example had no parameters, so it passed NULL for the “parameters parameter.” The following example discovers the schema of a parameterized query.

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set('
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID',
  '@OrderID int', 1)

Here is the output:

name             system_type_name  is_hidden
---------------  ----------------  ---------
OrderDate        datetime          0
TotalDue         money             0
SalesOrderID     int               1

You’d be quick to question why the SalesOrderID column is returned for a SELECT statement that returns only OrderDate and TotalDue. The answer lies in the last parameter passed to the dynamic management function. A bit value of 1 (for true) tells SQL Server to return the identifying SalesOrderID column, because it is used to “browse” the result set. Notice that it is marked true (1) for is_hidden. This informs the client that the SalesOrderID column is not actually revealed by the query, but can be used to uniquely identify each row in the query’s result set.
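For comparison, here is a minimal sketch of the same call with the browse flag set to 0 instead of 1. It reuses the query and parameter signature from the example above. With browse information turned off, only the columns named in the SELECT list should be described, and the hidden SalesOrderID row should no longer appear.

```sql
-- Same parameterized query as before, but with browse information disabled (0)
SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set(N'
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID',
  N'@OrderID int', 0)
```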

What if multiple result sets are possible? There’s no problem with this as long as they all have the same schema. In fact, SQL Server will even try to forgive cases where multiple possible schemas are not exactly identical. For example, if the same column is nullable in one result set and non-nullable in the other, schema discovery will succeed and indicate the column as nullable. It will even tolerate cases where the same column has a different name (but same type) between two possible result sets, and indicate NULL for the column name, rather than arbitrarily choosing one of the possible column names or failing altogether.

The following code demonstrates this with a T-SQL statement that has two possible result sets depending on the value passed in for the @SortOrder parameter. Because both result sets have compatible schemas, the dynamic management function succeeds in returning schema information.

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set('
    IF @SortOrder = 1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID ASC
    ELSE IF @SortOrder = -1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID DESC',
   '@SortOrder AS tinyint', 0) 

Here is the output:

name         system_type_name
-----------  ----------------
OrderDate    datetime
TotalDue     money
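The tolerance for differing column names described earlier can be sketched in the same way. In this example, the @UseAlias parameter and the OrderedOn alias are hypothetical names made up for illustration. The two branches return the same column under different names, so discovery should still succeed, but with NULL reported as the column name.

```sql
-- Two possible result sets: same type, different column name
-- (@UseAlias and OrderedOn are illustrative names, not part of AdventureWorks)
SELECT name, system_type_name, is_nullable
 FROM sys.dm_exec_describe_first_result_set(N'
    IF @UseAlias = 1
      SELECT OrderDate AS OrderedOn
       FROM Sales.SalesOrderHeader
    ELSE
      SELECT OrderDate
       FROM Sales.SalesOrderHeader',
   N'@UseAlias bit', 0)
```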

Discovery won’t succeed if SQL Server detects incompatible schemas. In this next example, the call to the system stored procedure specifies a T-SQL statement with two possible result sets, but one returns three columns while the other returns only two columns.

EXEC sys.sp_describe_first_result_set
  @tsql = N'
    IF @IncludeCurrencyRate = 1
      SELECT OrderDate, TotalDue, CurrencyRateID
       FROM Sales.SalesOrderHeader
    ELSE
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader'

In this case, the system stored procedure raises an error that clearly explains the problem:

Msg 11509, Level 16, State 1, Procedure sp_describe_first_result_set, Line 53

The metadata could not be determined because the statement 'SELECT OrderDate, TotalDue, CurrencyRateID FROM Sales.SalesOrderHeader' is not compatible with the statement 'SELECT OrderDate, TotalDue FROM Sales.SalesOrderHeader'.

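It is worth noting that the dynamic management function copes with this scenario much more passively. Given conflicting result set schemas, it does not raise an error. Instead, it returns a row with NULL for the column metadata, and describes the problem in its error columns (such as error_number and error_message).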
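To see what the function reports in this situation, you can select its error columns alongside the usual metadata. The following is a minimal sketch that passes the same conflicting T-SQL to the function. The bit type declared for @IncludeCurrencyRate is an assumption, since the stored procedure example above does not declare the parameter.

```sql
-- Same conflicting statement, this time passed to the function.
-- No error is raised; the metadata columns come back NULL and the
-- error_* columns describe why discovery failed.
SELECT name, system_type_name, error_number, error_type_desc, error_message
 FROM sys.dm_exec_describe_first_result_set(N'
    IF @IncludeCurrencyRate = 1
      SELECT OrderDate, TotalDue, CurrencyRateID
       FROM Sales.SalesOrderHeader
    ELSE
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader',
   N'@IncludeCurrencyRate bit', 0)
```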

sys.dm_exec_describe_first_result_set_for_object

The dynamic management function sys.dm_exec_describe_first_result_set_for_object can be used to achieve the same discovery against a stored procedure or trigger in the database. It accepts just an object ID and the Boolean “browse” flag to specify if hidden ID columns should be returned. You can use the OBJECT_ID function to obtain the ID of the desired object. The following code demonstrates this by returning schema information for the stored procedure GetOrderInfo.

CREATE PROCEDURE GetOrderInfo(@OrderID AS int) AS
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID
GO

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set_for_object(OBJECT_ID('GetOrderInfo'), 1)

Here is the output:

name             system_type_name   is_hidden
---------------  -----------------  ---------
OrderDate        datetime           0
TotalDue         money              0
SalesOrderID     int                1
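Because this is a table-valued function that takes an object ID, it also combines naturally with CROSS APPLY. The following sketch (one possible approach, not the only one) describes the first result set of every stored procedure in the current database by feeding each object_id from sys.procedures into the function.

```sql
-- Describe the first result set of every stored procedure in the database
SELECT
  ProcName = p.name,
  ColumnOrdinal = r.column_ordinal,
  ColumnName = r.name,
  ColumnType = r.system_type_name
 FROM
  sys.procedures AS p
  CROSS APPLY sys.dm_exec_describe_first_result_set_for_object(p.object_id, 0) AS r
 ORDER BY
  p.name, r.column_ordinal
```

Procedures whose schema cannot be determined still produce a row, with NULL metadata and the reason reported in the function's error columns.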

sys.sp_describe_undeclared_parameters

Finally, the sys.sp_describe_undeclared_parameters stored procedure parses a T-SQL statement to discover type information about the parameters expected by the statement, as the following code demonstrates:

EXEC sys.sp_describe_undeclared_parameters
 N'IF @IsFlag = 1 SELECT 1 ELSE SELECT 0'

Here is the output:

parameter_ordinal name    suggested_system_type_id suggested_system_type_name ...
----------------- ------- ------------------------ -------------------------- -------
1                 @IsFlag 56                       int                        ... 

In this example, SQL Server detects the @IsFlag parameter, and suggests the int data type based on the usage in the T-SQL statement it was given to parse.
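The procedure also accepts an optional @params argument for parameters whose types you already know, in which case only the remaining undeclared parameters are described. The following is a minimal sketch against AdventureWorks2012; the @MinDue parameter name is made up for illustration.

```sql
-- @OrderID is declared up front, so only @MinDue should be described
-- (@MinDue is an illustrative parameter name)
EXEC sys.sp_describe_undeclared_parameters
 @tsql = N'
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID AND TotalDue > @MinDue',
 @params = N'@OrderID int'
```

Here the suggested type for @MinDue would be inferred from its comparison against the TotalDue column.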

Download Visual Studio Live! New York Slides and Code

A very successful Visual Studio Live! just wrapped up two weeks ago here in my hometown of Brooklyn, NY. Thanks again to all the good folks that attended my sessions.

As promised, I’ve posted all the slides and code from my sessions for you to download. You can grab the stuff here:

May 14, 2012 SQL Server Workshop for Developers http://sdrv.ms/VSLiveNY2012SQL
May 17, 2012 Introducing SQL Server Data Tools http://sdrv.ms/VSLiveNY2012SSDT
May 17, 2012 T-SQL Enhancements in SQL Server 2012 http://sdrv.ms/VSLiveNY2012TSql

Looking forward to Visual Studio Live! in Redmond this coming August! :)

Enhance Portability with Partially Contained Databases in SQL Server 2012

The dependency of database-specific users upon server-based logins poses a challenge when you need to move or restore a database to another server. Although the users move with the database, their associated logins do not, and thus the relocated database will not function properly until you also set up and map the necessary logins on the target server. To resolve these types of dependency problems and help make databases more easily portable, SQL Server 2012 introduces “partially contained” databases.

The term “partially contained” is based on the fact that SQL Server itself merely enables containment—it does not enforce it. It’s still your job to actually implement true containment. From a security perspective, this means that partially contained databases allow you to create a special type of user called a contained user. The contained user’s password is stored right inside the contained database, rather than being associated with a login defined at the server instance level and stored in the master database. Then, unlike the standard SQL Server authentication model, contained users are authenticated directly against the credentials in the contained database without ever authenticating against the server instance. Naturally, for this to work, a connection string with a contained user’s credentials must include the Initial Catalog keyword that specifies the contained database name.

Creating a Partially Contained Database

To create a partially contained database, first enable the contained database authentication setting by calling sp_configure and then issue a CREATE DATABASE statement with the new CONTAINMENT=PARTIAL clause as the following code demonstrates.

-- Enable database containment

USE master
GO

EXEC sp_configure 'contained database authentication', 1
RECONFIGURE

-- Delete database if it already exists
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO

-- Create a partially contained database
CREATE DATABASE MyDB CONTAINMENT = PARTIAL
GO

USE MyDB
GO

To reiterate, SQL Server doesn’t enforce containment. You can still break containment by creating ordinary database users for server-based logins. For this reason, it’s easy to convert an ordinary (uncontained) database to a partially contained database; simply issue an ALTER DATABASE statement and specify SET CONTAINMENT=PARTIAL. You’ll then be able to migrate the users mapped to server-based logins to contained users and achieve server independence.
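As a rough sketch of that conversion, the following code alters an existing database and then converts one of its users with the sp_migrate_user_to_contained system stored procedure, which applies to users mapped to SQL Server authentication logins. The database name MyLegacyDB and the user name AppUser are hypothetical, and the example assumes that contained database authentication has already been enabled with sp_configure.

```sql
-- Convert an existing (uncontained) database to partially contained
-- (MyLegacyDB and AppUser are illustrative names)
USE master
GO

ALTER DATABASE MyLegacyDB SET CONTAINMENT = PARTIAL
GO

USE MyLegacyDB
GO

-- Convert a user mapped to a SQL Server login into a contained user
-- with a password, keeping the user name and leaving the login enabled
EXEC sp_migrate_user_to_contained
 @username = N'AppUser',
 @rename = N'keep_name',
 @disablelogin = N'do_not_disable_login'
```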

Creating a Contained User

Once you have a contained database, you can create a contained user for it by issuing a CREATE USER statement and specifying WITH PASSWORD, as shown here:

CREATE USER UserWithPw
 WITH PASSWORD = N'password$1234'

This syntax is valid only for contained databases; SQL Server will raise an error if you attempt to create a contained user in the context of an uncontained database.
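To confirm which users in the database are contained, one option is to query sys.database_principals and look at how each principal authenticates. This is a minimal sketch; it assumes you are connected to the partially contained database.

```sql
-- List users that authenticate at the database level (contained users with passwords)
SELECT name, type_desc, authentication_type_desc
 FROM sys.database_principals
 WHERE authentication_type = 2  -- 2 = DATABASE (1 = INSTANCE, that is, mapped to a server login)
```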

That’s all there is to creating partially contained databases with contained users. The only remaining point that’s worth calling out is that an Initial Catalog clause pointing to a partially contained database must be specified explicitly in a connection string that also specifies the credentials of a contained user in that database. If just the credentials are specified without the database, SQL Server will not scan the partially contained databases hosted on the instance for one that has a user with matching credentials. Instead, it will consider the credentials to be those of an ordinary SQL Server login, and will not authenticate against the contained database.

Other Partially Contained Database Features

Aside from server-based logins, there are many other dependencies that a database might have on its host instance. These include linked servers, SQL CLR, database mail, service broker objects, endpoints, replication, SQL Server Agent jobs, and tempdb collation. All these objects are considered to be uncontained entities since they all exist outside the database.

Uncontained entities threaten a database’s portability. Since these objects are all defined at the server instance level, behavior can vary unpredictably when databases are shuffled around from one instance to another. Let’s examine features to help you achieve the level of containment and stability that your circumstances require.

Uncontained Entities View

SQL Server provides a new dynamic management view (DMV) called sys.dm_db_uncontained_entities that you can query to discover potential threats to database portability. This DMV not only highlights dependent objects, it will even report the exact location of all uncontained entity references inside of stored procedures, views, functions, and triggers.

The following code creates a few stored procedures, and then joins sys.dm_db_uncontained_entities with sys.objects to report the name of all objects having uncontained entity references in them.

-- Create a procedure that references a database-level object
CREATE PROCEDURE GetTables AS
BEGIN
  SELECT * FROM sys.tables
END
GO

-- Create a procedure that references an instance-level object
CREATE PROCEDURE GetEndpoints AS
BEGIN
  SELECT * FROM sys.endpoints
END
GO

-- Identify objects that break containment
SELECT
  UncType = ue.feature_type_name,
  UncName = ue.feature_name,
  RefType = o.type_desc,
  RefName = o.name,
  Stmt = ue.statement_type,
  Line = ue.statement_line_number,
  StartPos = ue.statement_offset_begin,
  EndPos = ue.statement_offset_end
 FROM
  sys.dm_db_uncontained_entities AS ue
  INNER JOIN sys.objects AS o ON o.object_id = ue.major_id

Here is the result of the query:

UncType      UncName    RefType               RefName       Stmt    Line  StartPos  EndPos
-----------  ---------  --------------------  ------------  ------  ----  --------  ---
System View  endpoints  SQL_STORED_PROCEDURE  GetEndpoints  SELECT  5     218       274

The DMV identifies the stored procedure GetEndpoints as an object with an uncontained entity reference. Specifically, the output reveals that a stored procedure references the sys.endpoints view in a SELECT statement on line 5 at position 218. This alerts you to a database dependency on endpoints configured at the server instance level that could potentially pose an issue for portability. The GetTables stored procedure does not have any uncontained entity references (sys.tables is contained), and is therefore not reported by the DMV.
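The same DMV can also be used to look for uncontained users, which it reports as database principals rather than as objects. The following sketch assumes that class 4 identifies database principals, and joins to sys.database_principals instead of sys.objects to resolve their names.

```sql
-- Identify database principals that break containment
-- (for example, users mapped to server-based logins)
SELECT
  UncType = ue.feature_type_name,
  UncName = ue.feature_name,
  UserName = dp.name,
  AuthType = dp.authentication_type_desc
 FROM
  sys.dm_db_uncontained_entities AS ue
  INNER JOIN sys.database_principals AS dp ON dp.principal_id = ue.major_id
 WHERE
  ue.class = 4  -- database principal
```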

Collations and tempdb

Ordinarily, all databases hosted on the same SQL Server instance share the same tempdb database for storing temporary tables, and by default all the databases (including tempdb) on the instance use the same collation setting (collation controls string data character set, case sensitivity, and accent sensitivity). When joining between regular database tables and temporary tables, both your user database and tempdb must use a compatible collation. This, again, represents an instance-level dependency with respect to the fact that the collation setting can vary from one server instance to another. Thus, problems arise when moving databases between servers that have different collation settings for tempdb. The code below demonstrates the problem, and how to avoid it by using a contained database.

-- Create an uncontained database with custom collation
USE master
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO
CREATE DATABASE MyDB COLLATE Chinese_Simplified_Pinyin_100_CI_AS
GO

USE MyDB
GO

-- Create a table in MyDB (uses Chinese_Simplified_Pinyin_100_CI_AS collation)
CREATE TABLE TestTable (TextValue nvarchar(max))

-- Create a temp table in tempdb (uses SQL_Latin1_General_CP1_CI_AS collation)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Fails, because MyDB and tempdb use different collations
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Convert to a partially contained database
DROP TABLE #TempTable
USE master

ALTER DATABASE MyDB SET CONTAINMENT=PARTIAL
GO

USE MyDB
GO

-- Create a temp table again (now collated as Chinese_Simplified_Pinyin_100_CI_AS to match MyDB)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Succeeds, because the table in tempdb now uses the same collation as MyDB
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Cleanup
DROP TABLE #TempTable
USE master
DROP DATABASE MyDB
GO

This code first creates an uncontained database that uses Chinese_Simplified_Pinyin_100_CI_AS collation on a server instance that uses (the default) SQL_Latin1_General_CP1_CI_AS collation. The code then creates a temporary table and then attempts to join an ordinary database table against it. The attempt fails because the two tables have different collations (that is, they each reside in databases that use different collations), and SQL Server issues the following error message:

Msg 468, Level 16, State 9, Line 81
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
"Chinese_Simplified_Pinyin_100_CI_AS" in the equal to operation.

Then the code issues an ALTER DATABASE…SET CONTAINMENT=PARTIAL statement to convert the database to a partially contained database. As a result, SQL Server resolves the conflict by collating the temporary table in tempdb in the same collation as the contained database, and the second join attempt succeeds.
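If you want to see the collations involved for yourself, a query along these lines can help. This is a minimal sketch that assumes MyDB still exists (that is, it runs before the cleanup step). It compares the database collations with the collation actually assigned to the temporary table's column.

```sql
USE MyDB
GO

CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Compare the database collations with the temp table column's collation
SELECT
  MyDbCollation = DATABASEPROPERTYEX('MyDB', 'Collation'),
  TempdbCollation = DATABASEPROPERTYEX('tempdb', 'Collation'),
  TempColumnCollation = c.collation_name
 FROM tempdb.sys.columns AS c
 WHERE c.object_id = OBJECT_ID('tempdb..#TempTable')

DROP TABLE #TempTable
```

Running this before and after the database is converted to partial containment shows whether the temporary table's column follows tempdb or the contained database.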

Summary

Partially contained databases in SQL Server 2012 go a long way toward improving the portability of databases across servers and instances. In this blog post, I demonstrated how to create a partially contained database with contained users, how to deal with collation issues, and how to use the new dynamic management view to discover threats to containment and identify external dependencies. These capabilities are welcome news for SQL Server DBAs everywhere. Enjoy!

Download Visual Studio Live Las Vegas and SQLBits London Slides and Code

I just returned from Visual Studio Live (Las Vegas) and SQLBits (London), and both shows were very successful. Thanks again to all the good folks that attended my sessions. Your positive energy was amazing, and I hope you had as much fun as I did.

As promised, I’ve posted the slides and code from all my sessions for you to download. You can grab the stuff here:

Visual Studio Live (Las Vegas)

Mar 26, 2012 SQL Server Workshop for Developers http://sdrv.ms/VSLiveVegas2012SQL
Mar 27, 2012 Introducing SQL Server Data Tools http://sdrv.ms/VSLiveVegas2012SSDT
Mar 27, 2012 Understanding Your .NET Data Access Choices http://sdrv.ms/VSLiveVegas2012DataAccess

SQLBits (London)

Mar 29, 2012 SQL Server Workshop for Developers http://sdrv.ms/SQLBits2012WorkshopForDevelopers
Mar 30, 2012 Native File Streaming http://sdrv.ms/SQLBits2012NativeFileStreaming

Looking forward to Visual Studio Live (New York) next month, to be held right here in my hometown Brooklyn, NY! :)

 

New Spatial Features in SQL Server 2012

SQL Server 2012 adds many significant improvements to the spatial support that was first introduced with SQL Server 2008. Among the more notable enhancements is support for curves (arcs), where SQL Server 2008 only supported straight lines, or polygons composed of straight lines. Microsoft also provides methods that test for non-2008-compatible (curved) shapes, and convert circular data to line data for backward compatibility with SQL Server 2008 (as well as other mapping platforms that don’t support curves).

New Spatial Data Classes

The three new spatial data classes in SQL Server 2012 are:

  • Circular strings
  • Compound curves
  • Curve polygons

All three of these shapes are supported in WKT, WKB, and GML by both the geometry and geography data types, and all of the existing methods work on all of the new circular shapes. My previous post, Geospatial Support for Circular Data in SQL Server 2012, covers these new spatial classes in detail, and shows you how to use them to create circular data. This post focuses on additional spatial features that are new in SQL Server 2012.

New Spatial Methods

Let’s explore a few of the new spatial methods. Some of these new methods complement the new curved shapes, while others add new spatial features that work with all shapes.

The STNumCurves and STCurveN Methods

These two methods can be invoked on any geometry or geography instance. They can be used together to discover information about the curves contained within the spatial instance. The STNumCurves method returns the total number of curves in the instance. You can then pass any number between 1 and the value returned by STNumCurves to the STCurveN method to extract each individual curve, and thus iterate all the curves in the instance.

For example, the WKT string CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4) defines a perfect circle composed of two connected segments: 0 4, 4 0, 8 4 and 8 4, 4 8, 0 4 (the third coordinate 8 4 is used both as the ending point of the first arc and the starting point of the second arc). The following code demonstrates how to obtain curve information from this circular string using the STNumCurves and STCurveN methods.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Get the curve count (2) and the 2nd curve (top semi-circle)
SELECT
  CurveCount = @C.STNumCurves(),
  SecondCurve = @C.STCurveN(2),
  SecondCurveWKT = @C.STCurveN(2).ToString()

This query produces the following output:

CurveCount SecondCurve                                     SecondCurveWKT
---------- ----------------------------------------------- -------------------------------
2          0x000000000204030000000000000000002040000000... CIRCULARSTRING (8 4, 4 8, 0 4)

You can see that STNumCurves indicates there are two curves, and that STCurveN(2) returns the second curve. If you view the results in the spatial viewer, you’ll see just the top half of the circle. This is the semi-circle defined by the second curve, which is converted back to WKT as CIRCULARSTRING (8 4, 4 8, 0 4). Notice that this represents the second segment of the full circle.
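To iterate all the curves rather than extract just one, you can loop from 1 up to the value returned by STNumCurves. The following is a minimal sketch using the same circle; the @i counter variable is introduced here for illustration.

```sql
-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'
DECLARE @i int = 1

-- Return each curve in turn as WKT
WHILE @i <= @C.STNumCurves()
BEGIN
  SELECT
    CurveNumber = @i,
    CurveWKT = @C.STCurveN(@i).ToString()
  SET @i += 1
END
```

Each pass through the loop returns one result set, so this circle yields two: one for each semi-circle.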

The BufferWithCurves Method

SQL Server 2008 introduced the STBuffer method, which “pads” a line, effectively converting it into a polygon. If you look closely at the resulting polygon shapes in the spatial viewer, it appears that the points of each line string (including the mid points) are transformed into rounded edges in the polygon. However, the rounded edge look is actually produced by plotting many short straight lines that are clustered very closely together, presenting the illusion of a curve. This approach is necessary since curves were not supported before SQL Server 2012 (but the STBuffer method was).

Clearly, using native curve definitions in a curve polygon is more efficient than clustering a multitude of straight lines in an ordinary polygon. For backward compatibility, STBuffer continues to return the (inefficient) polygon as before. So SQL Server 2012 introduces a new method, BufferWithCurves, that returns a curve polygon instead. The following code uses BufferWithCurves to pad lines using true curves, and compares the result with its straight-line cousin, STBuffer.

DECLARE @streets geometry = '
 GEOMETRYCOLLECTION(
  LINESTRING (100 -100, 20 -180, 180 -180),
  LINESTRING (300 -300, 300 -150, 50 -50)
 )'
SELECT @streets.BufferWithCurves(10)

SELECT
  AsWKT = @streets.ToString(),
  Bytes = DATALENGTH(@streets),
  Points = @streets.STNumPoints()
 UNION ALL
 SELECT
  @streets.STBuffer(10).ToString(),
  DATALENGTH(@streets.STBuffer(10)),
  @streets.STBuffer(10).STNumPoints()
 UNION ALL
 SELECT
  @streets.BufferWithCurves(10).ToString(),
  DATALENGTH(@streets.BufferWithCurves(10)),
  @streets.BufferWithCurves(10).STNumPoints()

Here is the resulting shape returned by the first SELECT statement (the collection of padded line shapes generated by BufferWithCurves):

As with STBuffer, the new shapes have rounded edges around the points of the original line strings. However, BufferWithCurves generates actual curves, and thus, produces a significantly smaller and simpler polygon. The second SELECT statement demonstrates by comparing the three shapes—the original line string collection, the polygon returned by STBuffer, and the curve polygon returned by BufferWithCurves. Here are the results:

AsWKT                                                                       Bytes  Points
--------------------------------------------------------------------------  -----  ------
GEOMETRYCOLLECTION (LINESTRING (100 -100, 20 -180, 180 -180), LINESTRIN...  151    6
MULTIPOLYGON (((20.000000000000796 -189.99999999999858, 179.99999999999...  5207   322
GEOMETRYCOLLECTION (CURVEPOLYGON (COMPOUNDCURVE ((20.000000000000796 -1...  693    38

The first shape is the original geometry collection of line strings used for input, which requires only 151 bytes of storage, and has only 6 points. For the second shape, STBuffer pads the line strings to produce a multi-polygon (a set of polygons) that consumes 5,207 bytes and has a total of 322 points. That is more than 34 times the storage of the original line strings (5,207 / 151 ≈ 34.5, or about 3,448 percent of the original size). In the third shape, BufferWithCurves is used to produce the equivalent padding using a collection of curve polygons composed of compound curves, so it consumes only 693 bytes and has only 38 points, which is a (relatively) mere 4.6 times the original size (693 / 151 ≈ 4.6, or about 459 percent).

The ShortestLineTo Method

This new method examines any two shapes and figures out the shortest line between them. The following code demonstrates:

DECLARE @Shape1 geometry = 'POLYGON ((-20 -30, -3 -26, 14 -28, 20 -40, -20 -30))'
DECLARE @Shape2 geometry = 'POLYGON ((-18 -20, 0 -10, 4 -12, 10 -20, 2 -22, -18 -20))'

SELECT @Shape1
UNION ALL
SELECT @Shape2
UNION ALL
SELECT @Shape1.ShortestLineTo(@Shape2).STBuffer(.25)

This code defines two polygons and then uses ShortestLineTo to determine, generate, and return the shortest straight line that connects them. STBuffer is also used to pad the line string so that it is more clearly visible in the spatial viewer:
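Since the result of ShortestLineTo is itself a geometry instance, you can also inspect it directly rather than just viewing it. The following sketch returns the line as WKT along with its length, and compares that length with the distance reported by the existing STDistance method, which should agree.

```sql
DECLARE @Shape1 geometry = 'POLYGON ((-20 -30, -3 -26, 14 -28, 20 -40, -20 -30))'
DECLARE @Shape2 geometry = 'POLYGON ((-18 -20, 0 -10, 4 -12, 10 -20, 2 -22, -18 -20))'

-- Examine the shortest connecting line and compare its length to STDistance
SELECT
  LineWKT = @Shape1.ShortestLineTo(@Shape2).ToString(),
  LineLength = @Shape1.ShortestLineTo(@Shape2).STLength(),
  Distance = @Shape1.STDistance(@Shape2)
```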

The MinDbCompatibilityLevel Method

With the added support for curves in SQL Server 2012 comes support for backward compatibility with previous versions of SQL Server (2008 and 2008 R2) that don’t support curves. The new MinDbCompatibilityLevel method can be invoked on any spatial instance, and returns the minimum database compatibility level required to support the shape defined by that instance. For example, consider the following code:

DECLARE @Shape1 geometry = 'CIRCULARSTRING(0 50, 90 50, 180 50)'
DECLARE @Shape2 geometry = 'LINESTRING (0 50, 90 50, 180 50)'

SELECT
 Shape1MinVersion = @Shape1.MinDbCompatibilityLevel(),
 Shape2MinVersion = @Shape2.MinDbCompatibilityLevel()

The MinDbCompatibilityLevel method returns 110 (referring to version 11.0) for the first shape and 100 (version 10.0) for the second one. This is because the first shape is a circular string, which requires SQL Server 2012 (version 11.0), while the line string in the second shape is supported by SQL Server 2008 (version 10.0) and higher.
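One way to put this method to work is as a guard before handing a shape to an older platform. The following is a minimal sketch: it converts the shape with STCurveToLine (covered in the next section) only if the shape requires a compatibility level above 100. The variable names are made up for illustration.

```sql
-- Start with a curved shape that requires compatibility level 110
DECLARE @Shape geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Convert to straight lines only if the shape is not 2008-compatible
DECLARE @Compatible geometry =
  CASE
    WHEN @Shape.MinDbCompatibilityLevel() > 100 THEN @Shape.STCurveToLine()
    ELSE @Shape
  END

SELECT
  OriginalLevel = @Shape.MinDbCompatibilityLevel(),
  ConvertedLevel = @Compatible.MinDbCompatibilityLevel(),
  ConvertedType = @Compatible.STGeometryType()
```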

The STCurveToLine and CurveToLineWithTolerance Methods

These are two methods you can use to convert curves to roughly equivalent straight line shapes. Again, this is to provide compatibility with previous versions of SQL Server and other mapping platforms that don’t support curves.

The STCurveToLine method converts a single curve to a line string with a multitude of segments and points that best approximate the original curve. The technique is similar to what we just discussed for STBuffer, where many short straight lines are connected in a cluster of points to simulate a curve. And, as explained in that discussion, the resulting line string requires significantly more storage than the original curve. To offer a compromise between fidelity and storage, the CurveToLineWithTolerance method accepts “tolerance” parameters to produce line strings that consume less storage space than those produced by STCurveToLine. The following code demonstrates by using both methods to convert the same circle shape from the previous STNumCurves and STCurveN example into line strings.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Render as curved shape
SELECT
  Shape = @C,
  ShapeWKT = @C.ToString(),
  ShapeLen = DATALENGTH(@C),
  Points = @C.STNumPoints()

-- Convert to lines (much larger, many more points)
SELECT
  Shape = @C.STCurveToLine(),
  ShapeWKT = @C.STCurveToLine().ToString(),
  ShapeLen = DATALENGTH(@C.STCurveToLine()),
  Points = @C.STCurveToLine().STNumPoints()

-- Convert to lines with tolerance (not as much larger, not as many more points)
SELECT
  Shape = @C.CurveToLineWithTolerance(0.1, 0),
  ShapeWKT = @C.CurveToLineWithTolerance(0.1, 0).ToString(),
  ShapeLen = DATALENGTH(@C.CurveToLineWithTolerance(0.1, 0)),
  Points = @C.CurveToLineWithTolerance(0.1, 0).STNumPoints()

The query results show that the original circle consumes only 112 bytes and has 5 points. Invoking STCurveToLine on the circle converts it into a line string that consumes 1,072 bytes and has 65 points. That’s a big increase, but the resulting line string represents the original circle in high fidelity; you will not see a perceptible difference in the two when viewing them using the spatial viewer. However, the line string produced by CurveToLineWithTolerance consumes only 304 bytes and has only 17 points; a significantly smaller footprint, paid for with a noticeable loss in fidelity. In the spatial viewer, the result of CurveToLineWithTolerance appears as a circle made up of visibly straight line segments.
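To get a feel for the trade-off, you can run the same conversion at several tolerance values and compare the point counts and sizes side by side. This is a sketch only; the four tolerance values are arbitrary choices for illustration, and the second argument is left at 0, as in the example above.

```sql
-- Same full circle as in the previous example
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- One row per tolerance value (values chosen arbitrarily for illustration)
SELECT
  T.Tolerance,
  Points = @C.CurveToLineWithTolerance(T.Tolerance, 0).STNumPoints(),
  ShapeLen = DATALENGTH(@C.CurveToLineWithTolerance(T.Tolerance, 0))
FROM (VALUES (1.0), (0.1), (0.01), (0.001)) AS T (Tolerance)
ORDER BY T.Tolerance DESC
```

Smaller tolerance values should yield more points and more bytes, so a query like this is a quick way to pick the loosest tolerance that still looks acceptable for your data.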

The STIsValid, IsValidDetailed and MakeValid Methods

Spatial instance validation has improved greatly in SQL Server 2012. The STIsValid method evaluates a spatial instance and returns a 1 (for true) or 0 (for false) indicating if the instance represents a valid shape (or shapes). If the instance is invalid, the new IsValidDetailed method will return a string explaining the reason why. The following code demonstrates.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

SELECT
 IsValid = @line.STIsValid(),
 Details = @line.IsValidDetailed()

This line string is invalid because the same point (2 2) is repeated, which results in “overlapping edges,” as revealed by the output from IsValidDetailed:

IsValid  Details
-------  -------------------------------------------------------------------
0        24413: Not valid because of two overlapping edges in curve (1).

SQL Server 2012 is more tolerant of invalid spatial instances than previous versions. For example, you can now perform metric operations (such as STLength) on invalid instances, although you still won’t be able to perform other operations (such as STBuffer) on them.
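A quick way to see this difference for yourself is to wrap the stricter operation in a TRY/CATCH block. The sketch below reuses the invalid line string from above; the buffer distance of 1 is an arbitrary value, and the exact error text you get back will depend on your server.

```sql
-- The same invalid line string used in the previous example
DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

-- A metric operation on the invalid instance
SELECT LineLength = @line.STLength()

-- An operation that is expected to reject the invalid instance
BEGIN TRY
  SELECT Buffered = @line.STBuffer(1).ToString()
END TRY
BEGIN CATCH
  SELECT ErrorNumber = ERROR_NUMBER(), ErrorMessage = ERROR_MESSAGE()
END CATCH
```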

The new MakeValid method can “fix” an invalid spatial instance and make it valid. Of course, the shape will shift slightly, and there are no guarantees on the accuracy or precision of the changes made. The following code uses MakeValid to remove overlapping parts (which can be caused by anomalies such as inaccurate GPS traces), effectively converting the invalid line string into a valid spatial instance.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'
SELECT @line.MakeValid().ToString() AS Fixed

The WKT string returned by the SELECT statement shows the “fixed” line string:

Fixed
-------------------------------------------------------------------
LINESTRING (3 2, 2 2, 1.0000000000000071 1.0000000000000036)
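Because MakeValid can shift a shape, one reasonable approach is to call it only when STIsValid reports a problem, and then confirm the result. The following is a minimal sketch of that pattern; the @fixed variable name is just an illustration.

```sql
DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

-- Leave valid instances untouched; repair only the invalid ones
DECLARE @fixed geometry =
  CASE WHEN @line.STIsValid() = 1 THEN @line ELSE @line.MakeValid() END

-- Compare the instance before and after the conditional repair
SELECT
  WasValid = @line.STIsValid(),
  IsValidNow = @fixed.STIsValid(),
  LengthBefore = @line.STLength(),
  LengthAfter = @fixed.STLength()
```

Comparing the lengths before and after gives a rough indication of how much the repair changed the shape.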

Other Enhancements

The remainder of this post briefly mentions several other noteworthy spatial enhancements added in SQL Server 2012. These include better geography support, and precision and optimization improvements.

Support for geography Instances Exceeding a Logical Hemisphere

Previous versions of SQL Server supported geography objects as large as (slightly less than) a logical hemisphere (half the globe). This limitation has been removed in SQL Server 2012, which now supports geography instances of any size (even the entire planet).

When you define a geography polygon, the order in which you specify the ring’s latitude and longitude coordinates (known as vertex order) is significant (unlike geometry, where vertex order is insignificant). The coordinate points are always defined according to the left-foot inside rule; when you “walk” the boundary of the polygon, your left foot is on the inside. Thus, vertex order determines whether you are defining a small piece of the globe, or the much larger piece made up of the entire globe except for that small piece (that is, the rest of the globe).

Since previous versions of SQL Server were limited to half the globe, it was impossible to specify the points of a polygon in the “wrong order,” simply because doing so resulted in too large a shape (and thus, raised an error). That error potential no longer exists in SQL Server 2012, so it’s even more critical to make sure your vertex order is correct, or you’ll be unwittingly working with the exact “opposite” shape.

If you have a geography instance that is known to have the wrong vertex order, you can repair it using the new ReorientObject method. This method operates only on polygons (it has no effect on points, line strings, or curves), and can be used to correct the ring orientation (vertex order) of the polygon. The following code demonstrates.

-- Small (less than a logical hemisphere) polygon
SELECT geography::Parse('POLYGON((-10 -10, 10 -10, 10 10, -10 10, -10 -10))')

-- Reorder in the opposite direction for "rest of the globe"
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))')

-- Reorient back to the small polygon
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))').ReorientObject()

Three geography polygon instances are defined in this code. The first geography instance defines a very small polygon. The second instance uses the exact same coordinates, but because the vertex order is reversed, it defines an enormous polygon whose area represents the entire globe except for the small polygon. As explained, such a definition would cause an error in previous versions of SQL Server, but is now accommodated without a problem by SQL Server 2012. The third instance reverses the vertex order on the same shape as the second instance, thereby producing the same small polygon as the first instance.
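If you are not sure which orientation a polygon has, one way to check is to compare its area with the area of its reoriented counterpart. The sketch below assumes that the smaller of the two shapes is the one you intended, which will not be true for every data set; the @Smaller variable name is just an illustration.

```sql
-- Polygon defined with the "rest of the globe" vertex order
DECLARE @P geography =
  geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))')

-- Compare the area as defined with the area after reorienting
SELECT
  AreaAsDefined = @P.STArea(),
  AreaReoriented = @P.ReorientObject().STArea()

-- Assumption: the smaller shape is the intended one
DECLARE @Smaller geography =
  CASE
    WHEN @P.STArea() > @P.ReorientObject().STArea() THEN @P.ReorientObject()
    ELSE @P
  END

SELECT SmallerWKT = @Smaller.ToString(), SmallerArea = @Smaller.STArea()
```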

Full Globe Support

Along with the aforementioned support for geography instances to exceed a single logical hemisphere comes a new spatial data class called FULLGLOBE. As you may have guessed, this is a shape that represents the entire planet. If you’ve ever wondered how many square meters there are in the entire world, the following query gives you the answer (which is 510,065,621,710,996 square meters, so you can stop wondering).

-- Construct a new FullGlobe object (a WGS84 ellipsoid)
DECLARE @Earth geography = 'FULLGLOBE'

-- Calculate the area of the earth
SELECT PlanetArea = @Earth.STArea()

All of the common spatial methods work as expected on a full globe object. So you could, for example, “cut away” at the globe by invoking the STDifference and STSymDifference methods against it using other polygons as cookie-cutter shapes.
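As a sketch of the cookie-cutter idea, the following code removes the small polygon from the earlier ReorientObject example from the full globe and then compares areas. The @Patch and @Rest variable names are just illustrations.

```sql
-- The full globe, and a small polygon to cut out of it
DECLARE @Earth geography = 'FULLGLOBE'
DECLARE @Patch geography = 'POLYGON((-10 -10, 10 -10, 10 10, -10 10, -10 -10))'

-- Everything on the globe except the small polygon
DECLARE @Rest geography = @Earth.STDifference(@Patch)

-- The two pieces together should account for roughly the whole planet
SELECT
  PatchArea = @Patch.STArea(),
  RestArea = @Rest.STArea(),
  CombinedArea = @Patch.STArea() + @Rest.STArea(),
  PlanetArea = @Earth.STArea()
```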

New “Unit Sphere” Spatial Reference ID

The default spatial reference ID (SRID) for the geography data type in SQL Server 2012 is 4326, which uses the metric system as its unit of measurement. This SRID also represents the true ellipsoidal shape of the earth. While this representation is the most accurate, ellipsoidal mathematics is also more complex and slower to compute. SQL Server 2012 offers a compromise between speed and accuracy by adding a new SRID, 104001, which uses a sphere of radius 1 to represent a perfectly round earth.

You can create geography instances with SRID 104001 when you don’t require the greatest accuracy. The STDistance, STLength, and ShortestLineTo methods are optimized to run faster on the unit sphere, since it takes a relatively simple formula to compute measures against a perfectly round sphere (compared to an ellipsoidal sphere).
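The following sketch measures the same distance with both SRIDs. The coordinates are approximate values for New York and London, chosen only for illustration. It also assumes that distances on the unit sphere come back in units of the sphere’s radius, so the result is multiplied by an approximate mean earth radius of 6,371,000 meters to make it comparable.

```sql
-- Approximate city coordinates (latitude, longitude), for illustration only
DECLARE @NewYork geography = geography::Point(40.7128, -74.0060, 4326)
DECLARE @London geography = geography::Point(51.5074, -0.1278, 4326)

-- The same two points on the unit sphere (instances must share an SRID)
DECLARE @NewYorkUnit geography = geography::Point(40.7128, -74.0060, 104001)
DECLARE @LondonUnit geography = geography::Point(51.5074, -0.1278, 104001)

SELECT
  EllipsoidMeters = @NewYork.STDistance(@London),
  UnitSphereDistance = @NewYorkUnit.STDistance(@LondonUnit),
  -- Assumption: scale by an approximate mean earth radius in meters
  ApproxMeters = @NewYorkUnit.STDistance(@LondonUnit) * 6371000
```

The two meter values will not match exactly, which is the accuracy you give up in exchange for the simpler spherical calculation.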

Better Precision

Internal spatial calculations in SQL Server 2012 are now performed with 48 bits of precision, compared to 27 bits used in SQL Server 2008 and SQL Server 2008 R2. This can reduce the rounding error that internal computations introduce into the floating point coordinates of the original vertex points.

Summary

This blog post introduced you to some of the powerful new spatial capabilities added to SQL Server 2012. You saw how to use STNumCurves and STCurveN to obtain curve information from circular data, the BufferWithCurves method to produce more efficient padded line shapes than STBuffer, and the ShortestLineTo method to figure out the shortest distance between two shapes. Then you saw how to use the new MinDbCompatibilityLevel, STCurveToLine, and CurveToLineWithTolerance methods for supporting backward compatibility with SQL Server 2008. You also learned how SQL Server 2012 is much better at handling invalid spatial data, using the STIsValid, IsValidDetailed, and MakeValid methods. Finally, you learned about the new full globe support, unit sphere SRID, and improved precision.

You can learn much more about spatial functionality in my new book Programming Microsoft SQL Server 2012, which has an entire chapter dedicated to the topic. I hope you get to enjoy these powerful new spatial capabilities in SQL Server 2012!

WCF Data Services vs. WCF RIA Services – Making the Right Choice

Windows Communication Foundation (WCF) provides all the support you need to build distributed service-oriented data access solutions. You can certainly work with WCF directly to create custom services and expose data from an Entity Data Model (EDM) with the ADO.NET Entity Framework, or from any other data access layer (DAL) of your choosing. To take this “raw” approach, you need to start with the basics, or what is commonly referred to as the ABC’s of WCF: Addresses, Bindings, and Contracts. You must create service, operation, and data contracts, and then configure your service model with appropriate endpoint addresses and compatible bindings to be reachable by clients. Services are usually stateless, so you must also handle client-side change tracking and multi-user conflict resolution entirely on your own. The learning curve can be quite steep, after which you will still need to expend a great deal of effort to make it work.

Alternatively, you can turn to one of the two later technologies that Microsoft has built on top of WCF. These are WCF Data Services and WCF RIA Services, and they represent two very different approaches for building data-oriented services. Both provide abstractions that shield you from many underlying WCF particulars, so you get to spend more time focusing on your application and less time on plumbing. For one thing, you don’t need to code WCF contracts or manage change tracking on the client; all that gets done for you. With WCF RIA Services (and Silverlight), you don’t even need to create and update service references; Visual Studio generates code automatically via a special link that keeps your client and WCF RIA Services projects in sync at all times.

Both WCF Data Services and WCF RIA Services can solve many of the same problems, so it is only natural to question which one to use. The answer extends a bit beyond the standard “it depends on your scenario” response, since WCF RIA Services offers a lot more than just data access functionality. It also features client-side self-tracking entities, client-side validation, automatic server-to-client code generation, and more. In this blog post, I’ll discuss both platforms at a high level to help guide you in making the right choice.

WCF Data Services

Microsoft designed WCF Data Services as a thin layer over Entity Framework that exposes data-centric services to client applications. Out of the box, you can quickly build WCF Data Services over an Entity Data Model with virtually no effort. Custom providers for WCF Data Services for data sources other than Entity Framework are available; however, considerable additional effort is required to implement them. You can think of WCF Data Services as universal Web Services built just for data, although it can be easily extended with custom service operation methods. The platform is based on the industry standards of Representational State Transfer (REST) and the Open Data Protocol (OData), which means that these services are consumable by virtually every type of client in the world.

REST provides a uniform interface for querying and updating data. It is based on HTTP, meaning that client requests are issued in the form of GET, POST, MERGE, and DELETE actions. GET, POST, and DELETE are standard verbs understood by all HTTP clients, while MERGE is a custom verb that OData defines for updates. Any REST query can be invoked with an HTTP GET request by expressing all the elements of the query in a properly formed Uniform Resource Identifier (a URI, which is a more general term than Uniform Resource Locator [URL]). You can even test the service with an ordinary browser; simply type the properly formed URI directly into the address bar and you will receive the Atom Publishing Protocol (AtomPub) response (an XML dialect very similar to the Really Simple Syndication [RSS] feed format). The POST, MERGE, and DELETE verbs correspond respectively to insert, update, and delete operations supported by the service, and the payload (parameters, data, and other metadata) for these operations is passed in the body of the HTTP request.

Just as REST enables universal data access via HTTP, the Open Data Protocol (OData) establishes universal data structure via standard serialization formats (see http://www.odata.org). All clients can handle plain text formats such as JavaScript Object Notation (JSON) and of course XML, and so OData defines standard response formats based on both formats. JSON provides a compact structure suitable for many basic types of services, while XML forms the basis for the more verbose AtomPub feed format. AtomPub is the default serialization format in WCF Data Services, because it effectively leverages the hierarchical nature of XML to describe the rich structure of data and metadata in an Entity Data Model.

The WCF Data Services client libraries for Windows/WPF, ASP.NET, Silverlight, and Windows Phone 7 include a special LINQ provider, commonly known as LINQ to REST. This provider automatically translates client-side LINQ queries into an equivalent OData URI, meaning that you don’t need to learn the OData URI syntax if you are building Microsoft clients over WCF Data Services (just use good old LINQ). The client libraries automatically deserialize the AtomPub feed response from WCF Data Services into ready-to-use objects. They also provide a stateful context object that can track changes on the client for pushing updates back to the server, although your code needs to explicitly notify the context about objects as they are changed.

WCF RIA Services

WCF RIA Services is, well, richer than WCF Data Services (and also newer). Indeed, the R in RIA means rich, although the full TLA (Three-Letter Acronym) can stand for Rich Internet Application or Rich Interactive Application—depending on who you’re talking to. Since its earliest days, WCF RIA Services was designed to work best with Silverlight, although it now also supports OData, SOAP, and JSON to reach a wider range of clients as well. You can build WCF RIA Services over any data access layer, including Entity Framework and LINQ to SQL. You can also use Plain Old CLR Objects (POCOs), in which case you handle the persistence yourself using your data access technique of choice, even conventional ADO.NET (raw readers and command objects, or DataSets). In any case, you expose WCF RIA Services by coding domain service classes that support CRUD operations, as well as other custom service operations. You also maintain a special metadata class for each entity that auto-magically surfaces on the client for effortless end-to-end validation across the wire.

When WCF RIA Services is used with Silverlight, you don’t need to create and update service references; Visual Studio generates code automatically via a special link that keeps your client and WCF RIA Services projects in sync at all times. Like a service reference, this link binds the two projects together, only a WCF RIA Services link couples them much more tightly than an ordinary service reference does. Public changes on the service side are reflected automatically in corresponding classes on the client side every time you perform a build, so you never need to worry about working against an outdated proxy in the client project simply because you forgot to manually update a service reference.

The WCF RIA Services link greatly simplifies the n-tier pattern, and makes traditional n-tier development feel more like the client/server experience. With the link established, Visual Studio continuously regenerates the client-side proxies to match the domain services on each build. It also auto-generates client-side copies of shared application logic you define in the services project, simply by looking for classes you’ve defined in files named *.shared.cs, or *.shared.vb. The link enforces automatic client-side validation and keeps validation rules in sync between the domain services and the client at all times. Furthermore, client-side entities are completely self-tracking; you do not need to manually notify the context object of every change to every entity, as you are required to do with the WCF Data Services client library.

Comparing WCF Data Services and WCF RIA Services

The following table summarizes several key differences between the two platforms.

| | WCF Data Services | WCF RIA Services |
|---|---|---|
| Supported Clients | Resource-based API, supports all clients via deep REST and OData support. | Domain-based API, most tailored for use with Silverlight, but supports other clients via SOAP, JSON, and OData. |
| Supported Data Access Layers | Targets EF. Other DALs are supported, but greater effort is required. | Supports EF, LINQ to SQL, and POCO (custom persistence layer). |
| Client Development | Requires you to notify the context for change tracking. | Supports self-tracking entities, synchronized client/server logic, and much more (particularly with Silverlight). |
| Service Development | Instant, code-less, extensible REST services out of the box (with EF); “free CRUD.” | Requires you to code CRUD operations manually in domain service classes. |

From this comparison, it looks like WCF RIA Services is more attractive for Silverlight clients than non-Silverlight clients—regardless of which data access layer is used. Conversely, it shows that WCF Data Services is more appropriate for use with Entity Framework than it is with other data access layers—regardless of which client is used. But let’s examine things in a bit more detail.

If your scenario uses EF on the back end and targets Silverlight on the front end, then you are in the best position. Both WCF frameworks pack a huge win over writing traditional WCF services “by hand.” Your decision at this point is based on whether you simply require services to provide data access (that is, you primarily need CRUD support), or if you are seeking to leverage additional benefits. Another consideration is whether you are targeting Silverlight as the client exclusively or not.

Of the two, WCF Data Services is relatively lightweight, and requires almost no effort to get up and running. So it’s the better choice if you primarily require data access functionality in your services, particularly if you want to keep your service open to non-Silverlight clients as well. WCF RIA Services is more robust, and offers numerous additional features. This makes it a very compelling choice for the development of rich client applications. Although it began as a platform almost exclusively designed for Silverlight, support is steadily emerging for other client platforms via OData, SOAP, and JSON, as well as self-tracking entity libraries now available for jQuery. However, it requires the effort of creating domain service classes to support CRUD operations.

Finally, both frameworks are extensible, and both can be secured by traditional authentication and authorization techniques. They are also both capable of integrating with the ASP.NET Membership provider for role management and personalization.

What if you are using neither Silverlight nor Entity Framework? Well, then your work will be cut out for you whatever choice you make. With WCF Data Services, you will need to implement either the Reflection or Streaming provider, or write your own custom provider. And with WCF RIA Services, you will not get to fully enjoy all the benefits of the framework, but you will still need to write domain services and metadata classes. After careful consideration, you may well conclude that neither choice is appropriate, and decide instead to stick with tried and true WCF services, coding your own service contracts, data contracts, binding configurations, change tracking, validations, and so on.

Summary

Both WCF Data Services and WCF RIA Services can represent huge savings in development effort. Which one you choose (or whether indeed you choose to use either) depends a great deal on your data access layer of choice and the types of clients you intend to reach. This post tells you what you need to know to intelligently distinguish between them.

My upcoming book, Programming Microsoft SQL Server 2012, has an entire chapter dedicated to this topic. Look forward to extensive coverage and code samples for complete data access solutions using both platforms. The release date is just a few short months away, so stay tuned!
