Downloading SQL Server Express Edition

I’m often asked how to install SQL Server Express Edition, and what the different download options are. So I finally decided to explain it all in this blog post.

There are several SQL Server Express Edition downloads available on the Microsoft site, and they are available in both 32-bit and 64-bit versions. You can choose to install just the SQL Server Express database engine (and nothing else), or you can choose one of two other (larger) downloads: Express With Tools (which includes SQL Server Management Studio [SSMS]) or Express With Advanced Services (which includes SSMS, Full Text Search, and Reporting Services). There are also separate downloads for SSMS and LocalDB, but these do not include the SQL Server Express database engine needed to host local databases.

To install the SQL Server Express Edition database engine, follow these steps:

  1. Open Internet Explorer, and navigate to http://www.microsoft.com/en-us/download/details.aspx?id=29062.
  2. Click the large orange Download button.
  3. Select the appropriate download for your system.
    1. For 64-bit systems, choose ENU\x64\SQLEXPR_x64_ENU.exe.
    2. For 32-bit or 64-bit WoW systems, choose ENU\x86\SQLEXPR32_x86_ENU.exe.
    3. For 32-bit systems, choose ENU\x86\SQLEXPR_x86_ENU.exe.Note If you need to download SQL Server Management Studio (SSMS) as well, choose the Express With Tools file instead, which is the one that includes WT in the filename.
      F00xx01
  4. Click Next.
  5. If you receive a pop-up warning, click Allow Once.
    F00xx02-markedup
  6. When prompted to run or save the file, choose Run. This starts and runs the download.
  7. If the User Account Control dialog appears after the download files are extracted, click Yes.
  8. In the SQL Server Installation Center, click New SQL Server Stand-Alone Installation.
    F00xx03-markedup
  9. In the SQL Server 2012 Setup wizard, do the following:
    1. On the License Terms page, select I Accept The License Terms and click Next.
    2. On the Product Updates page, allow the wizard to scan for updates, and then click Next.
    3. On the Install Setup Files page, wait for the installation to proceed.
    4. On the Feature Selection page, Click Next.
    5. Continue clicking Next through all the remaining pages until the Installation Progress page, and wait for the installation to proceed.
    6. On the Complete page indicating a successful setup, click Close.
      F00xx04

I hope these instructions help you choose the right installation option for SQL Server Express Edition!

 

 

 

 

The Same, Only Different: Comparing Windows Azure SQL Database and On-Premise SQL Server

One of the most attractive aspects of Windows Azure SQL Database is that it shares virtually the same code-base and exposes the very same tabular data stream (TDS) as on-premise SQL Server. Thus, to a great extent, the same tools and applications that work with SQL Server work just the same and just as well with SQL Database. Notice that I said to a great extent, because despite their commonality, there are quite a few SQL Server features that SQL Database does not support. In this blog post, I discuss how and why these two platforms differ from one another, and explain the SQL Database constraints that you need to be aware of if you have previous experience with SQL Server. Where possible, I also suggest workarounds.

SQL Server and SQL Database differ in several ways; most notably, in terms of size limitations, feature support, and T-SQL compatibility. In many cases, these constraints are simply the price you pay for enjoying a hassle-free, self-managing, self-healing, always-available database in the cloud. That is, Microsoft cannot responsibly support features that impair its ability to ensure high-availability, and must be able to quickly replicate and relocate a SQL Database. This is why SQL Database places limits on database size, and doesn’t support certain specialized features such as FILESTREAM, for example.

Another common reason why a particular feature or T-SQL syntax might not be supported in SQL Database is that it’s simply not applicable. With SQL Database, administrative responsibilities are split between Microsoft and yourself. Microsoft handles all of the physical administration (such as disk drives and servers), while you manage only the logical administration (such as database design and security). This is why any and all T-SQL syntax that relates to physical resources (such as pathnames) is not supported in SQL Database. For example, you don’t control the location for primary and log filegroups. This is why you can’t include an ON PRIMARY clause with a CREATE DATABASE statement, and indeed, why SQL Database does not permit a filegroup reference in any T-SQL statement. Plainly stated, everything pertaining to physical resources (that is, infrastructure) is abstracted away from you with SQL Database

Yet still, in some cases, a certain SQL Server feature or behavior may be unsupported merely because Microsoft has just not gotten around to properly testing and porting it to SQL Database. Windows Azure is constantly evolving, so you need to keep watch for updates and announcements. This blog post is a great starting point, but the best way to stay current is by reviewing the Guidelines and Limitations section of the SQL Database documentation on the MSDN web site.

Size Limitations

With the exception of the free, lightweight Express edition of SQL Server, there is no practical upper limit on database size in any edition of SQL Server. A SQL Server database can grow as large as 524,272 terabytes (for SQL Server Express edition, the limit is 10 gigabytes).

In contrast, SQL Database has very particular size limitations. You can set the maximum size by choosing between the Web and Business editions. With a Web edition database, you can set the maximum database size to either 1 or 5 gigabytes. With a Business edition database, the maximum database size can range from 10 to 150 gigabytes. Other than the available choices for maximum database size, there is no difference in functionality between the two editions. The absolute largest supported database size is 150 gigabytes, although horizontal partitioning strategies (called sharding) can be leveraged for scenarios that require databases larger than 150 gigabytes.

Connection Limitations

SQL Database is far less flexible than SQL Server when it comes to establishing and maintaining connections. Keep the following in mind when you connect to SQL Database:

  • SQL Server supports a variety of client protocols, such as TCP/IP, Shared Memory, and Named Pipes. Conversely, SQL Database allows connections only over TCP/IP.
  • SQL Database does not support Windows authentication. Every connection string sent to SQL Database must always include a login username and password.
  • SQL Database often requires that @<server> is appended to the login username in connection strings. SQL Server has no such requirement.
  • SQL Database communicates only through port 1433, and does not support static or dynamic port allocation like SQL Server does.
  • SQL Database does fully support Multiple Active Result Sets (MARS), which allows multiple pending requests on a single connection.
  • Due to the unpredictable nature of the internet, SQL Database connections can drop unexpectedly, and you need to account for this condition in your applications. Fortunately, the latest version of the Entity Framework (EF6) addresses this issue with the new Connection Resiliency feature, which automatically handles the retry logic for dropped connections.

Unsupported Features

Listed below are SQL Server capabilities that are not supported in SQL Database, and I suggest workarounds where possible. Again, because this content is subject to change, I recommend that you check the MSDN web site for the very latest information.

  • Agent Service You cannot use the SQL Server Agent service to schedule and run jobs on SQL Database.
  • Audit The SQL Server auditing feature records server and database events to either the Windows event log or the file system, and is not supported in SQL Database.
  • Backup/Restore Conventional backups with the BACKUP and RESTORE commands are not supported with SQL Database. Although the BACPAC feature can be used to import and export databases (effectively backing them up and restoring them), this approach does not provide transactional consistency for changes made during the export operation. To ensure transactional consistency, you can either set the database as read-only before exporting it to a BACPAC, or you can use the Database Copy feature to create a copy of the database with transactional consistency, and then export that copy to a BACPAC file.
  • Browser Service SQL Database listens only on port 1433. Therefore, the SQL Server Browser Service which listens on various other ports is naturally unsupported.
  • Change Data Capture (CDC) This SQL Server feature monitors changes to a database, and captures all activity to change tables. CDC relies on a SQL Server Agent job to function, and is unsupported in SQL Database.
  • Common Language Runtime (CLR) The SQL Server CLR features (often referred to simply as SQL CLR) allow you to write stored procedures, triggers, functions, and user-defined types in any .NET language (such as C# or VB), as an alternative to using traditional T-SQL. In SQL Database, only T-SQL can be used; SQL CLR is not supported.
  • Compression SQL Database does not support the data compression features found in SQL Server, which allow you to compress tables and indexes.
  • Database object naming convention In SQL Server, multipart names can be used to reference a database object in another schema (with the two-part name syntax schema.object), in another database (with the three-part name syntax database.schema.object), and (if you configure a linked server) on another server (with the four-part name syntax server.database.schema.object). In SQL Database, two-part names can also be used to reference objects in different schemas. However, three-part names are limited to reference only temporary objects in tempdb (that is, where the database name is tempdb and the object name starts with a # symbol); you cannot access other databases on the server. And you cannot reference other servers at all, so four-part names can never be used.
  • Extended Events In SQL Server, you can create extended event sessions help to troubleshooting a variety of problems, such as excessive CPU usage, memory pressure, and deadlocks. This feature is not supported in SQL Database.
  • Extended Stored Procedures You cannot execute your own extended stored procedures (these are typically custom coded procedures written in C or C++) with SQL Database. Only conventional T-SQL stored procedures are supported.
  • File Streaming SQL Server native file streaming features, including FILESTREAM and FileTable, are not supported in SQL Database. You can instead consider using Windows Azure Blob Storage containers for unstructured data files, but it will be your job at the application level to establish and maintain references between SQL Database and the files in blob storage, though note that there will be no transactional integrity between them using this approach.
  • Full-Text Searching (FTS) The FTS service in SQL Server that enables proximity searching and querying of unstructured documents is not supported in SQL Database.
  • Mirroring SQL Database does not support database mirroring, which is generally a non-issue because Microsoft is ensuring data redundancy with SQL Database so you don’t need to worry about disaster recovery. This does also mean that you cannot use SQL Database as a location for mirroring a principal SQL Server database running on-premises. However, if you wish to consider the cloud for this purpose, it is possible to host SQL Server inside a Windows Azure virtual machine (VM) against which you can mirror an on-premise principal database. This solution requires that you also implement a virtual private network (VPN) connection between your local network and the Windows Azure VM, although it will work even without the VPN if you use server certificates.
  • Partitioning SQL Server allows you to partition tables and indexes horizontally (by groups of rows) across multiple filegroups within a database, which greatly improves the performance of very large databases. SQL Database has a maximum database size of 150 GB and gives you no control over filegroups, and thus does not support table and index partitioning.
  • Replication SQL Server offers robust replication features for distributing and synchronizing data, including merge replication, snapshot replication, and transactional replication. None of these features are supported by SQL Database, however SQL Data Sync can be used to effectively implement merge replication between a SQL Database and any number of other SQL Databases on Windows Azure and on-premise SQL Server databases.
  • Resource Governor The Resource Governor feature in SQL Server lets you manage workloads and resources by specifying limits on the amount of CPU and memory that can be used to satisfy client requests. These are hardware concepts that do not apply to SQL Database, and so the Resource Governor is naturally unsupported.
  • Service Broker SQL Server Service Broker provides messaging and queuing features, and is not supported in SQL Database.
  • System stored procedures SQL Database supports only a few of the system stored procedures provided by SQL Server. The unsupported ones are typically related to those SQL Server features and behaviors not supported by SQL Database. At the same time, SQL Database provides a few new system stored procedures not found in SQL Server that are specific to SQL Database (for example, sp_set_firewall_rule).
  • Tables without a clustered index Every table in a SQL Database must define a clustered index. By default, SQL Database will create a clustered index over the table’s primary key column, but it won’t do so if you don’t define a primary key. Interestingly enough, SQL Database will actually let you create a table with no clustered index, but it will not allow any rows to be inserted until and unless a clustered index is defined for the table. This limitation does not exist in SQL Server.
  • Transparent Data Encryption (TDE) You cannot use TDE to encrypt a SQL Database like you can with SQL Server.
  • USE In SQL Database, the USE statement can only refer to the current database; it cannot be used to switch between databases as it can with SQL Server. Each SQL Database connection is tied to a single database, so to change databases, you must connect directly to the database.
  • XSD and XML indexing SQL Database fully supports the xml data type, as well as most of the rich XML support that SQL Server provides, including XML Query (XQuery), XML Path (XPath), and the FOR XML clause. However XML schema definitions (XSD) and XML indexes are not supported in SQL Database.

Calling C++ From SQL CLR C# Code

Ever since Microsoft integrated the .NET Common Language Runtime (CLR) into the relational database engine back in SQL Server 2005, the recommended technique for extending T-SQL with custom code has been to use SQL CLR with a .NET language (such as C# or VB). This is because CLR code is managed code, meaning that at runtime, the .NET framework ensures that ill-behaved code can never crash the process it’s running in. Prior to SQL CLR, the only way to extend T-SQL was with extended stored procedures written in native C++. Because native code is unmanaged, buggy C++ code can all too easily crash the process it’s running in. In the case of an extended stored procedure, this means crashing SQL Server itself, which I think we can all agree is a bad thing. It is for this very reason that extended stored procedures are deprecated in SQL Server and why SQL CLR is the way to go instead.

That said, you may have an existing C++ dynamic link library (DLL file) which exposes some public function that you need to call from your T-SQL code. Sure, the recommended approach to take is to refactor that C++ code in C# and then call it using SQL CLR. But what if that’s not a viable option? Perhaps you don’t have access to the C++ source code, or perhaps you do and it’s prohibitively expensive to port it to C#. Or maybe you just like living on the edge. In any case, you find that you absolutely need to call into C++, but you don’t want to use extended stored procedures since they are deprecated (and relatively difficult to implement). In this scenario, you can create a C# wrapper function that calls into the C++ DLL, and then implement the C# wrapper function as a SQL CLR user-defined function (UDF). This blog post shows you exactly how to do just that.

First though, to be clear, this technique carries the same risk as extended stored procedures – just one C++ memory access violation resulting from a rogue pointer can instantly crash SQL Server. For this reason, you will see that there are a few additional steps required when implementing such a solution. Fundamentally, these additional steps make it clear that you are introducing risk, and that you absolve SQL Server of any blame if your custom code crashes the SQL Server process as a result.

The next thing to be aware of is 32/64-bit compatibility. That is, you cannot load a 32-bit C++ DLL into a 64-bit SQL Server process, and vice-versa (attempting to do so will surely result in SQL Server throwing a BadImageFormatException error). This presents no problem if you have access to the source code, since you can compile it as either a 32-bit or 64-bit DLL to match the version of SQL Server that you’re running. If you don’t have access to the source code, and the DLL that you have does not match your version of SQL Server, then that presents a greater challenge, although there are several advanced “run out-of-process” solutions that are beyond the scope of this blog post.

The following sections provide step-by-step procedures that walk you through the process of calling C++ from C# in a SQL CLR user-defined function.

  1. Create the C++ library.
  2. Test the C++ library (optional).
    1. Call from a C++ native executable harness.
    2. Call from a C# executable harness.
  3. Create the C# SQL CLR UDF.
  4. Deploy to SQL Server.

For simplicity’s sake, the C++ library code that we’ll be calling from SQL Server is a simple math function named AddIntegers. This function accepts two integer parameters and returns their sum. I’ll demonstrate using Visual Studio 2013, although everything works just the same with earlier Visual Studio versions (I can confirm this for certain with VS 2012 and VS 2010, but it most likely works with even older VS versions as well).

Creating the C++ library

The instructions assume that you’re running 64-bit SQL Server, and thus they also explain how to compile the native C++ code as a 64-bit DLL.

To create the C++ library, follow these steps:

  1. Start Visual Studio 2013 (note that these instructions also work with VS 2010 and 2012).
  2. Create the new C++ library project:
    1. From the File menu, choose New Project.
    2. Under Installed Templates, choose Visual C++
    3. Select the Win32 Project template.
    4. Name the project MathLibNative.
    5. Choose any desired location to create the project; for example, C:\Demo\.
    6. Name the solution MathLibFromSQL.
      CreateSolution
    7. Click OK
  3. When the Win32 Application Wizard appears:
    1. Click Next.
    2. For Application Type, choose DLL.
    3. For Additional Options, check Empty Project.
      CreateEmptyCppDllProject
    4. Click Finish.
  4. Create the header file. This will contain the publically visible signature to the AddIntegers function.
    1. Right-click the MathLibNative project in Solution Explorer and choose Add | New Item.
    2. In the Add New Item dialog, choose Header File (.h).
    3. Name the file MathFunctions.h.
    4. Click Add.
    5. Type the following in the code editor for MathFunctions.h:
      __declspec(dllexport) int AddIntegers(int a, int b);
  5. Implement the AddIntegers function.
    1. Right-click the MathLibNative project in Solution Explorer and choose Add | New Item.
    2. In the Add New Item dialog, choose C++ File (.cpp).
    3. Name the file MathFunctions.cpp.
    4. Click Add.
    5. Type the following in the code editor for MathFunctions.cpp.
      #include "MathFunctions.h"
      
      int AddIntegers(int a, int b)
      {
        int sum = a + b;
        return sum;
      }
  6. Configure the library project for 64-bit, which is required since we intend to load this DLL into 64-bit SQL Server.
    1. From the BUILD menu, choose Configuration Manager.
    2. Click the Active Solution Platform dropdown and choose <New…> to display the New Solution Platform dialog.
    3. In the Type Or Select The New Platform combobox, type MathLibNative.
      NewSolutionPlatform
    4. Click OK to close the New Solution Platform dialog.
    5. Click the Platform dropdown for the MathLibNative project (it is currently set for Win32, meaning 32-bit).
    6. Click <New…> to display the New Project Platform dialog.
    7. Choose x64 from the New Platform dropdown.
      NewProjectPlatform
    8. Click OK to close the New Project Platform dialog.
      ConfigurationManager1
    9. Click Close to close the Configuration Manager dialog.
  7. Press CTRL+SHIFT+B to build the solution and ensure there are no compiler errors. This creates the native DLL file named MathLibNative.dll. The __declspec(dllexport) attribute in the header file (in step 4e above) also causes the compiler to generate a library file named MathLibNative.lib that you can link to from another C++ application (you will do this in the next section when you create an executable C++ test harness).

Testing the C++ library (optional)

Ultimately, we’re going to call into the native DLL file directly from a SQL CLR UDF written in C#. However, it’s often helpful to create a test harness first. This aids in debugging and helps prove that the DLL works as expected. In this section, we’ll create two test harness applications, one native C++ executable and one C# executable. The C++ executable will allow you to single-step debug into the C++ DLL, and the C# executable will help you determine the correct interface for calling into the C++ DLL. Again, this isn’t strictly necessary; you can instead jump ahead to the next section and create the SQL CLR UDF. However, the moment something doesn’t work as expected (and that moment will come), you’ll need to fall back on one or both of these test harnesses to help discover what the problem is, so it’s good to have them.

Creating a native C++ executable test harness

To create the native C++ executable test harness, follow these steps:

  1. Create a new C++ executable project:
    1. Right-click the MathLibFromSql solution in Solution Explorer and choose Add | New Project.
    2. Under Installed Templates, choose Visual C++
    3. Select the Win32 Project template.
    4. Name the project MathLibNativeHarness.
    5. Leave the default for the project location.
    6. Click OK.
  2. When the Win32 Application Wizard appears:
    1. Click Next.
    2. For Application Type, choose Console Application.
    3. For Additional Options, check Empty Project.
    4. Click Finish.
  3. Create the harness code. This is placed in a function named main, which is the entry point for the console executable:
    1. Right-click the MathLibNativeHarness project in Solution Explorer and choose Add | New Item.
    2. In the Add New Item dialog, choose C++ File (.cpp).
    3. Name the file Main.cpp.
    4. Click Add.
    5. Type the following in the code editor for MathFunctions.cpp.
      #include "..\MathLibNative\MathFunctions.h"
      #include <iostream>
      
      int main()
      {
        int sum = AddIntegers(23, 9);
        std::cout << sum << std::endl;
      }
  4. Configure the harness project for 64-bit.
    1. From the BUILD menu, choose Configuration Manager.
    2. Click the Active Solution Platform dropdown choose MathLibNative.
    3. Click the Platform dropdown for the MathLibNativeHarness project.
    4. Click <New…> to display the New Project Platform dialog.
    5. Choose x64 from the New Platform dropdown.
    6. Click OK to close the New Project Platform dialog.
    7. Check the Build checkbox for the MathLibNativeHarness project.
      ConfigurationManager2
    8. Click Close to close the Configuration Manager dialog.
  5. Link to the library.
    1. Right-click the MathLibNativeHarness project in Solution Explorer and choose Properties.
    2. Expand the Linker options and click Input.
    3. For Additional Dependencies, click the dropdown and choose <Edit…>
      LinkerAdditionalDependencies
    4. In the Additional Dependencies dialog, type the full path to the MathLibNative.lib file generated by the compiler when it built the DLL file. If you created the solution in C:\Demo, the full path is C:\Demo\MathLibFromSQL\x64\Debug\MathLibNative.lib.
      AdditionalDependencies
    5. Click OK to close the Additional Dependencies dialog.
    6. Click OK to close the project’s property pages dialog.
  6. Press CTRL+SHIFT+B to build the solution and ensure there are no compiler errors.
  7. Run the C++ test harness.
    1. Right-click the MathLibNativeHarness project in Solution Explorer and choose Set As Startup Project.
    2. Press CTRL+F5 to run the harness without the debugger.
    3. The console output should appear with the expected result (32, which is 23 plus 9), proving that the DLL is being called properly.
      ConsoleOutputNativeHarness

If you wish, you can also run the harness with the debugger by pressing F5. This will display and close the console window too quickly for you to see the output, but you can set breakpoints and debug the code. You can even single-step into the DLL file itself.

Creating a C# executable test harness

Calling the native C++ DLL from C# works differently, and since that’s what we need to do in our SQL CLR UDF with a C# class library, it’s also helpful to figure out how to “get it right” first with a C# console application test harness. One critical part in getting this to work is to identify the correct entry point to the AddIntegers function. This would be easy if the entry point was simply named after the function (i.e., AddIntegers), but unfortunately that’s not the case. The entry point is based on the function name, but is preceded by a question mark symbol and suffixed with a bit of randomly generated text. Fortunately, there is a small, simple, and free tool available on the web named Dependency Walker that can discover the entry point for you.

Before creating the C# executable test harness, use Dependency Walker to discover the entry point to the AddIntegers function. To do so, follow these steps:

  1. Download Dependency Walker from http://www.dependencywalker.com.
  2. Extract the downloaded zip file to a folder of your choice.
  3. Navigate to the folder and launch depends.exe.
  4. In the Security Warning dialog, click Run.
  5. From the File menu, choose Open.
  6. Navigate to the folder where the native C++ DLL file was built (i.e., C:\Demo\MathLibFromSQL\x64\Debug).
  7. Double-click the file MathLibNative.dll.
  8. Dependency Walker may display a message that errors were detected, but this can be safely ignored; just click OK.
  9. Locate the entry point to the AddIntegers function in the second grid on the right.
    DependsWithCallout
  10. Leave this window open so you can easily copy the entry point name and paste it into the C# code in the next procedure.

To create the C# executable test harness, follow these steps:

  1. Create a new C# executable project:
    1. Right-click the MathLibFromSql solution in Solution Explorer and choose Add | New Project.
    2. Under Installed Templates, choose Visual C#
    3. Select the Console Application template.
    4. Name the project MathLibNativeCSharpHarness.
    5. Leave the default for the project location.
    6. Click OK.
  2. Replace the starter code generated by Visual Studio in Program.cs with the following:
    using System;
    using System.Runtime.InteropServices;
    
    namespace MathLibNativeCSharpHarness
    {
      class Program
      {
        [DllImport(@"C:\Demo\MathLibFromSQL\x64\Debug\MathLibNative.dll",
          CallingConvention = CallingConvention.Cdecl,
          EntryPoint = "?AddIntegers@@YAHHH@Z")]
        private static extern int AddIntegers(int a, int b);
    
        static void Main(string[] args)
        {
          int sum = AddIntegers(33, 19);
          Console.WriteLine(sum);
        }
      }
    }
  3. Note the EntryPoint parameter of the DllImport attribute. This is the value that I pasted in from Dependency Walker as I wrote this post. To use the appropriate EntryPoint parameter value for your environment:
    1. Return to the Dependency Walker window that you opened to MathLibNative.dll in the previous section.
    2. Right-click the Function value in the second grid on the right and choose Copy Function Name.
    3. Return to the C# code and paste it in as the EntryPoint parameter value in the DllImport attribute.
  4. Also note the pathname to MathLibNative.dll; adjust it as necessary if you have created the solution in a folder other than C:\Demo.
  5. Configure the harness project for 64-bit.
    1. From the BUILD menu, choose Configuration Manager.
    2. Click the Platform dropdown for the MathLibNativeCSharpHarness project.
    3. Click <New…> to display the New Project Platform dialog.
    4. Choose x64 from the New Platform dropdown (it may already be selected by default).
    5. Click OK to close the New Project Platform dialog.
    6. Check the Build checkbox for the MathLibNativeCSharpHarness project.
      ConfigurationManager3
    7. Click Close to close the Configuration Manager dialog.
  6. Press CTRL+SHIFT+B to build the solution and ensure there are no compiler errors.
  7. Run the C# test harness.
    1. Right-click the MathLibNativeCSharpHarness project in Solution Explorer and choose Set As Startup Project.
    2. Press CTRL+F5 to run the harness without the debugger.
    3. The console output should appear with the expected result (52, which is 33 plus 19), proving that the DLL is being called properly and that we have figured out the correct DllImport attribute to use in the C# code that we’ll create for the SQL CLR UDF.
      ConsoleOutputCSharpHarness

Creating the C# SQL CLR UDF

You’re now ready to write the SQL CLR UDF, which is a C# method decorated with the SqlFunction attribute. This is a simple wrapper method; all it does is call into the C++ DLL just like the C# executable harness we created in the previous procedure does. The wrapper method itself accepts and returns special data types that correspond to SQL Server. In our scenario, the method accepts two SqlInt32 parameters and returns a SqlInt32 value, where SqlInt32 corresponds to the int data type in C++ and C#.

To create the C# SQL CLR UDF, follow these steps:

  1. Create a new C# class library project.
    1. Right-click the MathLibFromSql solution in Solution Explorer and choose Add | New Project.
    2. Under Installed Templates, choose Visual C#
    3. Select the Class Library template.
    4. Name the project MathLibNativeSQLCLR.
    5. Leave the default for the project location.
    6. Click OK.
  2. Delete the Class1.cs file (this file was generated automatically for the project by Visual Studio and is not needed).
    1. Right-click the Class1.cs file in Solution Explorer and choose Delete.
    2. Click OK to confirm that you want to delete the file Class1.cs.
  3. Add the MathFunctions class.
    1. Right-click the MathLibNativeSQLCLR project in Solution and choose Add | New Item.
    2. In the Add New Item dialog, choose Class.
    3. Name the file MathFunctions.cs.
    4. Click Add.
    5. Replace the starter code generated by Visual Studio in MathFunctions.cs with the following:
      using System.Data.SqlTypes;
      using System.Runtime.InteropServices;
      
      namespace MathLibNativeSQLCLR
      {
        public class MathFunctions
        {
          [DllImport(@"C:\Demo\MathLibFromSQL\x64\Debug\MathLibNative.dll",
            CallingConvention = CallingConvention.Cdecl,
            EntryPoint = "?AddIntegers@@YAHHH@Z")]
          private static extern int AddIntegers(int a, int b);
      
          public static SqlInt32 AddIntegersUdf(SqlInt32 a, SqlInt32 b)
          {
            var sum = AddIntegers((int)a, (int)b);
            return sum;
          }
        }
      }
  4. Note the EntryPoint parameter of the DllImport attribute. This is the same value we used for the C# harness in the previous procedure. If you skipped the previous procedure as optional, you need to at least follow the instructions found there for using Dependency Walker to discover the correct entry point name.
  5. Also note the pathname to MathLibNative.dll; adjust it as necessary if you have created the solution in a folder other than C:\Demo.
  6. Configure the SQL CLR project for 64-bit.
    1. From the BUILD menu, choose Configuration Manager.
    2. Click the Platform dropdown for the MathLibNativeSQLCLR project.
    3. Click <New…> to display the New Project Platform dialog.
    4. Choose x64 from the New Platform dropdown (it may already be selected by default).
    5. Click OK to close the New Project Platform dialog.
    6. Check the Build checkbox for the MathLibNativeSQLCLR project.
      ConfigurationManager4
    7. Click Close to close the Configuration Manager dialog.
  7. Press CTRL+SHIFT+B to build the solution and ensure there are no compiler errors.

Deploying to SQL Server

You’re now ready to deploy the SQL CLR UDF you created in the previous procedure to a SQL Server database. To do this, use your tool of choice to connect to your SQL Server instance and open a new query window that you can execute T-SQL commands in. You can of course use SQL Server Management Studio (SSMS), or alternatively, you can use SQL Server Data Tools (SSDT) inside of Visual Studio (SSDT is installed by default with VS 2012 and VS 2013, but needs to be installed separately for VS 2010).

Most of the remaining steps are standard procedure with any SQL CLR implementation. However, some of them are required specifically because we’re calling native C++ code from SQL CLR.

  1. Enable SQL CLR. This, of course, is required to support any custom SQL CLR implementation in SQL Server.
    EXEC sp_configure 'clr enabled', 1
    RECONFIGURE
    GO
  2. Create the database.
    CREATE DATABASE MathLibDb
    GO
    USE MathLibDb
    GO
  3. Set the database’s TRUSTWORTHY property. This is normally not required for SQL CLR, but is needed here so that “unsafe” assemblies (that is, those that call into native C++ code) can be created in the database.
    ALTER DATABASE MathLibDb SET TRUSTWORTHY ON
    GO
  4. Create the assembly. Because this assembly calls into native C++ code, you must also specify PERMISSION_SET = UNSAFE.
    CREATE ASSEMBLY [MathLibNativeSQLCLR]
      AUTHORIZATION dbo
      FROM 'C:\Demo\MathLibFromSQL\MathLibNativeSQLCLR\bin\x64\Debug\MathLibNativeSQLCLR.dll'
      WITH PERMISSION_SET = UNSAFE
    GO
  5. Create the T-SQL UDF.
    CREATE FUNCTION AddIntegersUdf(@A int, @B int)
      RETURNS int
      AS EXTERNAL NAME [MathLibNativeSQLCLR].[MathLibNativeSQLCLR.MathFunctions].[AddIntegersUdf]
    GO

After all this effort, it’s quite rewarding to see the result. To watch the magic happen, invoke the T-SQL UDF just like you would any other, passing in any two numbers to be added. For example:

SELECT dbo.AddIntegersUdf(47, 16)

When you see the query return the sum of the two numbers passed in to the UDF, you know that everything is working correctly:

CallSQLCLRUDF

The T-SQL code calls the SQL CLR UDF, which calls the native C++ DLL that performs the work and returns the result all the way back up to SQL Server. So yes, it takes some effort, but it does work. Remember though, make sure your C++ code is well-behaved or you risk crashing SQL Server!

New Metadata Discovery Features in SQL Server 2012

It has always been possible to interrogate SQL Server for metadata (schema) information. You can easily discover all the objects in a database (tables, views, stored procedures, and so on) and their types by directly querying system tables (not recommended, as they can change from one version of SQL Server to another) or information schema views (which are consistent in each SQL Server version). It is significantly more challenging, however, to discover the result set schema for T-SQL statements or stored procedures that contain conditional logic. Using SET FMTONLY ON/OFF has been the common technique in the past for discovering the schema of a query’s result set without actually executing the query itself. For example, consider the following code:

USE AdventureWorks2012
GO

SET FMTONLY ON
SELECT * FROM HumanResources.Employee;
SET FMTONLY OFF

This SELECT statement, which would normally return all the rows from the HumanResources.Employee table, returns no rows at all. It just reveals the columns. The SET FMTONLY ON statement prevents queries from returning rows of data so that their schemas can be discovered, and this behavior remains in effect until SET FMTONLY OFF is encountered. SQL Server 2012 introduces several new system stored procedures and table-valued functions (TVFs) that provide significantly richer metadata discovery than what can be discerned using the relatively inelegant (and now deprecated) SET FMTONLY ON/OFF approach. These new procedures and functions are:

  • sys.sp_describe_first_result_set
  • sys.dm_exec_describe_first_result_set
  • sys.dm_exec_describe_first_result_set_for_object
  • sys.sp_describe_undeclared_parameters

In this blog post, I’ll explain how to use these new objects to discover schema information in SQL Server 2012.

sys.sp_describe_first_result_set

The sys.sp_describe_first_result_set stored procedure accepts a T-SQL statement and produces a highly detailed schema description of the first possible result set returned by that statement. The following code retrieves schema information for the same SELECT statement you used earlier to get information on all the columns in the HumanResources.Employee table:

EXEC sp_describe_first_result_set
 @tsql = N'SELECT * FROM HumanResources.Employee'

The following screenshot shows the wealth of information that SQL Server returns about each column in the result set returned by the sp_describe_first_result_set call:

sys.dm_exec_describe_first_result_set

There is also a data management function named sys.dm_exec_describe_first_result_set that works very similar to sys.sp_describe_first_result_set. But because it is implemented as a table-valued function (TVF), it is easy to query against it and limit the metadata returned. For example, the following query examines the same T-SQL statement, but returns just the name and data type of nullable columns:

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set(
  'SELECT * FROM HumanResources.Employee', NULL, 1)
 WHERE is_nullable = 1

Here is the output:

name               system_type_name
-----------------  ----------------
OrganizationNode   hierarchyid
OrganizationLevel  smallint

Parameterized queries are also supported, if you supply an appropriate parameter signature after the T-SQL. The T-SQL in the previous example had no parameters, so it passed NULL for the “parameters parameter.” The following example discovers the schema of a parameterized query.

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set('
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID',
  '@OrderID int', 1)

Here is the output:

name             system_type_name  is_hidden
---------------  ----------------  ---------
OrderDate        datetime          0
TotalDue         money             0
SalesOrderID     int               1

You’d be quick to question why the SalesOrderID column is returned for a SELECT statement that returns only OrderDate and TotalDue. The answer lies in the last parameter passed to the data management function. A bit value of 1 (for true) tells SQL Server to return the identifying SalesOrderID column, because it is used to “browse” the result set. Notice that it is marked true (1) for is_hidden. This informs the client that the SalesOrderID column is not actually revealed by the query, but can be used to uniquely identify each row in the query’s result set.

What if multiple result sets are possible? There’s no problem with this as long as they all have the same schema. In fact, SQL Server will even try to forgive cases where multiple possible schemas are not exactly identical. For example, if the same column is nullable in one result set and non-nullable in the other, schema discovery will succeed and indicate the column as nullable. It will even tolerate cases where the same column has a different name (but same type) between two possible result sets, and indicate NULL for the column name, rather than arbitrarily choosing one of the possible column names or failing altogether.

The following code demonstrates this with a T-SQL statement that has two possible result sets depending on the value passed in for the @SortOrder parameter. Because both result sets have compatible schemas, the data management function succeeds in returning schema information.

SELECT name, system_type_name
 FROM sys.dm_exec_describe_first_result_set('
    IF @SortOrder = 1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID ASC
    ELSE IF @SortOrder = -1
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader
       ORDER BY SalesOrderID DESC',
   '@SortOrder AS tinyint', 0) 

Here is the output:

name         system_type_name
-----------  ----------------
OrderDate    datetime
TotalDue     money

Discovery won’t succeed if SQL Server detects incompatible schemas. In this next example, the call to the system stored procedure specifies a T-SQL statement with two possible result sets, but one returns three columns while the other returns only two columns.

EXEC sys.sp_describe_first_result_set
  @tsql = N'
    IF @IncludeCurrencyRate = 1
      SELECT OrderDate, TotalDue, CurrencyRateID
       FROM Sales.SalesOrderHeader
    ELSE
      SELECT OrderDate, TotalDue
       FROM Sales.SalesOrderHeader'

In this case, the system stored procedure raises an error that clearly explains the problem:

Msg 11509, Level 16, State 1, Procedure sp_describe_first_result_set, Line 53

The metadata could not be determined because the statement 'SELECT OrderDate, TotalDue, CurrencyRateID FROM Sales.SalesOrderHeader' is not compatible with the statement 'SELECT OrderDate, TotalDue FROM Sales.SalesOrderHeader'.

It is noteworthy to mention that the data management function copes with this scenario much more passively. Given conflicting result set schemas, it simply returns NULL and does not raise an error.

sys.dm_exec_describe_first_result_set_for_object

The data management function sys.dm_exec_describe_first_result_set_for_object can be used to achieve the same discovery against any object in the database. It accepts just an object ID and the Boolean “browse” flag to specify if hidden ID columns should be returned. You can use the OBJECT_ID function to obtain the ID of the desired object. The following code demonstrates this by returning schema information for the stored procedure GetOrderInfo.

CREATE PROCEDURE GetOrderInfo(@OrderID AS int) AS
  SELECT OrderDate, TotalDue
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderID = @OrderID
GO

SELECT name, system_type_name, is_hidden
 FROM sys.dm_exec_describe_first_result_set_for_object(OBJECT_ID('GetOrderInfo'), 1)

Here is the output:

name             system_type_name   is_hidden
---------------  -----------------  ---------
OrderDate        datetime           0
TotalDue         money              0
SalesOrderID     int                1

sys.sp_describe_undeclared_parameters

Finally, the sys.sp_describe_undeclared_parameters stored procedure parses a T-SQL statement to discover type information about the parameters expected by the statement, as the following code demonstrates:

EXEC sys.sp_describe_undeclared_parameters
 N'IF @IsFlag = 1 SELECT 1 ELSE SELECT 0'

Here is the output:

parameter_ordinal name    suggested_system_type_id suggested_system_type_name ...
----------------- ------- ------------------------ -------------------------- -------
1                 @IsFlag 56                       int                        ... 

In this example, SQL Server detects the @IsFlag parameter, and suggests the int data type based on the usage in the T-SQL statement it was given to parse.

Enhance Portability with Partially Contained Databases in SQL Server 2012

The dependency of database-specific users upon server-based logins poses a challenge when you need to move or restore a database to another server. Although the users move with the database, their associated logins do not, and thus the relocated database will not function properly until you also setup and map the necessary logins on the target server. To resolve these types of dependency problems and help make databases more easily portable, SQL Server 2012 introduces “partially contained” databases.

The term “partially contained” is based on the fact that SQL Server itself merely enables containment—it does not enforce it. It’s still your job to actually implement true containment. From a security perspective, this means that partially contained databases allow you to create a special type of user called a contained user. The contained user’s password is stored right inside the contained database, rather than being associated with a login defined at the server instance level and stored in the master database. Then, unlike the standard SQL Server authentication model, contained users are authenticated directly against the credentials in the contained database without ever authenticating against the server instance. Naturally, for this to work, a connection string with a contained user’s credentials must include the Initial Catalog keyword that specifies the contained database name.

Creating a Partially Contained Database

To create a partially contained database, first enable the contained database authentication setting by calling sp_configure and then issue a CREATE DATABASE statement with the new CONTAINMENT=PARTIAL clause as the following code demonstrates.

-- Enable database containment

USE master
GO

EXEC sp_configure 'contained database authentication', 1
RECONFIGURE

-- Delete database if it already exists
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO

-- Create a partially contained database
CREATE DATABASE MyDB CONTAINMENT = PARTIAL
GO

USE MyDB
GO

To reiterate, SQL Server doesn’t enforce containment. You can still break containment by creating ordinary database users for server-based logins. For this reason, it’s easy to convert an ordinary (uncontained) database to a partially contained database; simply issue an ALTER DATABASE statement and specify SET CONTAINMENT=PARTIAL. You’ll then be able to migrate the server-based logins to contained logins and achieve server independence.

Creating a Contained User

Once you have a contained database, you can create a contained user for it by issuing a CREATE USER statement and specifying WITH PASSWORD, as shown here:

CREATE USER UserWithPw
 WITH PASSWORD = N'password$1234'

This syntax is valid only for contained databases; SQL Server will raise an error if you attempt to create a contained user in the context of an uncontained database.

That’s all there is to creating partially contained databases with contained users. The only remaining point that’s worth calling out is that an Initial Catalog clause pointing to a partially contained database must be specified explicitly in a connection string that also specifies the credentials of a contained user in that database. If just the credentials are specified without the database, SQL Server will not scan the partially contained databases hosted on the instance for one that has a user with matching credentials. Instead, it will consider the credentials to be those of an ordinary SQL Server login, and will not authenticate against the contained database.

Other Partially Contained Database Features

Aside from server-based logins, there are many other dependencies that a database might have on its hosted instance. These include linked servers, SQL CLR, database mail, service broker objects, endpoints, replication, SQL Server Agent jobs, and tempdb collation. All these objects are considered to be uncontained entities since they all exist outside the database.

Uncontained entities threaten a database’s portability. Since these objects are all defined at the server instance level, behavior can vary unpredictably when databases are shuffled around from one instance to another. Let’s examine features to help you achieve the level of containment and stability that your circumstances require.

Uncontained Entities View

SQL Server provides a new data management view (DMV) called sys.dm_db_uncontained_entities that you can query on to discover potential threats to database portability. This DMV not only highlights dependent objects, it will even report the exact location of all uncontained entity references inside of stored procedures, views, functions, and triggers.

The following code creates a few stored procedures, and then joins sys.dm_db_uncontained_entities with sys.objects to report the name of all objects having uncontained entity references in them.

-- Create a procedure that references a database-level object
CREATE PROCEDURE GetTables AS
BEGIN
  SELECT * FROM sys.tables
END
GO

-- Create a procedure that references an instance-level object
CREATE PROCEDURE GetEndpoints AS
BEGIN
  SELECT * FROM sys.endpoints
END
GO

-- Identify objects that break containment
SELECT
  UncType = ue.feature_type_name,
  UncName = ue.feature_name,
  RefType = o.type_desc,
  RefName = o.name,
  Stmt = ue.statement_type,
  Line = ue.statement_line_number,
  StartPos = ue.statement_offset_begin,
  EndPos = ue.statement_offset_end
 FROM
  sys.dm_db_uncontained_entities AS ue
  INNER JOIN sys.objects AS o ON o.object_id = ue.major_id

Here is the result of the query:

UncType      UncName    RefType               RefName       Stmt    Line  StartPos  EndPos
-----------  ---------  --------------------  ------------  ------  ----  --------  ---
System View  endpoints  SQL_STORED_PROCEDURE  GetEndpoints  SELECT  5     218       274

The DMV identifies the stored procedure GetEndpoints as an object with an uncontained entity reference. Specifically, the output reveals that a stored procedure references the sys.endpoints view in a SELECT statement on line 5 at position 218. This alerts you to a database dependency on endpoints configured at the server instance level that could potentially pose an issue for portability. The GetTables stored procedure does not have any uncontained entity references (sys.tables is contained), and is therefore not reported by the DMV.

Collations and tempdb

Ordinarily, all databases hosted on the same SQL Server instance share the same tempdb database for storing temporary tables, and all the databases (including tempdb) on the instance use the same collation setting (collation controls string data character set, case sensitivity, and accent sensitivity). When joining between regular database tables and temporary tables, both your user database and tempdb must use a compatible collation. This, again, represents an instance-level dependency with respect to the fact that the collation setting can vary from one server instance to another. Thus, problems arise when moving databases between servers that have different collation settings for tempdb. The code below demonstrates the problem, and how to avoid it by using a contained database.

-- Create an uncontained database with custom collation
USE master
GO
IF EXISTS(SELECT name FROM sys.databases WHERE name = 'MyDB')
 DROP DATABASE MyDB
GO
CREATE DATABASE MyDB COLLATE Chinese_Simplified_Pinyin_100_CI_AS
GO

USE MyDB
GO

-- Create a table in MyDB (uses Chinese_Simplified_Pinyin_100_CI_AS collation)
CREATE TABLE TestTable (TextValue nvarchar(max))

-- Create a temp table in tempdb (uses SQL_Latin1_General_CP1_CI_AS collation)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Fails, because MyDB and tempdb uses different collation
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Convert to a partially contained database
DROP TABLE #TempTable
USE master

ALTER DATABASE MyDB SET CONTAINMENT=PARTIAL
GO

USE MyDB
GO

-- Create a temp table in MyDB (uses Chinese_Simplified_Pinyin_100_CI_AS collation)
CREATE TABLE #TempTable (TextValue nvarchar(max))

-- Succeeds, because the table in tempdb now uses the same collation as MyDB
SELECT *
 FROM TestTable INNER JOIN #TempTable ON TestTable.TextValue = #TempTable.TextValue

-- Cleanup
DROP TABLE #TempTable
USE master
DROP DATABASE MyDB
GO

This code first creates an uncontained database that uses Chinese_Simplified_Pinyin_100_CI_AS collation on a server instance that uses (the default) SQL_Latin1_General_CP1_CI_AS collation. The code then creates a temporary table and then attempts to join an ordinary database table against it. The attempt fails because the two tables have different collations (that is, they each reside in databases that use different collations), and SQL Server issues the following error message:

Msg 468, Level 16, State 9, Line 81
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and
"Chinese_Simplified_Pinyin_100_CI_AS" in the equal to operation.

Then the code issues an ALTER DATABASE…SET CONTAINMENT=PARTIAL statement to convert the database to a partially contained database. As a result, SQL Server resolves the conflict by collating the temporary table in tempdb in the same collation as the contained database, and the second join attempt succeeds.

Summary

Partially contained databases in SQL Server 2012 go a long way helping to improve the portability of databases across servers and instances. In this blog post, I demonstrated how to create a partially contained database with contained users, how to deal with collation issues, and how to use the new data management view to discover threats to containment and identify external dependencies. These capabilities are welcome news for SQL Server DBAs everywhere. Enjoy!

New Spatial Features in SQL Server 2012

SQL Server 2012 adds many significant improvements to the spatial support that was first introduced with SQL Server 2008. Among the more notable enhancements is support for curves (arcs), where SQL Server 2008 only supported straight lines, or polygons composed of straight lines. Microsoft also provides methods that test for non-2012-compatible (curved) shapes, and convert circular data to line data for backward compatibility with SQL Server 2008 (as well as other mapping platforms that don’t support curves).

New Spatial Data Classes

The three new spatial data classes in SQL Server 2012 are:

  • Circular strings
  • Compound curves
  • Curve polygons

All three of these shapes are supported in WKT, WKB, and GML by both the geometry and geography data types, and all of the existing methods work on all of the new circular shapes. My previous post, Geospatial Support for Circular Data in SQL Server 2012 covers these new spatial classes in detail, and shows you how to use them to create circular data. This post focuses on additional spatial features that are new in SQL Server 2012.

New Spatial Methods

Let’s explore a few of the new spatial methods. Some of these new methods complement the new curved shapes, while others add new spatial features that work with all shapes.

The STNumCurves and STCurveN Methods

These two methods can be invoked on any geometry or geography instance. They can be used together to discover information about the curves contained within the spatial instance. The STNumCurves method returns the total number of curves in the instance. You can then pass any number between 1 and what STNumCurves returns to extract each individual curve, and thus iterate all the curves in the instance.

For example, the WKT string CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4) defines a perfect circle composed of two connected segments; 0 4, 4 0, 8, 4 and 8 4, 4 8, 0 4 (the third coordinate 8 4 is used both as the ending point of the first arc and the starting point of the second arc. The following code demonstrates how to obtain curve information from this circular string using the STNumCurves and STCurveN methods.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Get the curve count (2) and the 1st curve (bottom semi-circle)
SELECT
  CurveCount = @C.STNumCurves(),
  SecondCurve = @C.STCurveN(2),
  SecondCurveWKT = @C.STCurveN(2).ToString()

This query produces the following output:

CurveCount SecondCurve                                     SecondCurveWKT
---------- ----------------------------------------------- -------------------------------
2          0x000000000204030000000000000000002040000000... CIRCULARSTRING (8 4, 4 8, 0 4)

You can see that STNumCurves indicates there are two curves, and that STCurveN(2) returns the second curve. If you view the results in the spatial viewer, you’ll see just the top half of the circle. This is the semi-circle defined by the second curve, which is converted back to WKT as CIRCULARSTRING (8 4, 4 8, 0 4). Notice that this represents the second segment of the full circle.

The BufferWithCurves Method

SQL Server 2008 introduced the STBuffer method which “pads” a line, effectively converting it into a polygon. If you look closely at the resulting polygon shapes in the spatial viewer, it appears that the points of each line string (including the mid points) are transformed into rounded edges in the polygon. However, the rounded edge look is actually produced by plotting many short straight lines that are clustered very closely together, presenting the illusion of a curve. This approach is necessary since curves were not previously supported before SQL Server 2012 (but the STBuffer method was).

Clearly, using native curve definitions in a curve polygon is more efficient than clustering a multitude of straight lines in an ordinary polygon. For backward compatibility, STBuffer continues to return the (inefficient) polygon as before. So SQL Server 2012 introduces a new method, BufferWithCurves, for this purpose. The following code uses BufferWithCurves to pad lines using true curves, and compares the result with its straight-line cousin, STBuffer.

DECLARE @streets geometry = '
 GEOMETRYCOLLECTION(
  LINESTRING (100 -100, 20 -180, 180 -180),
  LINESTRING (300 -300, 300 -150, 50 -50)
 )'
SELECT @streets.BufferWithCurves(10)

SELECT
  AsWKT = @streets.ToString(),
  Bytes = DATALENGTH(@streets),
  Points = @streets.STNumPoints()
 UNION ALL
 SELECT
  @streets.STBuffer(10).ToString(),
  DATALENGTH(@streets.STBuffer(10)),
  @streets.STBuffer(10).STNumPoints()
 UNION ALL
 SELECT
  @streets.BufferWithCurves(10).ToString(),
  DATALENGTH(@streets.BufferWithCurves(10)),
  @streets.BufferWithCurves(10).STNumPoints()

Here is the resulting shape returned by the first SELECT statement (the collection of padded line shapes generated by BufferWithCurves):

As with STBuffer, the new shapes have rounded edges around the points of the original line strings. However, BufferWithCurves generates actual curves, and thus, produces a significantly smaller and simpler polygon. The second SELECT statement demonstrates by comparing the three shapes—the original line string collection, the polygon returned by STBuffer, and the curve polygon returned by BufferWithCurves. Here are the results:

AsWKT                                                                       Bytes  Points
--------------------------------------------------------------------------  -----  ------
GEOMETRYCOLLECTION (LINESTRING (100 -100, 20 -180, 180 -180), LINESTRIN...  151    6
MULTIPOLYGON (((20.000000000000796 -189.99999999999858, 179.99999999999...  5207   322
GEOMETRYCOLLECTION (CURVEPOLYGON (COMPOUNDCURVE ((20.000000000000796 -1...  693    38

The first shape is the original geometry collection of line strings used for input, which requires only 151 bytes of storage, and has only 6 points. For the second shape, STBuffer pads the line strings to produce a multi-polygon (a set of polygons) that consumes 5,207 bytes and has a total of 322 points—a whopping 3,448 percent increase from the original line strings. In the third shape, BufferWithCurves is used to produce the equivalent padding using a collection of curve polygons composed of compound curves, so it consumes only 693 bytes and has only 38 points—a (relatively) mere 458 percent increase from the original line strings.

The ShortestLineTo Method

This new method examines any two shapes and figures out the shortest line between them. The following code demonstrates:

DECLARE @Shape1 geometry = 'POLYGON ((-20 -30, -3 -26, 14 -28, 20 -40, -20 -30))'
DECLARE @Shape2 geometry = 'POLYGON ((-18 -20, 0 -10, 4 -12, 10 -20, 2 -22, -18 -20))'

SELECT @Shape1
UNION ALL
SELECT @Shape2
UNION ALL
SELECT @Shape1.ShortestLineTo(@Shape2).STBuffer(.25)

This code defines two polygons and then uses ShortestLineTo to determine, generate, and return the shortest straight line that connects them. STBuffer is also used to pad the line string so that it is more clearly visible in the spatial viewer:

The MinDbCompatibilityLevel Method

With the added support for curves in SQL Server 2012 comes support for backward compatibility with previous versions of SQL Server (2008 and 2008 R2) that don’t support curves. The new MinDbCompatibilityLevel method accepts any WKT string and returns the minimum version of SQL Server required to support the shape defined by that string. For example, consider the following code:

DECLARE @Shape1 geometry = 'CIRCULARSTRING(0 50, 90 50, 180 50)'
DECLARE @Shape2 geometry = 'LINESTRING (0 50, 90 50, 180 50)'

SELECT
 Shape1MinVersion = @Shape1.MinDbCompatibilityLevel(),
 Shape2MinVersion = @Shape2.MinDbCompatibilityLevel()

The MinDbCompatibilityLevel method returns 110 (referring to version 11.0) for the first WKT string and 100 (version 10.0) for the second one. This is because the first WKT string contains a circular string, which requires SQL Server 2012 (version 11.0), while the line string in the second WKT string is supported by SQL Server 2008 (version 10.0) and higher.

The STCurveToLine and CurveToLineWithTolerance Methods

These are two methods you can use to convert curves to roughly equivalent straight line shapes. Again, this is to provide compatibility with previous versions of SQL Server and other mapping platforms that don’t support curves.

The STCurveToLine method converts a single curve to a line string with a multitude of segments and points that best approximate the original curve. The technique is similar to what we just discussed for STBuffer, where many short straight lines are connected in a cluster of points to simulate a curve. And, as explained in that discussion, the resulting line string requires significantly more storage than the original curve. To offer a compromise between fidelity and storage, the CurveToLineWithTolerance method accepts “tolerance” parameters to produce line strings that consume less storage space than those produced by STCurveToLine. The following code demonstrates by using both methods to convert the same circle shape from the previous STNumCurves and STCurveN example into line strings.

-- Create a full circle shape (two connected semi-circles)
DECLARE @C geometry = 'CIRCULARSTRING(0 4, 4 0, 8 4, 4 8, 0 4)'

-- Render as curved shape
SELECT
  Shape = @C,
  ShapeWKT = @C.ToString(),
  ShapeLen = DATALENGTH(@C),
  Points = @C.STNumPoints()

-- Convert to lines (much larger, many more points)
SELECT
  Shape = @C.STCurveToLine(),
  ShapeWKT = @C.STCurveToLine().ToString(),
  ShapeLen = DATALENGTH(@C.STCurveToLine()),
  Points = @C.STCurveToLine().STNumPoints()

-- Convert to lines with tolerance (not as much larger, not as many more points)
SELECT
  Shape = @C.CurveToLineWithTolerance(0.1, 0),
  ShapeWKT = @C.CurveToLineWithTolerance(0.1, 0).ToString(),
  ShapeLen = DATALENGTH(@C.CurveToLineWithTolerance(0.1, 0)),
  Points = @C.CurveToLineWithTolerance(0.1, 0).STNumPoints()

The query results show that the original circle consumes only 112 bytes and has 5 points. Invoking STCurveToLine on the circle converts it into a line string that consumes 1,072 bytes and has 65 points. That’s a big increase, but the resulting line string represents the original circle in high fidelity; you will not see a perceptible difference in the two when viewing them using the spatial viewer. However, the line string produced by CurveToLineWithTolerance consumes only 304 bytes and has only 17 points; a significantly smaller footprint, paid for with a noticeable loss in fidelity. As shown by the spatial viewer results below, using CurveToLineWithTolerance produces a circle made up of visibly straight line segments:

The STIsValid, IsValidDetailed and MakeValid Methods

Spatial instance validation has improved greatly in SQL Server 2012. The STIsValid method evaluates a spatial instance and returns a 1 (for true) or 0 (for false) indicating if the instance represents a valid shape (or shapes). If the instance is invalid, the new IsValidDetailed method will return a string explaining the reason why. The following code demonstrates.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'

SELECT
 IsValid = @line.STIsValid(),
 Details = @line.IsValidDetailed()

This line string is invalid because the same point (2 2) is repeated, which results in “overlapping edges,” as revealed by the output from IsValidDetailed:

IsValid  Details
-------  -------------------------------------------------------------------
0        24413: Not valid because of two overlapping edges in curve (1).

SQL Server 2012 is more tolerant of invalid spatial instances than previous versions. For example, you can now perform metric operations (such as STLength) on invalid instances, although you still won’t be able to perform other operations (such as STBuffer) on them.

The new MakeValid method can “fix” an invalid spatial instance and make it valid. Of course, the shape will shift slightly, and there are no guarantees on the accuracy or precision of the changes made. The code in Listing 10-27 uses MakeValid to remove overlapping parts (which can be caused by anomalies such as inaccurate GPS traces), effectively converting the invalid line string into a valid spatial instance.

DECLARE @line geometry = 'LINESTRING(1 1, 2 2, 3 2, 2 2)'
SELECT @line.MakeValid().ToString() AS Fixed

The WKT string returned by the SELECT statement shows the “fixed” line string:

Fixed
-------------------------------------------------------------------
LINESTRING (3 2, 2 2, 1.0000000000000071 1.0000000000000036)

Other Enhancements

The remainder of this post gives brief mention to several other noteworthy spatial enhancements added in SQL Server 2012. These include better geography support, and precision and optimization improvements.

Support for geography Instances Exceeding a Logical Hemisphere

Previous versions of SQL Server supported geography objects as large as (slightly less than) a logical hemisphere (half the globe). This limitation has been removed in SQL Server 2012, which now supports geography instances of any size (even the entire planet).

When you define a geography polygon, the order in which you specify the ring’s latitude and longitude coordinates (known as vertex order) is significant (unlike geometry, where vertex order is insignificant). The coordinate points are always defined according to the left-foot inside rule; when you “walk” the boundary of the polygon, your left foot is on the inside. Thus, vertex order determines whether you are defining a small piece of the globe, relative to the larger piece defined by the entire globe except for the small piece (that is, the rest of the globe).

Since previous versions of SQL Server were limited to half the globe, it was impossible to specify the points of a polygon in the “wrong order,” simply because doing so resulted in too large a shape (and thus, raised an error). That error potential no longer exists in SQL Server 2012, so it’s even more critical to make sure your vertex order is correct, or you’ll be unwittingly working with the exact “opposite” shape.

If you have a geography instance that is known have the wrong vertex order, you can repair it using the new ReorientObject method. This method operates only on polygons (it has no effect on points, line strings, or curves), and can be used to correct the ring orientation (vertex order) of the polygon. The following code demonstrates.

-- Small (less than a logical hemisphere) polygon
SELECT geography::Parse('POLYGON((-10 -10, 10 -10, 10 10, -10 10, -10 -10))')

-- Reorder in the opposite direction for "rest of the globe"
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))')

-- Reorient back to the small polygon
SELECT geography::Parse('POLYGON((-10 -10, -10 10, 10 10, 10 -10, -10 -10))').ReorientObject()

Three geography polygon instances are defined in this code. The first geography instance defines a very small polygon. The second instance uses the exact same coordinates, but because the vertex order reversed, it defines an enormous polygon whose area represents the entire globe except for the small polygon. As explained, such a definition would cause an error in previous versions of SQL Server, but is now accommodated without a problem by SQL Server 2012. The third instance reverses the vertex order on the same shape as the second instance, thereby producing the same small polygon as the first instance.

Full Globe Support

Along with the aforementioned support for geography instances to exceed a single logical hemisphere comes a new spatial data class called FULLGLOBE. As you may have guessed, this is a shape that represents the entire planet. If you’ve ever wondered how many square meters there are in the entire world, the following query gives you the answer (which is 510,065,621,710,996 square meters, so you can stop wondering).

-- Construct a new FullGlobe object (a WGS84 ellipsoid)
DECLARE @Earth geography = 'FULLGLOBE'

-- Calculate the area of the earth
SELECT PlanetArea = @Earth.STArea()

All of the common spatial methods work as expected on a full globe object. So you could, for example, “cut away” at the globe by invoking the STDifference and STSymDifference method against it using other polygons as cookie-cutter shapes.

New “Unit Sphere” Spatial Reference ID

The default spatial reference ID (SRID) in SQL Server 2012 is 4326, which uses the metric system as its unit of measurement. This SRID also represents the true ellipsoidal sphere shape of the earth. While this representation is most accurate, it’s also more complex to calculate precise ellipsoidal mathematics. SQL Server 2012 offers a compromise in speed and accuracy, by adding a new spatial reference id (SRID), 104001, which uses a sphere of radius 1 to represent a perfectly round earth.

You can create geography instances with SRID 104001 when you don’t require the greatest accuracy. The STDistance, STLength, and ShortestLineTo methods are optimized to run faster on the unit sphere, since it takes a relatively simple formula to compute measures against a perfectly round sphere (compared to an ellipsoidal sphere).

Better Precision

Internal spatial calculations in SQL Server 2012 are now performed with 48 bits of precision, compared to 27 bits used in SQL Server 2008 and SQL Server 2008 R2. This can reduce the error caused by rounding of floating point coordinates for original vertex points by the internal computation.

Summary

This blog post introduced you to some of the powerful new spatial capabilities added to SQL Server 2012. You saw how to use STNumCurves and STCurveN to obtain curve information from circular data, the BufferWithCurves method to produce more efficient padded line shapes than STBuffer, and the ShortestLineTo method to figure out the shortest distance between two shapes. Then you saw how to use the new MinDbCompatibilityLevel, STCurveToLine, and CurveToLineWithTolerance methods for supporting backward compatibility with SQL Server 2008. You also learned how SQL Server 2012 is much better at handling invalid spatial data, using the STIsValid, IsValidDetailed, and MakeValid methods. Finally, you learned about the new full globe support, unit sphere SRID, and improved precision.

You can learn much more about spatial functionality in my new book Programming Microsoft SQL Server 2012, which has an entire chapter dedicated to the topic. I hope you get to enjoy these powerful new spatial capabilities in SQL Server 2012!

How Significant is the SQL Server 2012 Release?

SQL Server, particularly its relational database engine, matured quite some time ago. So the “significance” of every new release over recent years can be viewed—in some ways—as relatively nominal. The last watershed release of the product was actually SQL Server 2005, which was when the relational engine (that, for years, defined SQL Server) stopped occupying “center stage,” and instead took its position alongside a set of services that now, collectively, define the product. These of course include the Business Intelligence (BI) components Reporting Services, Analysis Services, and Integration Services—features that began appearing as early as 1999 but, prior to SQL Server 2005, were integrated sporadically as a patchwork of loosely coupled add-ons, wizards, and management consoles. SQL Server 2005 changed all that with a complete overhaul. For the first time, SQL Server delivered a broader, richer, and more consolidated set of features and services which are built into—rather than bolted onto—the platform. None of the product versions that have been released since that time—SQL Server 2008, 2008 R2, and now 2012—have changed underlying architecture this radically.

That said, each SQL Server release continues to advance itself in vitally significant ways. SQL Server 2008 (released August 6, 2008) added a host of new features to the relational engine—T-SQL enhancements, Change Data Capture (CDC), Transparent Data Encryption (TDE), SQL Audit, FILESTREAM—plus powerful BI capabilities with Excel PivotTables, charts, and CUBE formulas. SQL Server 2008 R2 (released April 21, 2010) was dubbed the “BI Refresh,” adding PowerPivot for Excel, Master Data Services, and StreamInsight, but offering nothing more than minor tweaks and fixes in the relational engine.

The newest release—SQL Server 2012—is set to officially launch on March 7, 2012, and like every new release, this version improves on all of the key “abilities” (availability, scalability, manageability, programmability, and so on). The chief reliability improvement is the new High Availability Disaster Recovery (HADR) alternative to database mirroring. HADR (also commonly known as “Always On”) utilizes multiple secondary servers in an “availability group” for scale-out read-only operations (rather than forcing them to sit idle, just waiting for a failover to occur). Multi-subnet failover clustering is another notable new manageability feature.

SQL Server 2012 adds many new features to the relational engine, many of which I have blogged about, and most of which I cover in my new book (soon to be published). There are powerful T-SQL extensions, most notably the windowing enhancements, 22 new T-SQL functions, improved error handling, server-side paging, sequence generators, and rich metadata discovery techniques. There are also remarkable improvements for unstructured data, such as the FileTable abstraction over FILESTREAM and the Windows file system API, full-text property searching, and Statistical Semantic Search. Spatial support gets a big boost as well, with support for circular data, full-globe support, increased performance, and greater parity between the geometry and geography data types. Contained databases simplify portability, and new “columnstore” technology drastically increases performance of huge OLAP cubes (VertiPaq for PowerPivot and Analysis Services) and data warehouses (a similar implementation in the relational engine).

The aforementioned features are impressive, but still don’t amount to much more than “additives” over an already established database platform. A new release needs more than just extra icing on the cake for customers to perceive an upgrade as compelling. To that end, Microsoft has invested heavily in BI with SQL Server 2012, and the effort shows. The BI portion of the stack has been expanded greatly, delivering key advances in “pervasive insight.” This includes major updates to the product’s analytics, data visualization (such as self-service reporting with Power View), and master data management capabilities, as well Data Quality Services (DQS), a brand new data quality engine. There is also a new Business Intelligence edition of the product that includes all of these capabilities without requiring a full Enterprise edition license. Finally, SQL Server Data Tools (SSDT) brings brand new database tooling inside Visual Studio 2010. SSDT provides a declarative, model-based design-time experience for developing databases while connected, offline, on-premise, or in the cloud.

Of course there are more new features; I’ve only mentioned the most notable ones in this post. Although the relational engine has been rock-solid for several releases, it continues to enjoy powerful enhancements, while the overall product continues to reinvent itself by expanding its stack of services, particularly in the Business Intelligence space.
Follow

Get every new post delivered to your Inbox.

Join 51 other followers