SQL Server 2016 Dynamic Data Masking (DDM)

Introducing Dynamic Data Masking (DDM)

In this blog post, I’ll show you how to shield sensitive data from unauthorized users using Dynamic Data Masking, or DDM.

DDM lets you hide data, not by encrypting it, but by masking it. So there are no data changes in your tables. Rather, SQL Server automatically hides the actual data from all query results for users that don’t have permission to see it.

For example, take these query results:

MemberID    FirstName    LastName      Phone        Email
----------- ------------ ------------- ------------ --------------------------
1           Roberto      Tamburello    555.123.4567 RTamburello@contoso.com
2           Janice       Galvin        555.123.4568 JGalvin@contoso.com.co
3           Dan          Mu            555.123.4569 ZMu@contoso.net
4           Jane         Smith         454.222.5920 Jane.Smith@hotmail.com
5           Danny        Jones         674.295.7950 Danny.Jones@hotmail.com

With DDM, you can serve up the same results with the FirstName, Phone, and Email columns masked as follows:

MemberID    FirstName    LastName      Phone        Email
----------- ------------ ------------- ------------ --------------------------
1           Ro...to      Tamburello    xxxx         RXXX@XXXX.com
2           Ja...ce      Galvin        xxxx         JXXX@XXXX.com
3           ...          Mu            xxxx         ZXXX@XXXX.com
4           ...          Smith         xxxx         JXXX@XXXX.com
5           Da...ny      Jones         xxxx         DXXX@XXXX.com

DDM has four pre-defined masking functions:

default – You can completely hide data with the default function (that is, the function is actually named default). The default function masks the entire column value returned from the database, so that it’s completely hidden in the results, and it works with virtually any data type.

partial – The partial function lets you reveal some, but not all, of the underlying data, and it works only with string types. With partial, you can show any number of characters at the beginning of a string, at the end of a string, or at both the beginning and the end of a string. The entire middle portion of the string is hidden, replaced by a custom mask that you supply.

email – The email function is a bit strange, because it doesn’t really offer anything that you can’t achieve with the partial function. It’s actually just a convenient shorthand for the partial function that exposes only the first character of a string, and masks the rest with XXX@XXXX.com. The email function in no way examines the string it’s masking to see whether it’s actually formatted as an email address; any column you use this function with will look like an email address in your query results, regardless of its contents.

random – Finally, the random function is available for numeric columns. Like the default function, it completely hides the underlying value, but unlike default – which hides numeric columns by always masking them with a zero – the random function lets you supply a range of numbers from which a value is randomly chosen every time the data is queried.

As I said, DDM does not physically modify any data in the database. Instead, it masks the data on the fly, as it is queried by users that lack the permission to see the real data. This is a huge win for many common scenarios involving sensitive data; for example, in the healthcare industry, there are strict regulations around the sharing of so-called PHI, or protected health information. These regulations often make it hard to give a developer access to a decent sampling of live production data. DDM helps solve this problem, because administrators can now give developers access to production data, with all the sensitive personal data masked from view – a process that’s often referred to as “anonymizing” the data.

At the same time, because everything is handled internally by SQL Server, there is no additional development effort needed at the application level; there’s no extra code to write, you just define your masks, and you’re done.

Masking Table Columns

DDM is very easy to use. When you create a table with columns that you’d like to mask, you simply include some additional MASKED WITH syntax, to tell SQL Server how to apply the masking:

CREATE TABLE Customer(
  FirstName varchar(20)
    MASKED WITH (FUNCTION='partial(1, "...", 0)'),
  LastName varchar(20),
  Phone varchar(12)
    MASKED WITH (FUNCTION='default()'),
  Email varchar(200)
    MASKED WITH (FUNCTION='email()'),
  Balance money
    MASKED WITH (FUNCTION='random(1000, 5000)'))

In this example, we’re using the partial function to partially mask the first name column. Specifically, the first parameter reveals just the first character of the first name, the second parameter supplies the custom mask that follows it (three dots), and the last parameter tells SQL Server to reveal none of the ending characters of the first name. The default function on the Phone column completely hides the phone number, the email function reveals the first character of the Email column followed by the mask XXX@XXXX.com, and the random function masks the Balance column with random numbers between 1,000 and 5,000.

If you already have a table with columns that you’d like to mask, it’s just as easy. Simply use the ADD MASKED WITH syntax in an ALTER TABLE…ALTER COLUMN statement, like so:

ALTER TABLE Customer
  ALTER COLUMN LastName
    ADD MASKED WITH (FUNCTION='default()')

Masking Different Data Types

The way a column gets masked by DDM depends on two things:

  • the masking function that you use
  • the data type of the column that you’re masking

The default function is the only function that works with virtually all data types. In the case of a string column, it uses a hardcoded mask of four lower-case x’s, which is effectively the same as supplying a mask of four lower-case x’s to the partial function without revealing any starting or ending characters. In the case of the other data types, DDM masks the column using an appropriate replacement value for that type; for example, a zero for numeric data types, or January 1, 1900 for a date type. The default function can also be used to mask many of the more specialized data types, such as XML, binary, and spatial columns.

The partial function works only with string columns; meaning varchar, char, and text columns, as well as their Unicode counterparts. This function accepts the three parameters I described earlier, giving you control over how much or how little gets exposed from the start and end of the string, and the custom mask to embed in the middle.

The email function also works only with string columns, and simply reveals just the first character of the string, followed by the mask XXX@XXXX.com, using upper-case X’s.

And finally, the random function works only with numeric columns; for example, int, bigint, smallint, money, decimal, and even bit. Use the random function instead of the default function to mask numeric columns when you’d like to manufacture values that are semi-realistic, and not just zeros.

Discovering Masked Columns

To find out which columns in which tables are being masked, you can query sys.columns, which now includes is_masked and masking_function columns that tell you whether a column is masked and, if so, the function being used to mask it. Even easier is the new sys.masked_columns view, which internally queries sys.columns and filters to return only the masked columns; that is, where is_masked is set to 1, for true. For example:

SELECT
  t.name AS TableName,
  mc.name AS ColumnName,
  mc.masking_function AS MaskingFunction
FROM
  sys.masked_columns AS mc
  INNER JOIN sys.tables AS t ON mc.[object_id] = t.[object_id]

Mask Permissions

Dynamic Data Masking is based purely on permissions: whether or not a given permission has been granted to a given user.

So first, no special permission is actually required to create a new table and define it with masked columns. As for existing tables, the ALTER ANY MASK permission is required for a user to add a mask to an unmasked column, or to change or remove the mask of an already masked column.
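
For example, removing an existing mask uses DROP MASKED syntax; a minimal sketch against the Customer table defined above:

ALTER TABLE Customer
  ALTER COLUMN LastName
    DROP MASKED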

The UNMASK permission is the big one, because it effectively ignores any masking defined for any columns. This is the permission that you want to be certain not to grant to users that should only view masked data; for example, you would be sure not to grant developers the UNMASK permission when supplying production data for them to use as sample data.
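
Granting and revoking this permission is ordinary T-SQL; for example, for a hypothetical database user named Developer:

GRANT UNMASK TO Developer

-- and to take it back away:
REVOKE UNMASK FROM Developer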

No special permission is needed to insert or update data in a masked column. So DDM effectively behaves like a write-only feature, in the sense that a user can write data that they themselves will not be able to read back unless they also possess the UNMASK permission.
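
A quick way to see masking in action is to impersonate a low-privileged user. Here’s a minimal sketch that assumes the Customer table from earlier, using a hypothetical loginless test user:

CREATE USER TestUser WITHOUT LOGIN
GRANT SELECT ON Customer TO TestUser

EXECUTE AS USER = 'TestUser'
SELECT * FROM Customer  -- TestUser lacks UNMASK, so masked values are returned
REVERT

SELECT * FROM Customer  -- back as ourselves, the actual values are returned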

DDM Limitations and Considerations

There are a few things to keep in mind when you’re working with DDM. Although DDM does support most data types – even some of the highly specialized data types that are very often not supported by other SQL Server features – some columns cannot be masked. So while DDM can mask BLOB data stored in varbinary(max) columns, it cannot mask those columns if they are also decorated with the FILESTREAM attribute, which enables highly scalable BLOB storage in SQL Server.

You also cannot mask sparse columns that are part of a COLUMN_SET, or computed columns, although you can still create computed columns that are based on masked columns, in which case the computed column value gets masked as a result.

Key columns for FULLTEXT indexes can’t be masked, and finally, columns that have been encrypted using the new Always Encrypted feature in SQL Server 2016 (which I’ll cover in a future blog post) cannot be masked.

It’s also important to remember that there is no way to ever derive the unmasked data once it has been masked. So even though SQL Server doesn’t actually modify the underlying data for masked columns, an ETL process, for example, that queries SQL Server and receives masked data will wind up loading the target system with that masked data, and the target system will have no means of ever knowing what the unmasked data is.

Sharing State in SQL Server 2016 with SESSION_CONTEXT

If you’ve ever wanted to share session state across all stored procedures and batches throughout the lifetime of a database connection, you’re going to love SESSION_CONTEXT. When you connect to SQL Server 2016, you get a stateful dictionary, or what’s often referred to as a state bag: a place where you can store values, like strings and numbers, and then retrieve them by a key that you assign. In the case of SESSION_CONTEXT, the key is any string, and the value is a sql_variant, meaning it can accommodate a variety of types.

Once you store something in SESSION_CONTEXT, it stays there until the connection closes. It is not stored in any table in the database, it just lives in memory as long as the connection remains alive. And any and all T-SQL code that’s running inside stored procedures, triggers, functions, or whatever, can share whatever you shove into SESSION_CONTEXT.

The closest thing we’ve had until now is CONTEXT_INFO, which lets you store and share a single binary value up to 128 bytes long. That’s far less flexible than the dictionary you get with SESSION_CONTEXT, which supports multiple values of different data types.
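
For comparison, here’s a minimal sketch of the CONTEXT_INFO approach, where everything must be squeezed into, and decoded from, a single varbinary(128) slot:

DECLARE @info varbinary(128) = CONVERT(varbinary(128), N'Southwest')
SET CONTEXT_INFO @info

-- later, any code on the same connection can read it back,
-- but must know how to decode the raw bytes
SELECT CONVERT(nvarchar(20), CONTEXT_INFO()) AS UsRegion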

SESSION_CONTEXT is easy to use: just call sp_set_session_context to store the value by a desired key. When you do that, you supply the key and value of course, but you can also set the read_only parameter to true. This locks the value in session context, so that it can’t be changed for the rest of the lifetime of the connection. So, for example, it’s easy for a client application to call this stored procedure to set some session context values right after it establishes the database connection. If the application sets the read_only parameter when it does this, then the stored procedures and other T-SQL code that then executes on the server can only read the value; they can’t change what was set by the application running on the client.

How do you extract a value out of session context? Well, by using the SESSION_CONTEXT function of course. You supply the key, and the function returns the value. But remember, it returns this as a sql_variant, so you’ll usually need to cast or convert the value into whatever data type you need, like a varchar, int, or date.

Let’s demonstrate with a quick example.

First I’ll create this stored procedure that does something. And to do it, the stored procedure needs to know the region, but it doesn’t get the region using a parameter. Instead, it calls the new SESSION_CONTEXT function, requesting the value keyed as UsRegion, which we cast to a varchar(20) using CONVERT.

CREATE PROCEDURE DoThis AS
BEGIN
	DECLARE @UsRegion varchar(20) = CONVERT(varchar(20), SESSION_CONTEXT(N'UsRegion'))
	SELECT DoThis = @UsRegion
END

And here’s another stored procedure that also takes no parameters, and gets the region from session_context.

CREATE PROCEDURE DoThat AS
BEGIN
	DECLARE @UsRegion varchar(20)
	SET @UsRegion = CONVERT(varchar(20), SESSION_CONTEXT(N'UsRegion'))
	SELECT DoThat = @UsRegion
END

Both these procedures are expecting some earlier code to store the desired region into session context. Until that happens, SESSION_CONTEXT simply returns NULL.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
NULL

DoThat
--------------------
NULL

So let’s call sp_set_session_context, and set the region to Southwest.

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Southwest'

Now when we call the procedures, they both get Southwest from the session context.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Southwest

DoThat
--------------------
Southwest

Now before moving on to change the value, let’s first see that the value persists only for the lifetime of the connection. Run it a few more times, and you can see it’s still returning Southwest, but then close the connection and open a new one. Now running the stored procs again, the region is NULL, because session context is empty.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
NULL

DoThat
--------------------
NULL

Now call sp_set_session_context again to change the region to Northeast,

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Northeast'

And the procedures show Northeast, just as expected:

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Northeast

DoThat
--------------------
Northeast

Change it once more to Southeast, only this time, also set the read_only parameter to true. This prevents the value from being changed again during this session:

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Southeast', @read_only = 1

And the change is reflected when we run the procedures again:

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Southeast

DoThat
--------------------
Southeast

Finally, try to change the region to Northwest:

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Northwest'

You see that we can’t, because it’s locked in session context:

Msg 15664, Level 16, State 1, Procedure sp_set_session_context, Line 27
Cannot set key 'UsRegion' in the session context. The key has been set as read_only for this session.

There’s absolutely no way the value can be changed unless you kill the connection, as we saw earlier, in which case of course you lose all the values in session context.

Just DIE Please! Introducing Drop If Exists (DIE) in SQL Server 2016

In SQL Server 2016, Drop If Exists (DIE) is a handy new T-SQL language enhancement that eliminates the need to test before you drop.

So, if you need to delete a table, or a stored procedure, but you don’t know if it exists or not, then you’re used to writing code that says, “If the table exists, then drop it,” or “if the stored procedure exists, then drop it.” We’ve all been writing code like this for years, but that doesn’t mean it’s been fun:

• IF OBJECT_ID('dbo.Product', 'U') IS NOT NULL
   DROP TABLE dbo.Product

• IF EXISTS (SELECT * FROM sys.triggers WHERE name = 'trProductInsert')
   DROP TRIGGER trProductInsert

So now, thankfully, we can leave the testing to SQL Server:

• DROP TABLE IF EXISTS dbo.Product

• DROP TRIGGER IF EXISTS trProductInsert

It doesn’t get much simpler than this. It’s really just an ordinary DROP statement; you just inject the new syntax IF EXISTS in the middle, and you’re done.

This new feature is available for just about anything you need to drop, so not just tables and triggers, but all of the following:

  • AGGREGATE
  • ASSEMBLY
  • DATABASE
  • DEFAULT
  • INDEX
  • PROCEDURE
  • ROLE
  • RULE
  • SCHEMA
  • SECURITY POLICY
  • SEQUENCE
  • SYNONYM
  • TABLE
  • TRIGGER
  • TYPE
  • VIEW
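
The same IF EXISTS syntax also works inside ALTER TABLE for dropping columns and constraints; a quick sketch against a hypothetical dbo.Product table:

ALTER TABLE dbo.Product DROP COLUMN IF EXISTS Discontinued

ALTER TABLE dbo.Product DROP CONSTRAINT IF EXISTS FK_Product_Category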

So no, DIE isn’t really a major new feature, but if you like neat code – as I do – then it’s welcome just the same.

Overview of New SQL Server 2016 Developer Features

Every new version of SQL Server is packed with new features, and SQL Server 2016 is no exception. In this blog post, I briefly describe the major new developer-focused features introduced in SQL Server 2016. I’ll cover many of these features in greater depth in upcoming posts.

Drop If Exists is a small but convenient language enhancement which helps you write neater T-SQL code, because you no longer need to test if an object exists before deleting it.

SESSION_CONTEXT gives you a dictionary object that maintains its state across the lifetime of the database connection, so it’s a new easy way to share state across stored procedures running on the server, and even to share state between the client and the server.

With Dynamic Data Masking, or DDM, you can shield sensitive information in your tables from unauthorized users by masking it, and this works purely with permissions, without ever modifying the actual data in the table.

Row-Level Security, or RLS, lets you hide different rows for different users, based on your own custom criteria. The hidden rows are automatically filtered out of all queries that get issued against the table, and you can also block users from inserting, updating, or deleting rows according to your needs.
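
To give a flavor of how RLS hangs together, here’s a minimal sketch; the dbo.Orders table, its SalesRep column, and the object names are all hypothetical:

-- an inline table-valued function that returns a row only when the predicate passes
CREATE FUNCTION dbo.fn_OrdersFilter(@SalesRep AS sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result WHERE @SalesRep = USER_NAME()
GO

-- rows whose SalesRep doesn't match the current user are silently filtered out of every query
CREATE SECURITY POLICY OrdersPolicy
ADD FILTER PREDICATE dbo.fn_OrdersFilter(SalesRep) ON dbo.Orders
WITH (STATE = ON)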

Always Encrypted lets you encrypt data that is never decrypted in any location other than the client. By using client-side certificates to perform client-side encryption and decryption, the data is always encrypted – not just on disk, but in-flight, as it traverses the network.

With Stretch DB, you can keep using your own data centers and SQL Servers to host and manage your data, but still allow tables you designate for remote data archive to be migrated to the cloud on Azure SQL Database. So you keep your hot data on-premises, but let your cold data stretch out to the cloud, completely transparently.

Temporal data is an exciting new feature that automatically tracks changes made to a table, and records those changes to a history table. Then, the query engine integrates with the history table and gives you this almost magical experience of time-travel, where you can run a query against a table as it appeared at any point in time of the past.
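
Once a table is system-versioned, time travel is a single clause; a minimal sketch against a hypothetical dbo.Employee temporal table:

-- the table as it looked at a point in the past
SELECT * FROM dbo.Employee
  FOR SYSTEM_TIME AS OF '2016-01-01T00:00:00'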

JSON support. XML support first appeared back in SQL Server 2000, and then got a major boost in 2005 with the native XML data type. Today, JSON is the new XML, and SQL Server 2016 provides JSON support very similar to what’s possible with XML. You can shred, store, query, and manipulate JSON documents in just about any way you need to, in SQL Server 2016.
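
As a small taste, the new OPENJSON function shreds a JSON string into rows and columns; a minimal sketch:

DECLARE @json nvarchar(max) = N'[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]'

SELECT * FROM OPENJSON(@json)
  WITH (id int '$.id', name nvarchar(50) '$.name')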

Hekaton improvements – SQL Server 2014 introduced In-Memory OLTP, which many still call by its code name, “Hekaton.” Hekaton can dramatically boost performance by migrating from traditional disk-based tables to newer memory-optimized tables. The technology is compelling, but the initial release in 2014 carried a lot of limitations, and the most egregious ones have been removed in 2016.

PolyBase – We’re living in a world of big data, where increasingly massive amounts of information are being stored in large NoSQL stores such as Hadoop and Azure Blob Storage. PolyBase is a new feature that lets you integrate with both Hadoop and Azure Blob Storage. By defining external tables that map to these NoSQL environments, you can write T-SQL queries that seamlessly retrieve data from them, and can even push portions of the query down to execute on Hadoop as a compute job.

The new Query Store feature will cache execution plans, and capture the performance of the same query over time. This is a great tool when you’re trying to troubleshoot the performance of a query that once had a good execution plan, but no longer does because of some environmental change, say some change in table statistics. With Query Store, you can much more easily identify that change, and make the necessary adjustments to ensure that your SQL Server continues to devise good execution plans.
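
Query Store is enabled per database; the minimal form looks like this, with a hypothetical database name:

ALTER DATABASE SalesDB SET QUERY_STORE = ON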

R Integration – R is an analytic programming language that has grown very popular over recent years, and now SQL Server 2016 introduces R services. This lets you write code in R and run it right inside the database engine itself. This is a huge win for data scientists who will no longer need to first extract their data out of SQL Server before they can analyze it with R; instead, they can bring their R code right to the data, and let it run there.
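
R scripts run inside the engine via the sp_execute_external_script system procedure. A minimal sketch, assuming R Services is installed and external scripts are enabled on the server:

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'OutputDataSet <- InputDataSet',  -- simply echo the input query back out
  @input_data_1 = N'SELECT 1 AS x'
WITH RESULT SETS ((x int))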

Stay tuned for upcoming posts for more detailed coverage on these awesome new SQL Server features!

Integrating Document BLOB Storage with SQL Server

NoSQL platforms can support highly scalable databases with BLOB attachments (images, documents, and other files), but if you think you need to embrace a NoSQL solution in lieu of SQL Server just because you have a high volume of BLOBs in your database, then think again. Sure, if you have good reasons to go with NoSQL anyway – for example, if you want the flexibility of schema-free tables, and you can accept the compromises of eventual transactional consistency – then NoSQL can fit the bill nicely.

But critical line-of-business applications often can’t afford the relaxed constraints of NoSQL databases, and usually require schemas that are strongly typed, with full transactional integrity; that is, a full-fledged relational database system (RDBMS). However, relational database platforms like SQL Server were originally designed and optimized to work primarily with structured data, not BLOBs. And so historically, it’s never been feasible to store large amounts of BLOB data directly in the database. That is, until FILESTREAM.

With FILESTREAM, Microsoft addresses the dilemma of storing BLOBs within the relational database. My new Pluralsight course, SQL Server 2012-2014 Native File Streaming, explains this innovative feature in detail, and in this blog post, I’ll discuss how FILESTREAM (and its related technologies) can be used to implement a highly-scalable BLOB storage solution that’s fully integrated with a relational SQL Server database. You’ll also find live demos on everything covered by this post in the course.

Introducing FILESTREAM

Although SQL Server was never originally intended to handle BLOBs in large scale, that changed with FILESTREAM, introduced in SQL Server 2008. Before FILESTREAM, SQL Server was forced to shove BLOBs into the standard database filegroups, which are really optimized for storing structured row data in 8k pages. Because BLOBs don’t fit naturally within this structure, they must be pushed into off-row storage, which bloats the structured filegroups, and ultimately kills performance.

FILESTREAM changes all that. First, to be clear, FILESTREAM is not actually a data type. Rather, it’s an attribute that you apply to the varbinary(max) data type, the same data type that you would use to store BLOBs directly inside the row. But by merely appending the FILESTREAM attribute to the varbinary(max) data type, SQL Server takes a radically different approach to physical BLOB storage. Rather than inundating the standard database filegroups with BLOBs, SQL Server stores BLOB content as files in the file system – where they belong; the file system being a native environment optimized for storing and streaming unstructured binary content. At the same time, it establishes and maintains reference pointers between the rows in the standard filegroups and the files in the file system that are tied to varbinary(max) columns in those rows. All this magic occurs behind the scenes, and is totally transparent to any existing code that works with ordinary varbinary(max) columns.

In this manner, the BLOB data is physically stored separately from structured row data, but it is logically an integral part of the database. So for example, backing up the database includes the BLOB data, with the option of performing a partial backup that excludes the FILESTREAM filegroup when you want to create smaller backups that don’t include BLOB data.

Furthermore, this solution provides full transactional consistency – because FILESTREAM integrates with the NTFS file system, and NTFS is a transactional file system. So when you start a database transaction and insert a row, and that row includes BLOB data stored in a varbinary(max) FILESTREAM column, then SQL Server automatically initiates an NTFS file system transaction over that BLOB data. Then, the fate of the file system transaction hinges on the fate of the database transaction. If and when the database transaction commits, then SQL Server will also commit the NTFS file system transaction; similarly, rolling back the database transaction automatically rolls back the NTFS transaction.

Accessing BLOBs with T-SQL

With FILESTREAM, you can treat BLOBs as ordinary varbinary(max) columns in T-SQL. For example, you can use the OPENROWSET function with the BULK provider to import an external file into a varbinary(max) column, and if that column is decorated with the FILESTREAM attribute, then SQL Server will automatically store a copy of that file as a BLOB in the NTFS file system behind the scenes, rather than force-fitting it into the standard database filegroups.
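
The examples that follow assume a PhotoAlbum table along these lines; a minimal sketch (a FILESTREAM table requires a uniqueidentifier ROWGUIDCOL column, and the database needs a FILESTREAM filegroup):

CREATE TABLE PhotoAlbum(
  PhotoId int PRIMARY KEY,
  RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
  PhotoDescription varchar(max),
  Photo varbinary(max) FILESTREAM)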

For example:

INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
  VALUES(
    3,
    'Mountains',
    (SELECT BulkColumn FROM OPENROWSET(BULK 'C:\Demo\Ascent.jpg', SINGLE_BLOB) AS x))

To retrieve BLOBs, it’s a simple SELECT:

SELECT * FROM PhotoAlbum

Using SqlFileStream and the Streaming API

Although FILESTREAM delivers scalable storage by leveraging the NTFS file system behind the scenes, BLOB access needs to scale as well. It’s great that you can enjoy total transparency by just using T-SQL access, but stop for a moment and think about what SQL Server needs to do when retrieving BLOBs with T-SQL. In order to serve up the Photo column for the SELECT statement above, for example, SQL Server needed to read the entire contents of each BLOB from the NTFS file system that it’s managing internally, and this can easily and suddenly place a great deal of memory pressure on the server.

To address this concern, FILESTREAM exposes the streaming API. When you use this API, SQL Server still manages the file system behind the scenes, only it shifts the burden and memory requirements of actually reading and writing BLOBs in the file system off of itself and onto the client application. This keeps the memory requirements on SQL Server very lean, regardless of how large your BLOBs may be.

The SqlFileStream class is a managed code wrapper around the streaming API, which makes it extremely easy to use from .NET. In C# or VB .NET, you start a database transaction and issue an INSERT statement, but you don’t actually include the BLOB content with the INSERT statement. Instead, SQL Server passes you back the information you need to create a SqlFileStream object. This object inherits from the base System.IO.Stream class, meaning that it supports all the standard read/write methods of standard .NET stream classes, including memory streams, HTTP request/response streams, and local file streams. So it’s easy to then stream your BLOBs in and out, using buffers in memory allocated to your application – not SQL Server. Then, you just commit the database transaction, and SQL Server automatically commits the NTFS file system transaction at the same time.
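
On the T-SQL side, the two ingredients the client needs are the BLOB’s file path and a transaction context token, obtained with the PathName method and the GET_FILESTREAM_TRANSACTION_CONTEXT function. A minimal sketch against the PhotoAlbum table; the client would pass both values to the SqlFileStream constructor:

BEGIN TRANSACTION

-- insert the row with a zero-length placeholder BLOB
INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
  VALUES(4, 'Lake', 0x)

-- hand these two values to the client for streaming via SqlFileStream
SELECT
  Photo.PathName() AS FilePath,
  GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM PhotoAlbum
WHERE PhotoId = 4

-- ...the client streams the BLOB content here...

COMMIT TRANSACTION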

In my course, I show you SqlFileStream up close, and demonstrate how to program against the streaming API from a thick client application, a client/server (Web) application, and in an n-tier (WCF) scenario as well.

Introducing FileTable

The FILESTREAM story only gets better with FileTable, added in SQL Server 2012. While FILESTREAM revolutionizes BLOB storage in SQL Server, it’s only accessible to developers and administrators. What about ordinary users? They’re certainly not going to write T-SQL or streaming API code to access BLOBs. And there’s also no way for ordinary client applications to access FILESTREAM data.

The solution is FileTable, which combines FILESTREAM with the hierarchyid data type to furnish an “emulated” file system; that is, a file system that users and applications can work with, but which is really a FileTable in a SQL Server database. A FileTable is just a regular table except that it has a fixed schema; specifically, it has these pre-determined columns for the metadata of the emulated file system:

  • stream_id and file_stream – a unique ID for each item, and the BLOB content itself (a varbinary(max) FILESTREAM column)
  • name – the file or folder name
  • path_locator and parent_path_locator – hierarchyid values that define where the item lives in the folder hierarchy
  • file_type and cached_file_size – metadata derived from the content
  • creation_time, last_write_time, and last_access_time – timestamps
  • is_directory, plus Windows attribute flags such as is_hidden, is_readonly, and is_archive

Every row in a FileTable represents either a file or a folder (depending on the is_directory column), and the hierarchyid value in the path_locator column is what implements the folder structure of the emulated file system. The hierarchyid data type has methods that you can use to query and manipulate the structure; for example, you can programmatically move entire subtrees from one parent to another.

For rows that represent files, the file_stream column holds the actual BLOB, and this is a varbinary(max) FILESTREAM column. So behind the scenes, it is stored in the NTFS file system just like a varbinary(max) FILESTREAM column of an ordinary table (a non-FileTable) would be.

And so, in addition to being able to use T-SQL or the streaming API with a FileTable, the emulated file system that a FileTable represents also gets exposed to users and client applications via a Windows file share. As a result, changes made to the table in the database are reflected in the emulated file system, and conversely, changes made to the emulated file system by users or client applications are reflected automatically in the database, which ultimately pushes down into the physical NTFS file system being used for BLOB storage behind the scenes.
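
Creating a FileTable takes no column definitions at all, since the schema is fixed; a minimal sketch, assuming the database has FILESTREAM configured and a FileTable directory assigned:

CREATE TABLE Documents AS FILETABLE
  WITH(
    FILETABLE_DIRECTORY = 'Documents',
    FILETABLE_COLLATE_FILENAME = database_default)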

Summary

This blog post explained FILESTREAM, and its related feature, FileTable. We first saw how FILESTREAM supports scalable BLOB storage using the NTFS file system behind the scenes, and provides transparent T-SQL access using the varbinary(max) data type. We also learned about the streaming API and SqlFileStream, which shifts the burden and memory requirements for streaming BLOBs off of SQL Server and onto client applications, providing scalable BLOB access. And we finally saw how FileTable combines FILESTREAM with the hierarchyid data type to furnish an emulated file system on the front end that users and client applications can interact with, but which in actuality is just a table in the database.

And so, with FILESTREAM, line-of-business applications can embrace scalable BLOB integration without being forced to consider a NoSQL alternative to SQL Server.

Downloading SQL Server Express Edition

I’m often asked how to install SQL Server Express Edition, and what the different download options are. So I finally decided to explain it all in this blog post.

There are several SQL Server Express Edition downloads available on the Microsoft site, and they are available in both 32-bit and 64-bit versions. You can choose to install just the SQL Server Express database engine (and nothing else), or you can choose one of two other (larger) downloads: Express With Tools (which includes SQL Server Management Studio [SSMS]) or Express With Advanced Services (which includes SSMS, Full Text Search, and Reporting Services). There are also separate downloads for SSMS and LocalDB, but these do not include the SQL Server Express database engine needed to host local databases.

To install the SQL Server Express Edition database engine, follow these steps:

  1. Open Internet Explorer, and navigate to http://www.microsoft.com/en-us/download/details.aspx?id=29062.
  2. Click the large orange Download button.
  3. Select the appropriate download for your system.
    1. For 64-bit systems, choose ENU\x64\SQLEXPR_x64_ENU.exe.
    2. For 32-bit or 64-bit WoW systems, choose ENU\x86\SQLEXPR32_x86_ENU.exe.
    3. For 32-bit systems, choose ENU\x86\SQLEXPR_x86_ENU.exe.
      Note: If you need to download SQL Server Management Studio (SSMS) as well, choose the Express With Tools file instead, which is the one that includes WT in the filename.
  4. Click Next.
  5. If you receive a pop-up warning, click Allow Once.
  6. When prompted to run or save the file, choose Run. This starts and runs the download.
  7. If the User Account Control dialog appears after the download files are extracted, click Yes.
  8. In the SQL Server Installation Center, click New SQL Server Stand-Alone Installation.
  9. In the SQL Server 2012 Setup wizard, do the following:
    1. On the License Terms page, select I Accept The License Terms and click Next.
    2. On the Product Updates page, allow the wizard to scan for updates, and then click Next.
    3. On the Install Setup Files page, wait for the installation to proceed.
    4. On the Feature Selection page, click Next.
    5. Continue clicking Next through all the remaining pages until the Installation Progress page, and wait for the installation to proceed.
    6. On the Complete page indicating a successful setup, click Close.

I hope these instructions help you choose the right installation option for SQL Server Express Edition!

The Same, Only Different: Comparing Windows Azure SQL Database and On-Premise SQL Server

One of the most attractive aspects of Windows Azure SQL Database is that it shares virtually the same code-base and exposes the very same tabular data stream (TDS) as on-premise SQL Server. Thus, to a great extent, the same tools and applications that work with SQL Server work just the same and just as well with SQL Database. Notice that I said to a great extent, because despite their commonality, there are quite a few SQL Server features that SQL Database does not support. In this blog post, I discuss how and why these two platforms differ from one another, and explain the SQL Database constraints that you need to be aware of if you have previous experience with SQL Server. Where possible, I also suggest workarounds.

SQL Server and SQL Database differ in several ways; most notably, in terms of size limitations, feature support, and T-SQL compatibility. In many cases, these constraints are simply the price you pay for enjoying a hassle-free, self-managing, self-healing, always-available database in the cloud. That is, Microsoft cannot responsibly support features that impair its ability to ensure high-availability, and must be able to quickly replicate and relocate a SQL Database. This is why SQL Database places limits on database size, and doesn’t support certain specialized features such as FILESTREAM, for example.

Another common reason why a particular feature or T-SQL syntax might not be supported in SQL Database is that it’s simply not applicable. With SQL Database, administrative responsibilities are split between Microsoft and yourself. Microsoft handles all of the physical administration (such as disk drives and servers), while you manage only the logical administration (such as database design and security). This is why any and all T-SQL syntax that relates to physical resources (such as pathnames) is not supported in SQL Database. For example, you don’t control the location for primary and log filegroups. This is why you can’t include an ON PRIMARY clause with a CREATE DATABASE statement, and indeed, why SQL Database does not permit a filegroup reference in any T-SQL statement. Plainly stated, everything pertaining to physical resources (that is, infrastructure) is abstracted away from you with SQL Database.

Yet still, in some cases, a certain SQL Server feature or behavior may be unsupported merely because Microsoft has just not gotten around to properly testing and porting it to SQL Database. Windows Azure is constantly evolving, so you need to keep watch for updates and announcements. This blog post is a great starting point, but the best way to stay current is by reviewing the Guidelines and Limitations section of the SQL Database documentation on the MSDN web site.

Size Limitations

With the exception of the free, lightweight Express edition of SQL Server, there is no practical upper limit on database size in any edition of SQL Server. A SQL Server database can grow as large as 524,272 terabytes (for SQL Server Express edition, the limit is 10 gigabytes).

In contrast, SQL Database has very particular size limitations. You can set the maximum size by choosing between the Web and Business editions. With a Web edition database, you can set the maximum database size to either 1 or 5 gigabytes. With a Business edition database, the maximum database size can range from 10 to 150 gigabytes. Other than the available choices for maximum database size, there is no difference in functionality between the two editions. The absolute largest supported database size is 150 gigabytes, although horizontal partitioning strategies (called sharding) can be leveraged for scenarios that require databases larger than 150 gigabytes.

Connection Limitations

SQL Database is far less flexible than SQL Server when it comes to establishing and maintaining connections. Keep the following in mind when you connect to SQL Database:

  • SQL Server supports a variety of client protocols, such as TCP/IP, Shared Memory, and Named Pipes. Conversely, SQL Database allows connections only over TCP/IP.
  • SQL Database does not support Windows authentication. Every connection string sent to SQL Database must always include a login username and password.
  • SQL Database often requires that @<server> is appended to the login username in connection strings. SQL Server has no such requirement.
  • SQL Database communicates only through port 1433, and does not support static or dynamic port allocation like SQL Server does.
  • SQL Database fully supports Multiple Active Result Sets (MARS), which allows multiple pending requests on a single connection.
  • Due to the unpredictable nature of the internet, SQL Database connections can drop unexpectedly, and you need to account for this condition in your applications. Fortunately, the latest version of the Entity Framework (EF6) addresses this issue with the new Connection Resiliency feature, which automatically handles the retry logic for dropped connections.

Unsupported Features

Listed below are SQL Server capabilities that are not supported in SQL Database, and I suggest workarounds where possible. Again, because this content is subject to change, I recommend that you check the MSDN web site for the very latest information.

  • Agent Service You cannot use the SQL Server Agent service to schedule and run jobs on SQL Database.
  • Audit The SQL Server auditing feature records server and database events to either the Windows event log or the file system, and is not supported in SQL Database.
  • Backup/Restore Conventional backups with the BACKUP and RESTORE commands are not supported with SQL Database. Although the BACPAC feature can be used to import and export databases (effectively backing them up and restoring them), this approach does not provide transactional consistency for changes made during the export operation. To ensure transactional consistency, you can either set the database as read-only before exporting it to a BACPAC, or you can use the Database Copy feature to create a copy of the database with transactional consistency, and then export that copy to a BACPAC file.
  • Browser Service SQL Database listens only on port 1433. Therefore, the SQL Server Browser Service which listens on various other ports is naturally unsupported.
  • Change Data Capture (CDC) This SQL Server feature monitors changes to a database, and captures all activity to change tables. CDC relies on a SQL Server Agent job to function, and is unsupported in SQL Database.
  • Common Language Runtime (CLR) The SQL Server CLR features (often referred to simply as SQL CLR) allow you to write stored procedures, triggers, functions, and user-defined types in any .NET language (such as C# or VB), as an alternative to using traditional T-SQL. In SQL Database, only T-SQL can be used; SQL CLR is not supported.
  • Compression SQL Database does not support the data compression features found in SQL Server, which allow you to compress tables and indexes.
  • Database object naming convention In SQL Server, multipart names can be used to reference a database object in another schema (with the two-part name syntax schema.object), in another database (with the three-part name syntax database.schema.object), and (if you configure a linked server) on another server (with the four-part name syntax server.database.schema.object). In SQL Database, two-part names can also be used to reference objects in different schemas. However, three-part names are limited to reference only temporary objects in tempdb (that is, where the database name is tempdb and the object name starts with a # symbol); you cannot access other databases on the server. And you cannot reference other servers at all, so four-part names can never be used.
  • Extended Events In SQL Server, you can create extended event sessions that help you troubleshoot a variety of problems, such as excessive CPU usage, memory pressure, and deadlocks. This feature is not supported in SQL Database.
  • Extended Stored Procedures You cannot execute your own extended stored procedures (these are typically custom coded procedures written in C or C++) with SQL Database. Only conventional T-SQL stored procedures are supported.
  • File Streaming SQL Server native file streaming features, including FILESTREAM and FileTable, are not supported in SQL Database. You can instead consider using Windows Azure Blob Storage containers for unstructured data files, but it will be your job at the application level to establish and maintain references between SQL Database and the files in blob storage, though note that there will be no transactional integrity between them using this approach.
  • Full-Text Searching (FTS) The FTS service in SQL Server that enables proximity searching and querying of unstructured documents is not supported in SQL Database.
  • Mirroring SQL Database does not support database mirroring, which is generally a non-issue because Microsoft is ensuring data redundancy with SQL Database so you don’t need to worry about disaster recovery. This does also mean that you cannot use SQL Database as a location for mirroring a principal SQL Server database running on-premises. However, if you wish to consider the cloud for this purpose, it is possible to host SQL Server inside a Windows Azure virtual machine (VM) against which you can mirror an on-premise principal database. This solution requires that you also implement a virtual private network (VPN) connection between your local network and the Windows Azure VM, although it will work even without the VPN if you use server certificates.
  • Partitioning SQL Server allows you to partition tables and indexes horizontally (by groups of rows) across multiple filegroups within a database, which greatly improves the performance of very large databases. SQL Database has a maximum database size of 150 GB and gives you no control over filegroups, and thus does not support table and index partitioning.
  • Replication SQL Server offers robust replication features for distributing and synchronizing data, including merge replication, snapshot replication, and transactional replication. None of these features are supported by SQL Database; however, SQL Data Sync can be used to effectively implement merge replication between a SQL Database and any number of other SQL Databases on Windows Azure and on-premise SQL Server databases.
  • Resource Governor The Resource Governor feature in SQL Server lets you manage workloads and resources by specifying limits on the amount of CPU and memory that can be used to satisfy client requests. These are hardware concepts that do not apply to SQL Database, and so the Resource Governor is naturally unsupported.
  • Service Broker SQL Server Service Broker provides messaging and queuing features, and is not supported in SQL Database.
  • System stored procedures SQL Database supports only a few of the system stored procedures provided by SQL Server. The unsupported ones are typically related to those SQL Server features and behaviors not supported by SQL Database. At the same time, SQL Database provides a few new system stored procedures not found in SQL Server that are specific to SQL Database (for example, sp_set_firewall_rule).
  • Tables without a clustered index Every table in a SQL Database must define a clustered index. By default, SQL Database will create a clustered index over the table’s primary key column, but it won’t do so if you don’t define a primary key. Interestingly enough, SQL Database will actually let you create a table with no clustered index, but it will not allow any rows to be inserted until and unless a clustered index is defined for the table. This limitation does not exist in SQL Server.
  • Transparent Data Encryption (TDE) You cannot use TDE to encrypt a SQL Database like you can with SQL Server.
  • USE In SQL Database, the USE statement can only refer to the current database; it cannot be used to switch between databases as it can with SQL Server. Each SQL Database connection is tied to a single database, so to change databases, you must connect directly to the database.
  • XSD and XML indexing SQL Database fully supports the xml data type, as well as most of the rich XML support that SQL Server provides, including XML Query (XQuery), XML Path (XPath), and the FOR XML clause. However, XML schema definitions (XSD) and XML indexes are not supported in SQL Database.