Using Transparent Data Encryption in SQL Server 2008

Happy New Year! 

I’ve got a juicy SQL Server post for you to ring in the new year – Transparent Data Encryption, available only in SQL Server 2008 Enterprise edition. This is good and powerful stuff that works, folks, so check it out here.

Sensitive information (passwords, credit card numbers, salary information, and so on) in your database needs to be encrypted. As of SQL Server 2005, you can encrypt and decrypt sensitive data columns in your tables using symmetric keys. You identify the columns that will hold sensitive information, and then invoke ENCRYPTBYKEY to store data in those columns, and DECRYPTBYKEY to retrieve data from those columns. The process is fairly straightforward, but it does require programming in your application for each encrypted column.

In SQL Server 2008 (Enterprise edition only), Microsoft has added a new feature called Transparent Data Encryption (TDE). This feature automatically encrypts the entire database (data and log files), as well as database backups, without requiring any programming or code changes to your application. The process is entirely transparent, hence the name Transparent Data Encryption. In this blog post, I’ll explain TDE, and demonstrate how to use this new security feature.

(Note that the NTFS file system in Windows Server 2000 and later provides a feature called Encrypted File System [EFS]. This feature also applies transparent encryption to any data stored on the hard drive, but it will not protect databases or backups that have been copied onto a CD or other media. TDE in SQL Server 2008 is based on a certificate that is needed to decrypt or restore any encrypted database, regardless of where the data is transferred.)

When TDE is first enabled for a specific database, SQL Server encrypts the database in the background. During this process, the database remains online and responsive to client requests (similarly, when encryption is disabled, SQL Server decrypts the database in the background). Encryption is performed at the page level, and does not increase the size of the database in any way. Once the entire database is encrypted, new data gets encrypted on the fly as it is written to disk, and all data gets decrypted when read back.  

Multiple Protection Layers

Databases protected with TDE are encrypted with a Database Encryption Key (DEK). You create the DEK and store it in the database, but the DEK itself is associated with a certificate that you create separately in the master database. This means that a backup of the database includes the DEK, but doesn’t include the certificate upon which the DEK is based. Hence, TDE database backups are useless to prying eyes, since they cannot be restored without the certificate. Finally, the certificate itself is encrypted by the Service Master Key (SMK), also contained in the master database.

To get started, you’ll need to create an SMK, if your server doesn’t have one already. Then you can create a certificate for TDE that is encrypted by the SMK which can be used to create one encrypt one or more DEKs. Finally, you create a DEK against the certificate in each database to be encrypted and then enable encryption on the database.

The following diagram illustrates how TDE might be used to encrypt two databases on one server instance:

In this diagram, you can see that the master database holds the SMK (there can be one and only one SMK on any server instance). The master database also holds a certificate whose private key is encrypted by the SMK. The two databases MyDB1 and MyDB3 are each encrypted with DEKs that are, in turn, encrypted by the certificate. The DEKs are entirely dependent on the certificate, so copying or restoring these databases to another server instance without also transferring the certificate upon which the DEKs are based yields a totally unusable database.   

Creating a Service Master Key (SMK)

If your server already has an SMK, you can skip this step. An SMK can only be created in the master database, and there can only be one SMK per server instance. If you don’t already have an SMK, you can create one as follows:   

USE master
GO   

CREATE MASTER KEY
  ENCRYPTION BY PASSWORD = 'Hrd2GessP@$$w0rd!' 

Creating a TDE Certificate

In general, certificates can be created in any database. However, certificates used for TDE must be created in the master database. It should be fairly obvious why the certificates used to encrypt DEKs in each encrypted database are stored outside the encrypted database in master (you wouldn’t want them stored in the encrypted database, as that would defeat the whole protection scheme!)   

USE master
GO 

CREATE CERTIFICATE MyEncryptionCert
  WITH SUBJECT = 'My Encryption Certificate' 

You can then query the sys.certificates view to confirm that the certificate has been created, as follows:   

SELECT name, pvt_key_encryption_type_desc FROM sys.certificates
  WHERE name = 'MyEncryptionCert'

The output confirms that the certificate was created and that its private key is protected by the master key, as shown here:   

name                             pvt_key_encryption_type_desc
-------------------------------- ----------------------------------
MyEncryptionCert                 ENCRYPTED_BY_MASTER_KEY  

(1 row(s) affected) 

Creating a Database Encryption Key (DEK)

Each database to be encrypted requires its own DEK, and each database’s DEK is in turn encrypted by the TDE certificate we just created in the master database. When creating the DEK, you can specify a particular encryption algorithm to be used. Supported algorithms include AES_128, AES_192, AES_256, TRIPLE_DES_3KEY. The DEK protects not only the data and log files, but backups too. Attempting to restore an encrypted database without the certificate is an exercise in futility.   The following T-SQL code creates a DEK for the MyDB database that specifies 128-bit encryption:   

USE MyDB
GO  

CREATE DATABASE ENCRYPTION KEY
  WITH ALGORITHM = AES_128
  ENCRYPTION BY SERVER CERTIFICATE MyEncryptionCert

Notice the ENCRYPTION BY SERVER CERTIFICATE clause that references the TDE certificate MyEncryptionCert we just created in the master database. This means that the MyEncryptionCert certificate must be present and available in the master database of the same server instance as MyDB, or the database will be rendered inaccessible.

(Because we have not yet backed up the TDE certificate, SQL Server issues a warning at this time alerting you to the fact that the certificate being used to encrypt the DEK has not been backed up. This warning should be taken seriously, since you will not be able to access any database encrypted by the DEK without the certificate. Should the certificate be lost or damaged, your encrypted databases will be completely inaccessible. Later in this post, I will show you how to back up and restore the certificate.)

Enabling TDE

With the SMK, certificate, and DEK prepared, you can start transparent data encryption on the database using the ALTER DATABASE…SET ENCRYPTION ON statement. For example:   

ALTER DATABASE MyDB SET ENCRYPTION ON

That’s all there is to it! From this point forward, the database and all of its backups will be encrypted. If an unauthorized party somehow gains access to the physical media holding any backups of MyDB, the backups will be useless without the certificate protecting the DEK.   

Querying TDE Views

You can query the catalog view sys.databases to see which databases are protected by TDE. For example:   

SELECT name, is_encrypted FROM sys.databases

The query results show that MyDB is the only encrypted database on the server:   

name                           is_encrypted
------------------------------ ------------
master                         0
tempdb                         0
model                          0
msdb                           0
ReportServer                   0
ReportServerTempDB             0
MyDB                           1  

(7 row(s) affected)

This output is somewhat misleading, however, since encrypting one or more databases results in the encryption of tempdb as well. This is absolutely necessary since tempdb is shared by all databases, and SQL Server must therefore implicitly protect temporary storage placed into tempdb by databases encrypted by TDE. But because the encryption in tempdb is implicit, is_encrypted is returned as 0 (false) by sys.databases for tempdb (you’ll see next that SQL Server does actually create DEK for tempdb). This can have an undesirable performance impact for unencrypted databases on the same server instance. For this reason, you may wish to consider isolating separate SQL Server instances; one for encrypted databases and one for non-encrypted databases.

You can also query the dynamic management view sys.dm_database_encryption_keys to see all the DEKs and to monitor the progress of encryption (or decryption, when you disable TDE) running on background threads managed by SQL Server. This view returns the unique database ID that can be joined on sys.databases to see the actual database name. For example, if we run the following query after enabling TDE, we can obtain information about the DEK and background encryption process:   

SELECT
   dbs.name,
   keys.encryption_state,
   keys.percent_complete,
   keys.key_algorithm,
   keys.key_length
 FROM
   sys.dm_database_encryption_keys AS keys
   INNER JOIN sys.databases AS dbs ON keys.database_id = dbs.database_id

If this query is executed after we enable TDE but before SQL Server has completed encrypting the entire database in the background, we get results similar to the following:   

name       encryption_state percent_complete key_algorithm    key_length
---------- ---------------- ---------------- ---------------- -----------
tempdb     3                0                AES              256
MyDB       2                78.86916         AES              128   

(2 row(s) affected) 

The value returned by encryption_state tells you the current status of encryption (or decryption), as follows:

1 = Unencrypted
2 = Encryption in progress
3 = Encrypted
4 = Key change in progress
5 = Decryption in progress (after ALTER DATABASE…SET ENCRYPTION OFF)

Certain database operations cannot be performed during any of the “in progress” states (2, 4, or 5). These include enabling or disabling encryption, dropping or detaching the database, dropping a file from a file group, taking the database offline, or transitioning the database (or any of its file groups) to a READ ONLY state. Also note the implicit DEK for tempdb created by SQL Server, which always uses AES_256 encryption.  

Backing Up the Certificate

 It is extremely important to back up the server certificates you use to encrypt your databases with TDE. Without the certificate, you will not be able to access the encrypted database or restore encrypted database backups (which, of course, is the point of TDE). Attempting to restore an encrypted database without the certificate will fail with an error similar to this from SQL Server:   

Msg 33111, Level 16, State 3, Line 1
Cannot find server certificate with thumbprint '0x6B1FEEEE238847DE75D1850FA20D87CF94F71F33'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally. 

Use the following statement to back up the server certificate to a file. In addition to the certificate itself, the certificate’s private key must also be saved to a file and protected with a password: 

BACKUP CERTIFICATE MyEncryptionCert TO FILE='C:\MyEncryptionCert.certbak'
 WITH PRIVATE KEY (
  FILE='C:\MyEncryptionCert.pkbak',
  ENCRYPTION BY PASSWORD='Pr!vK3yP@ssword')

This statement creates two files: MyEncryptionCert.certbak is a backup of the server certificate, and MyEncryptionCert.pkbak is a backup of the certificate’s private key protected with the password Pr!vK3yP@ssword. Password protection is absolutely required when backing up the certificate’s private key. Both of these files and the password will be needed to restore an encrypted database backup onto another server or instance. At the risk of stating the obvious, these backup files and the private key password should be closely safeguarded.

Restoring the Certificate

Before an encrypted database can be restored elsewhere, the server certificate that its DEK is encrypted by must be restored first. And if the target instance does not have a master key, one must be created for it before the server certificate can be restored, as shown here:

USE master

GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'An0thrHrd2GessP@ssw0rd!'

To restore the server certificate from the backup files we made earlier, use an alternative form of the CREATE CERTIFICATE statement, as follows:

CREATE CERTIFICATE MyEncryptionCert
 FROM FILE='C:\MyEncryptionCert.certbak'
 WITH PRIVATE KEY(
  FILE='C:\MyEncryptionCert.pkbak',
  DECRYPTION BY PASSWORD='Pr!vK3yP@ssw0rd')

This statement restores the MyEncryptionCert server certificate from the certificate backup file MyEncryptionCert.certbak and the certificate’s private key backup file MyEncryptionCert.pkbak. Naturally, the password provided in the DECRYPTION BY PASSWORD clause must match the one that was used when the certificate’s private key was backed up or the certificate will fail to restore. With a successfully restored certificate, you can then restore the backup of any encrypted database whose DEK is based on the MyEncryptionCert certificate.   

Summary

With the growing concern about personal data protection and the proliferation of computer viruses, developing a methodology for secure computing continues to be a vital task for developers. With support for Transparent Data Encryption in SQL Server 2008, you can easily implement an additional layer of security by encrypting your entire database without making any code changes in your application. Now go give it a try, and enjoy!

SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!

The new Table-Valued Parameter (TVP) feature is perhaps the most significant T-SQL enhancement in SQL Server 2008. Many people have already discovered how to use this feature for passing entire sets of rows with a typed schema definition along from one stored procedure to another on the server. This is something that has not been previously possible using either table variables or temp tables, although temp tables can of course be shared between multiple stored procedures executing on the same connection. Conveniently, TVPs can be passed freely between stored procedures and user-defined functions (UDFs) on the server.

A less widely-known capability of TVPs—yet arguably their most compelling facet—is the ability to marshal an entire set of rows across the network, from your ADO.NET client to your SQL Server 2008 database, with a single stored procedure call (only one round-trip) that accepts a single table-valued parameter. Developers have resorted to a variety of clever hacks over the years to reduce multiple round-trips for processing multiple rows—including XML, delimited text, or even (gasp) accepting hundreds (up to 2100!) of parameters. But special logic then needs to be implemented for packaging and unpackaging the parameter values on both sides of the wire. Worse, the code to implement that logic is often gnarly, and tends to impair developer productivity. None of those techniques even come close to the elegance and simplicity of using TVPs as a native solution to this problem.

I won’t cover the basics of TVPs, since many folks know them already and there are numerous blog posts already on the Web that explain the basics (not to mention Chapter 2 in my book) What I will show is how to implement TVPs in ADO.NET client code for passing an entire DataTable object (all its rows and columns) to SQL Server with one stored procedure call.

If you’re a business-object person and not a DataSet/DataTable person, I’ll also show you how to do the same thing with an entire business object collection using C# custom iterators. There are few places (if any) other than this blog post that show the necessary steps to implement TVPs against business object collections (it’s not even covered in my book!). So let’s get started.

Preparing the SQL Server Database

Before diving into the C# code, we need to set up our database for a simple Order Entry scenario. That means creating an Order table and a related OrderDetail table. We’ll also need an OrderUdt and OrderDetailUdt user-defined table type (UDTT) to base our TVPs on, and a single InsertOrders stored procedure that accepts header and detail TVPs containing typed rows to be bulk inserted into the underlying tables:

CREATE TABLE [Order](
 OrderId int NOT NULL,
 CustomerId int NOT NULL,
 OrderedAt date NOT NULL,
 CreatedAt datetime2(0) NOT NULL,
  CONSTRAINT PK_Order PRIMARY KEY CLUSTERED (OrderId ASC))
GO

CREATE TABLE [OrderDetail](
 OrderId int NOT NULL,
 LineNumber int NOT NULL,
 ProductId int NOT NULL,
 Quantity int NOT NULL,
 Price money NOT NULL,
 CreatedAt datetime2(0) NOT NULL,
  CONSTRAINT PK_OrderDetail PRIMARY KEY CLUSTERED (OrderId ASC, LineNumber ASC))
GO

CREATE TYPE OrderUdt AS TABLE(
 OrderId int,
 CustomerId int,
 OrderedAt date)
GO

CREATE TYPE OrderDetailUdt AS TABLE(
 OrderId int,
 LineNumber int,
 ProductId int,
 Quantity int,
 Price money)
GO

CREATE PROCEDURE InsertOrders(
 @OrderHeaders AS OrderUdt READONLY,
 @OrderDetails AS OrderDetailUdt READONLY)
AS
 BEGIN

    -- Bulk insert order header rows from TVP
    INSERT INTO [Order]
     SELECT *, SYSDATETIME() FROM @OrderHeaders

    -- Bulk insert order detail rows from TVP
    INSERT INTO [OrderDetail]
     SELECT *, SYSDATETIME() FROM @OrderDetails

 END
GO

Notice how the schemas of the UDTTs and the actual tables themselves are almost, but not quite, identical. The CreatedAt column in the two tables is not present in the UDTTs, since we won’t be accepting the client’s date and time—as we shouldn’t trust it, not knowing the time zone, not being in sync with the server and other clients, etc. So we’ll accept all required column values for orders and order details from the client, except for CreatedAt, for which our stored procedure will call the SYSDATETIME function to provide the current date and time based on the server clock (just like the GETDATE function, but returns as a datetime2 data type rather than a datetime data type). Independent of CreatedAt, the header table has an OrderedAt column which will be accepted from the client as a date value (the “time-less” date and enhanced datetime2 are new data types in SQL Server 2008 that I’ll cover in a future post).

Passing a DataTable to SQL Server

The easiest way to call this stored procedure from a .NET client is to use a DataTable. You simply pass the entire populated DataTable object as the parameter value and set the parameter’s SqlDbType property to SqlDbType.Structured. Everything else is plain old vanilla ADO.NET:

var headers = new DataTable();
headers.Columns.Add("OrderId", typeof(int));
headers.Columns.Add("CustomerId", typeof(int));
headers.Columns.Add("OrderedAt", typeof(DateTime));

var details = new DataTable();
details.Columns.Add("OrderId", typeof(int));
details.Columns.Add("LineNumber", typeof(int));
details.Columns.Add("ProductId", typeof(int));
details.Columns.Add("Quantity", typeof(decimal));
details.Columns.Add("Price", typeof(int));

headers.Rows.Add(new object[] { 6, 51, DateTime.Today });
details.Rows.Add(new object[] { 6, 1, 12, 2, 15.95m });
details.Rows.Add(new object[] { 6, 2, 57, 1, 59.99m });
details.Rows.Add(new object[] { 6, 3, 36, 10, 8.50m });

headers.Rows.Add(new object[] { 7, 51, DateTime.Today });
details.Rows.Add(new object[] { 7, 1, 23, 2, 79.50m });
details.Rows.Add(new object[] { 7, 2, 78, 1, 3.25m });

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True;"))
{
  conn.Open();
  using (var cmd = new SqlCommand("InsertOrders", conn))
  {
    cmd.CommandType = CommandType.StoredProcedure;

    var headersParam = cmd.Parameters.AddWithValue("@OrderHeaders", headers);
    var detailsParam = cmd.Parameters.AddWithValue("@OrderDetails", details);

    headersParam.SqlDbType = SqlDbType.Structured;
    detailsParam.SqlDbType = SqlDbType.Structured;

    cmd.ExecuteNonQuery();
  }
  conn.Close();
}

The magic here is that the one line of code that invokes ExecuteNonQuery sends all the data for two orders with five details from the client over to the InsertOrders stored procedure on the server. The data is structured as a strongly-typed set of rows on the client, and winds up as a strongly-typed set of rows on the server in a single round-trip, without any extra work on our part.

This example passes a generic DataTable object to a TVP, but you can also pass a strongly typed DataTable or even a DbDataReader object (that is connected to another data source) to a TVP in exactly the same way. The only requirement is that the columns of the DataTable or DbDataReader correspond to the columns defined for the UDTT that the TVP is declared as.

So TVPs totally rock, eh? But hold on, the TVP story gets even better…

Passing a Collection of Objects to SQL Server

What if you’re working with collections populated with business objects rather than DataTable objects populated with DataRow objects? You might not think at first that business objects and TVPs could work together, but the fact is that they can… and quite gracefully too. All you need to do is implement the IEnumerable<SqlDataRecord> interface in your collection class. This interface requires your collection class to supply a C# custom iterator (sorry, VB .NET doesn’t support custom iterators) method named GetEnumerator which ADO.NET will call for each object contained in the collection when you invoke ExecuteNonQuery. Here are the detailed steps:

First we’ll define the OrderHeader and OrderDetail classes and properties, similar to the way we created the DataTable objects and columns before:

public class OrderHeader
{
  public int OrderId { get; set; }
  public int CustomerId { get; set; }
  public DateTime OrderedAt { get; set; }
}

public class OrderDetail
{
  public int OrderId { get; set; }
  public int LineNumber { get; set; }
  public int ProductId { get; set; }
  public int Quantity { get; set; }
  public decimal Price { get; set; }
}

Ordinarily, List<OrderHeader> and List<OrderDetail> objects might be suitable for serving as collections of OrderHeader and OrderDetail objects in our application. But these collections they won’t suffice on their own as input values for TVPs because List<T> doesn’t implement IEnumerable<SqlDataRecord>. We need to add that ourselves. So we’ll define OrderHeaderCollection and OrderDetailCollection classes that inherit List<OrderHeader> and List<OrderDetail> respectively, and also implement IEnumerable<SqlDataRecord> to “TVP-enable” them:

public class OrderHeaderCollection : List<OrderHeader>, IEnumerable<SqlDataRecord>
{
  IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
  {
    var sdr = new SqlDataRecord(
     new SqlMetaData("OrderId", SqlDbType.Int),
     new SqlMetaData("CustomerId", SqlDbType.Int),
     new SqlMetaData("OrderedAt", SqlDbType.Date));

    foreach (OrderHeader oh in this)
    {
      sdr.SetInt32(0, oh.OrderId);
      sdr.SetInt32(1, oh.CustomerId);
      sdr.SetDateTime(2, oh.OrderedAt);

      yield return sdr;
    }
  }
}

public class OrderDetailCollection : List<OrderDetail>, IEnumerable<SqlDataRecord>
{
  IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
  {
    var sdr = new SqlDataRecord(
     new SqlMetaData("OrderId", SqlDbType.Int),
     new SqlMetaData("LineNumber", SqlDbType.Int),
     new SqlMetaData("ProductId", SqlDbType.Int),
     new SqlMetaData("Quantity", SqlDbType.Int),
     new SqlMetaData("Price", SqlDbType.Money));

    foreach (OrderDetail od in this)
    {
      sdr.SetInt32(0, od.OrderId);
      sdr.SetInt32(1, od.LineNumber);
      sdr.SetInt32(2, od.ProductId);
      sdr.SetInt32(3, od.Quantity);
      sdr.SetDecimal(4, od.Price);

      yield return sdr;
    }
  }
}

I’ll only explain the OrderHeaderCollection class; you can then infer how the OrderDetailCollection class–or any of your own collection classes–implements the custom iterator needed to support TVPs.

First, again, it inherits List<OrderHeader>, so an OrderHeaderCollection object is everything that a List<OrderHeader> object is. This means implicitly, by the way, that it also implements IEnumerable<OrderHeader>, which is what makes any sequence “foreach-able” or “LINQ-able”. But in addition, it explicitly implements IEnumerable<SqlDataRecord> which means it also has a customer iterator method for ADO.NET to consume when an instance of this collection class is assigned to a SqlDbType.Structured parameter for piping over to SQL Server with a TVP.

Every enumerable class requires a matching enumerator method, so not surprisingly implementing IEnumerable<SqlDataRecord> requires providing a GetEnumerator method that returns an IEnumerator<SqlDataRecord>. This method first initializes a new SqlDataRecord object with a schema that matches the UDTTs that the TVPs are declared as. It then enters a loop that iterates all the elements in the collection (possible because List<OrderHeader> implicitly implements IEnumerable<OrderHeader>). On the first iteration, it sets the column property values of the SqlDataRecord object to the property values of the first OrderHeader element, and then issues the magic yield return statement. By definition, any method (like this one) which returns IEnumerator<T> and has a yield return statement in it, is a custom iterator method that is expected to return a sequence of T objects until the method execution path completes (in this case, when the foreach loop finishes).

The crux of this is that we are never calling this method directly. Instead, when we invoke ExecuteNonQuery to run a stored procedure with a SqlDbType.Structured parameter (that is, a TVP), ADO.NET expects the collection passed for the parameter value to implement IEnumerable<SqlDataRecord> so that IEnumerable<SqlDataRecord>.GetEnumerator can be called internally to fetch each new record for piping over to the server. When the first element is fetched from the collection, GetEnumerator is entered, the SqlDataRecord is initialized and is then populated with values using the SetInt32 and SetDateTime methods (there’s a SetXXX method for each data type). That SqlDataRecord “row” is then pushed into the pipeline to the server by yield return. When the next element is fetched from the collection, the GetEnumerator method resumes from the point that it yield returned the previous element, rather than entering GetEnumerator again from the top. This means the SqlDataRecord gets initialized with schema information only once, while its population with one element after another is orchestrated by the controlling ADO.NET code for ExecuteNonQuery that actually ships one SqlDataRecord after another to the server..

The actual ADO.NET code is 100% identical to the code at the top of the post that works with DataTable objects. Substituting a collection for a DataTable object requires no code changes and works flawlessly, provided the collection implements IEnumerator<SqlDataRecord> and provides a GetEnumerator method that maps each object instance to a SqlDataRecord and each object property to a column defined by the UDTT that the TVP is declared as.

How freaking c-o-o-l is that?