Always Encrypted = Never Decrypted (Except on the Client)

Security is, was, and always will be, of paramount concern to software developers. Ideally, we build our applications with multiple layers of security, and typically, encryption plays a key role in our efforts to protect sensitive data (passwords, credit card numbers, etc.) against unauthorized access. With the advent of cloud computing, security concerns become even more pressing, because we need to draw a clear separation between those who own our data (us, the client), from those who manage it (our cloud service provider). This is where Always Encrypted – a new security feature in SQL Server 2016 – comes into play.

Older Encryption Features

Before diving in to Always Encrypted, let’s first establish some context by reviewing existing encryption features that have been available in earlier versions of SQL Server. For one, column level encryption has been available since SQL Server 2005. This type of encryption uses either certificates or symmetric keys such that, without the required certificate or key, encrypted data remains hidden from view.

As of SQL Server 2008 (Enterprise Edition), you can also use Transparent Data Encryption (TDE) to encrypt the entire database – again, using a special certificate and database encryption key – such that without the required certificate and key, the entire database remains encrypted and completely inaccessible. A really nice feature of TDE is that any backup files made of the database are also rendered unusable without the TDE certificate and database encryption key.

From a security standpoint, these older features do serve us well. However, they suffer from two significant drawbacks. First, the very certificates and keys used for encryption are themselves stored in the database (or database server), which means that the database is always capable of decrypting the data (as long as the required certificates and keys are present). While this may have been perfectly fine some years back, it becomes a major problem today if you are interested in moving their data to the cloud. By giving up physical ownership of the database, you are handing over the encryption keys and certificates to your cloud provider, empowering them to access your private data.

Another issue is the fact that these older features only encrypt data “at rest” – meaning, data is only encrypted when it is being written to disk. So even if, for example, SSL is used to encrypt a credit card number between the client browser and the web server, it then gets sent in clear text over the wire from the web server to the database server, where it ultimately gets encrypted once more in database storage. And the same is true in the opposite direction, when reading back the credit card. So, between the web server and the database server, a hacker could sniff out your sensitive data. Again, this was perhaps a lesser concern years ago, but nowadays customers also want to move their applications and virtual networks to the cloud as well, not just their databases. Yet migrating to the cloud also means handing over the sensitive network connection between the web and database servers, that used to be in your control when that infrastructure was hosted in your own data centers.

And so now, in SQL Server 2016 (all editions), we have Always Encrypted, where “always” really means all the time. So data is encrypted not just at rest, but also in flight. Furthermore, the encryption keys themselves – which are essential for both encrypting and decrypting – are not stored in the database. Those keys stay with you, the client. And that’s why, as you’ll see, Always Encrypted is a hybrid feature, where all the encryption and decryption occurs exclusively on the client side.

With this approach, you effectively separate the clients who own the data, from the cloud providers who manage it. Because the data is always encrypted, SQL Server is powerless to decrypt it. All it can do is serve it up in its encrypted state, so that the data is also encrypted in flight. Only when it arrives at the client can it be decrypted, on the client, by the client, who possesses the necessary keys. Likewise, when inserting or updating new data, that data gets encrypted immediately on the client, before it ever leaves the client, and remains encrypted in-flight all the way to the database server, where SQL Server can only store it in that encrypted state – it cannot decrypt it. This is classic GIGO (garbage in, garbage out), as far as SQL Server is concerned.

Enabling Always Encrypted is a bit of a process, and one which is made much easier using the Always Encrypted Wizard in SQL Server Management Studio. As you’ll see, the wizard really streamlines the whole process, and makes it virtually painless to take an existing, unencrypted table, and convert it into a table with always encrypted columns in it.

Encryption Types

Always Encrypted supports two types of encryption: randomized and deterministic.

Randomized encryption is the most secure, because it randomly and unpredictably generates completely different results for the same value. However, this randomness means that there’s no equality support, because the same data always has a different encrypted value. So you’ll never be able to query on a column using randomized encryption, nor would you be able to join tables on this column, or group, or index, this column, or perform any other operation that requires a deterministic equality test.

For that, you’ll use deterministic encryption, which always generates the same encrypted result for the same data, repeatedly and predictably. This gives you the equality support you need to query on it, index on it, and so on, but is less secure because a hacker could start analyzing patterns to figure out how to decrypt your data. The risk increases significantly when working with small sets of values, particularly Boolean (bit) values like True/False, or Male/Female, because the hacker will only see two distinct values in a bit column that uses deterministic encryption. However, you should be aware that range searching will not be possible even with deterministic encryption, because it’s impossible to determine sequence of values, even if the encrypted value itself is deterministic.

Encryption Keys

Always Encrypted uses a key to encrypt one or more columns in a table; and thus, we call this key the column encryption key, or CEK. The CEK is used on the client to encrypt and decrypt values in whichever column – or columns – you apply the CEK to. So if you have multiple columns you want to encrypt, you could either create a single CEK and apply it to all the columns, or you could create one CEK for each column, or you could mix it up, and apply one or more CEKs to one or more columns in any combination you’d like. Remember that the CEK is used on the client – it is never exposed in the database, which is what keeps it undecryptable (if that’s a word) at the database level. However, the database does store encrypted versions of the CEKs.

So the columns get encrypted by the CEKs, and the CEKs themselves get encrypted by the CMK, or column master key. And only the client possesses the CMK, so the database can’t do anything with the encrypted CEKs other than give them to the client, who can then use the CMK to decrypt the CEKs, and then use the decrypted CEKs to encrypt and decrypt data.

Always Encrypted requires that the client-side CMK be stored in a trusted location, and supports several options for this, including Azure Key Vault – which is a secure, cloud-based repository, local certificate stores, and hardware security modules. We’ll be using the local certificate store to contain our CMK in the following walkthrough.

Always Encrypted Workflow

Let’s start with a simple Customer table with a few columns that we’d like to protect with Always Encrypted:

image1

Here you see the table with three columns, and we’d like to encrypt the Name and social security number columns. In SQL Server Management Studio, the Always Encrypted wizard creates a column encryption key (CEK) for us, which will be used to encrypt and decrypt the Name and SSN columns on the client (again, multiple CEKs are supported, but not with the wizard). It also creates a column master key (CMK).

image2

Remember that the CMK must be stored in a trusted location, and the wizard supports either the local certificate store or Azure Key Vault.

image3

Once the CMK is stored away (client-side) the wizard uses it to encrypt the CEK, and then stores the encrypted CEK in the database. To be clear, the database receives only the encrypted CEK, which it can’t do anything with given the fact that it doesn’t also receive the CMK. Instead, it receives the client-side location of the CMK. Thus, the database has all the metadata that the client needs for encryption. The client can therefore connect to the database and discover all the CEKs, as well as the path to where the CMK is stored in either the certificate store or Azure Key Vault:

image4

Next, the wizard begins the encryption process, which is to say that it creates a new temporary table, transfers data into it from the Customer table, and encrypts the data on the fly. Again, the encryption occurs client-side, within SSMS, and when it’s finished, the wizard drops the Customer table, and renames the temporary table to Customer:

image5

At this point, we’re done with the wizard and with SSMS. And this is what we’re left with: a database with encrypted data that is absolutely powerless to decrypt itself. Because you need the CEK to decrypt, but the database only has an encrypted CEK. And you need the CMK to decrypt the CEK, but the database only has a path to the CMK (accessible just to the client), not the CMK itself:

image6

So now it’s time to work with this data from your client application. Say you need to retrieve a customer name by querying on the customer’s social security number. Here’s your query, which references the Name column in the SELECT, and the SSN column in the WHERE clause, and both of these columns are encrypted in the database:

image7

Your application passes this query on to ADO.NET, which would normally just pass it on as-is to SQL Server. But that won’t work now because we need to encrypt the social security number first, on the client, before issuing the query. This is going to involve a bit of a conversation between SQL Server and the client, and this conversation gets initiated by including a special setting in the database connection string – Column Encryption Setting=Enabled. With this setting in place, SQL Server will send the encrypted CEK back to the client, and will also send the path to the CMK back to the client. Then the client retrieves the CMK from the trusted location (the local certificate store, or Azure Key Vault) and uses it to decrypt the CEK.

Now the client has a CEK that it can use for encryption and decryption. So it can translate the query and send it out over the wire looking like this:

image8

That is how the query appears “in-flight”, and anyone hacking the wire will only get this encrypted view of the social security number (called the cipher-text). This can work only if we are using deterministic encryption for the social security number, so the same social security number will always encrypt to the same cipher-text value, making it possible to query on it.

image9

What about the Name column? Well, the encrypted name is being returned here in the query result, but we could be using either randomized or deterministic encryption for this column. In this scenario, we can say that we only query on the social security number, but never on the name, in which case we’d prefer to use randomized encryption for the name, which is less predictable and thus more secure than deterministic encryption.

image10

The cipher-text in the query result, whether it’s been randomly or deterministically encrypted, is what gets sent back to ADO.NET on the client, so that on the return trip back, the name in the result is still encrypted. Only when the result arrives at the client does ADO.NET use the CEK to decrypt it, like so:

image11

Encrypting an Existing Table

Let’s see all this work on a real table. I’ll create the Customer table with a few columns, and then populate the table with a couple of rows.

image12

You can see that nothing is encrypted yet, and all the data is clearly visible:

image13image14

I’m also able to query on the Name column, which is case-insensitive:

image15

image16

I can also run a case-insensitive query on the SSN column.

image17

image18

And of course, I can also run a range query on the SSN column to get all those customers with SSN numbers that start with 5 or greater, and alphabetically, this includes the string n/a.

image19

image20

So now let’s use the wizard to encrypt the name and SSN column with Always Encrypted. I’ll right-click on the database, and choose Tasks > Encrypt Columns… to fire up the Always Encrypted wizard.

image21

Click Next on the introduction page, and now we can select columns for encryption.

image22

First, we’ll encrypt the Name column, and we’ll choose Randomized encryption. We have no requirements to ever query on the name, so we’ll want to choose the more secure encryption type. We’ll also let the wizard create a new column encryption key for the Name column

We’ll also encrypt the SSN column, but this time we’ll choose Deterministic, because we want to be able to query on a customer’s social security number. So it’s necessary to ensure that a given social security number always encrypts to the same cipher-text.

image23

Now what about those warning symbols that appear under State? When I hover, the wizard explains that the collation for these columns is being changed from the normal, case-insensitive (CI) collation, to a binary collation (BIN2), which is case-sensitive. This impacts sorting and querying against the encrypted column, as you’ll see shortly.

image24

Now I’ll click Next, and we can configure the column master key. As with the column encryption key, we’ll let the wizard create a new CMK, and save it to the local certificate store. When you use the local certificate store, it means that the CMK must be stored in a certificate on the local machine.

image25

We have only two rows in our table, so this operation is going to be very quick. But large tables with many rows may take some time to process, and you might not be ready to run that process at the same time that you are running the wizard. Remember that this process involves copying the data from the unencrypted table to a new, temporary table – encrypting the data on-the-fly – and then swapping out the old table for the new one. You don’t want to interrupt this process while it’s running, so the next wizard page gives you the option of running the process now, or saving a PowerShell script that you can run later, at a time that’s more convenient.

image26

We’ll allow the process to run right now, so I’ll click Next once more to reach the Summary page

image27

Here, a quick review shows that the wizard will create a new column master key (CMK) that will be used to encrypt a new column encryption key (CEK), and that the new CMK will be saved to the local machine certificate store. We can also see that the CEK will be used to encrypt the Name and SSN columns, with the Name column using randomized encryption, and the SSN column using deterministic encryption.

Finally, I’ll click Finish to let it run.

In the Certificates console, you can see that a new Always Encrypted certificate was created automatically. This is, in essence, our CMK. If I drill into the certificate details, you can see that the hashing algorithm used by the CMK is SHA256.

image28

Back in SSMS, let’s see how our table has been altered by the actions of the wizard. I’ll right-click the Customer table, and script it out to a new query window:

image29

The CustomerId and City columns look exactly as we defined them, but there’s some new syntax here on the Name and SSN columns. Their datatypes haven’t changed – they are still varchar strings – but the collation has been changed to binary, and ENCRYPTED WITH specifies the CEKs used to encrypt these columns, the encryption type (randomized or deterministic) the encryption algorithm itself.

Now let’s query the table, and, as you might have expected, we only see cipher-text for the Name and SSN columns, because they are encrypted.

image30

image31

Furthermore, we can’t run any queries against either of the encrypted columns.

image32

image33

Now let’s query the system catalog views to discover how our database is using Always Encrypted.

image34

image35

First, we see a master key reference, indicating that the CMK can be found in a certificate store via a local machine path. We also see a column encryption key reference, and the value of that CEK, which is encrypted by the CMK. Again, the unencrypted CEK – which is needed to encrypt and decrypt – is not contained anywhere in the database.

If we query on sys.columns joined with sys.column_encryption_keys, we can see the new metadata columns indicating that the Name and SSN columns are both protected by CEK_Auto1, and that Name is using randomized encryption, while SSN is using deterministic encryption:

image36

image37

Querying and Storing Encrypted Data

So how can we decrypt this data and read it?
Remember, it’s the client that does this, and the client will decrypt it for us if we just connect to the database using the special setting Column Encryption Setting=Enabled.
In this case, the client is SSMS, so just right-click in the query window and choose Connection > Change Connection. Then, under Additional Connection Parameters, plug in column encryption setting=enabled:

image38

This action, of course, resets our connection, so I’ll first switch from master back to our encrypted database, and then query the Customer table once more. And now – like magic – we see the data fully decrypted:

image39
image40

Here’s what’s going on. When we reconnected with the special column encryption setting, that initiated a conversation between the client (SSMS in this case) and SQL Server. SQL Server has passed the CEK (encrypted by the CMK) to the client, as well as the location of the CMK, which is our local certificate store in this case. SSMS has then gone ahead and retrieved the CMK from the certificate store, and used it to decrypt the CEK. At that point, SQL Server returned encrypted data as before, but this time the client (again, SSMS) used the CEK to decrypt the results.

However, in SSMS we are still unable to query on the encrypted columns and we also cannot insert new data for the encrypted columns.

image41

image42

This is because these actions need to be parameterized by an ADO.NET client. To demonstrate, we’ll execute some ADO.NET client code. In Visual Studio, I’ll run some code where I’ve appended column encryption setting=enabled to our connection string, and then I’ll issue the same simple SELECT statement against the Customer. And we can clearly see the name and SSN columns, which have been decrypted.

image43

What about querying directly on the encrypted columns? Well, this should be possible, but with restrictions.

First, because we’re using randomized encryption for the Name column, you can simply never query on it. This is a limitation that we accept in return for the more secure type of encryption, which produces unpredictable values.

image45

image46

The SSN column uses deterministic encryption, making it possible to query against it, but only as a case-sensitive equality query. So when we query for the number of customers where the SSN is n/a, we get a count of 1 for the 1 customer SSN that matches. However, the lowercase n/a SSN will not get matched if we query for N/A in uppercase.

image47

image48

And we’re also restricted to an equality compare; so greater-than, less-than, and BETWEEN queries are not permitted. This makes sense of course, because even though the encryption is deterministic, which produces consistent and predictable results, those encrypted results do not retain the sequence of their unencrypted values.

image49

Now let’s try the INSERT again. This failed in SSMS, but it should work this time, because we’ve set up a parameterized SqlCommand object. So I’ll prepare the command object with the INSERT statement, and then populate the parameters collection with the values for the new name, SSN, and City – where we expect the client to encrypt the SSN and City columns before sending out the command.

image50

Sure enough, we don’t fail this time, and the new customer row gets added successfully:

image51

Returning back to SSMS, querying the table now shows the newly inserted row, along with the earlier three rows, and the data appears decrypted because we’re still running on the SSMS connection with the column encryption setting:

image52

Limitations and Considerations

Before concluding, it’s worth taking a moment to think about some issues you may run into using Always Encrypted. This is a great new technology, but it does have some limitations

First, there is no support for any of the large object and other specialized data types, which cannot be encrypted. There are also other notable incompatibilities with Always Encrypted to point out. Most of these aren’t too onerous, but notably you can’t use Always Encrypted with the new Temporal or Stretch features in SQL Server 2016.
Entity Framework was certainly not designed to work with Always Encrypted, but it is possible. If you’re building applications with EF6 against Always Encrypted tables, then check out this article here which talks you through the various workarounds needed for these two technologies to get along:

http://blogs.msdn.com/b/sqlsecurity/archive/2015/08/27/using-always-encrypted-with-entity-framework-6.aspx

You should also always remember that encryption and decryption always happen on the client, and this translates into additional management to ensure that clients can access the CMK. If you’re using the local machine certificate store, this means the additional administrative burden of deploying certificates to all the clients that need access to encrypted data.

And, there are several other restrictions, for which Aaron Bertrand has done an excellent job articulating in his blog post here, at this link:

http://blogs.sqlsentry.com/aaronbertrand/t-sql-tuesday-69-always-encrypted-limitations/

Conclusion

This post began by reviewing encryption capabilities available in earlier versions of SQL Server, which store keys and certificates on the server, and which encrypts data only at rest

Then I introduced Always Encrypted, a new SQL Server 2016 feature in which the client – not the server – handles encryption and decryption. This makes it a very attractive feature for those customer’s that remain reticent to move their database to the cloud, because they don’t want to move their encryption keys to the cloud as well.

We discussed the two encryption types, where randomized provides more secure encryption based on unpredictable values. Such unpredictability means that you can never query on the encrypted data, and that’s why we also have deterministic encryption. This produces the same encrypted values every time, making it more predictable and les secure, but this also makes it possible to query on the encrypted data, with some restrictions. You learned how the CMK is used to encrypt each CEK, which in turn is used to encrypt the actual data in your tables.

 

Advertisements

Working with Temporal Tables in SQL Server 2016 (Part 2)

In my previous post, I introduced the concept of temporal data, and explained at a high level how SQL Server 2016 implements temporal tables. This post dives into the details of exactly how you create and query temporal tables.

Let’s start with an ordinary table, and convert it into a temporal table. So I’ll create the Employee table, and load it up with some data.

Temporal01

Temporal02

To convert this into a temporal table, first I’ll add the two period columns and then I’ll enable temporal and set dbo.EmployeeHistory as the name of the history table.

Temporal03

Note that because we’re converting an existing table, this must be done in two separate ALTER TABLE statements. For a new temporal table, you can create it and enable it with a single CREATE TABLE statement. Also, and because this is an existing table with existing data, it’s necessary to set DEFAULT values that initialize the period columns with beginning-of-time (1900-01-01 00:00:00.0000000) until end-of-time (9999-12-31 23:59:59.9999999) values, which is not be necessary when creating a new table, or altering an existing table with no rows.

Now when we expand to see the Employee table in the Object Explorer, you can see that it is being designated as a System-Versioned table, which is the official name for temporal tables in SQL Server 2016. It also has a special icon showing a clock over the table, and this tells you that you’re looking at a temporal table. And nested directly beneath, is the history table, dbo.EmployeeHistory, which SQL Server has created with an identical schema to match dbo.Employee.

Temporal04

To effectively demonstrate how to query temporal tables, we’ll need to change some data first. You can see the current state of the data in the Employee table just as we inserted it, while the history table is completely empty because we haven’t made any changes yet:

Temporal05

Temporal06

So let’s update and delete a few rows. I’ll update the same row for EmployeeID number 5 three times, with a 5 or 6 second pause in between each one. First I’ll change the employee name to Gabriel, then I’ll change the department name to Support, and then I’ll change the department name once more to Executive. And, I’ll also delete EmployeeID number 8:

Temporal07

Now I’ll query the tables again:

Temporal08

The Employee table shows the current state just as you’d expect, with EmployeeID number 5 showing the result of the latest UPDATE, and EmployeeID number 8 being deleted. But if you examine the history table, you can see the three earlier versions of employee ID number 5, corresponding with each of the three updates we ran against that row. And we also see EmployeeID number 8, which we deleted.

Let’s have a closer look at the period columns. In the main table, you can see that every row except for EmployeeID number five has period column values that span from the beginning of time until the end of time. But EmployeeID number five has a StartDate indicating that this version of the row began on August 14 at about 5:35pm. This tells you that there are earlier versions of this row available in the history table.

Temporal09

And indeed, the earlier versions are visible here in the history table. Specifically, the previous version of this row is the one that ends at the same time that the current version begins, August 14th at 5:35 and 25 seconds, which is when we changed the DepartmentName from Support to Executive. This ending datetime2 value of 25 seconds past the minute matches exactly with the starting datetime2 value in the current row:

Temporal10

The previous version also was also replaced by an even earlier version about five seconds earlier, at 20 seconds past the minute, when we changed the department name from Engineering to Support:

Temporal11

And there’s one earlier version of the row from about 6 seconds earlier, at 14 seconds past the minute, which is when we changed the FirstName from Gail to Gabriel (this is clearly the very first version of the row, as indicated by its start date of 1900-01-01 00:00:00.0000000):

Temporal12

We made our changes only 5 or 6 seconds apart, but to effectively demonstrate temporal, we’ll need to tweak that to simulate longer periods of time between changes. Now caution here, you normally wouldn’t ever touch these columns, and in fact, as long as temporal is enabled, SQL Server won’t permit you to change them. So for demonstration purposes only, I’ll disable temporal, adjust the period columns, and then re-enable temporal. This must be done very carefully so that the start and end dates of each version continue to match up exactly as we’ve seen:

Temporal13

In this code, we first alter the table to disable temporal. Next we update the history table period columns so that the first change, when the FirstName was Gail, actually occurred 35 days ago, and we adjust the EndDate accordingly. The second change, when the FirstName was Gabriel and the DepartmentName was Engineering, let’s say actually occurred 25 days ago. So we adjust the EndDate accordingly, but also the StartDate so that it aligns with the first update from 35 days ago. And the third change, when the FirstName was Gabriel and the DepartmentName was Support occurred just now, so we leave its EndDate alone and only adjust the StartDate so that it aligns with the second update from 25 days ago. We also tweak the deleted row for EmployeeID number 8 so that it appears this row was deleted 25 days ago. Finally, we re-enable temporal on the Employee table.

When we look at the data now, it really does look like this table has been around for a while. I’m running this demo today on August 14th, but it’s showing changes from as far back as 35 days ago – where it appears that the first change was made on July 10th. You can also see that our updates have maintained the seamless connection of start and end dates across multiple versions of the same row:

Temporal14

Now we have a table with a history of changes over the past 35 days. And the beauty of temporal is that we can instantly query this table as it appeared at any time during that period. All you do is issue an ordinary SELECT statement, and include the special syntax FOR SYSTEM_TIME AS OF.

So I’ll run four queries. The first one is an ordinary SELECT that just queries over the current version of the table. But the other three include FOR SYSTEM_TIME AS OF to query the table as it appeared two minutes ago, thirty days ago, and forty days ago:

Temporal15

Let’s inspect the results.

The first query shows current data only, so we see the latest version of EmployeeID number 5, with FirstName Gabriel, and the Department is Executive. We’re also missing EmployeeID number 8, because it’s been deleted:

Temporal16

The second query from two minutes ago shows the same row for Employee ID number 5, but the Department is Support, and this is because we only just recently changed the Department from Support to Executive. But EmployeeID number 8 is still missing, because it was deleted more than just two minutes ago:

Temporal17

The third query is from thirty days ago, so we get an even earlier version of EmployeeID number 5, when the Department was still Engineering. And we also see the deleted row for EmployeeID number 8 miraculously reappear, because it was deleted 25 days ago, and these query results are from 30 days ago:

Temporal18

And finally, the fourth query from 40 days ago shows the very original version of the table before any updates or deletes. We see the original name Gail for EmployeeID number 5, as well as the original row for EmployeeID number 8 that got deleted later on:

Temporal19

As you can see, time travel with temporal tables is pretty awesome. But it doesn’t end here. Stay tuned for a future post that shows how to use the new stretch database feature to transparently migrate part or all of the temporal history table to the cloud on Azure SQL Database.

Until then, happy coding!

Introducing Temporal Tables in SQL Server 2016 (Part 1)

SQL Server 2016 introduces System Version Tables, which is the formal name for the long awaited temporal data feature. In this blog post (part 1) I’ll explain what temporal is all about, and part 2 dives into greater detail on temporal with demos.

Overview

Temporal means, time-related, and in the case of SQL Server, this means that you get point-in-time access to a table, allowing you to query not only the table’s current data, but data as it appeared in the table at any past point in time. So data that you overwrite with one or more update statements, or data that you blow away with a delete statement, is never really lost. It’s always and immediately available simply by telling your otherwise ordinary query to travel back in time when looking at the table.

The mechanism behind this magic is actually rather simple, and completely seamless. SQL Server automatically creates a history table with the same schema as the table enabled for temporal, records every update and delete into that history table along with timestamps for identifying each version of every update or delete. Then, the query engine integrates with the history table and gives you any desired point-in-time access to the temporal table.

Think of all the great uses for this feature.

  • Time travel
    • Being able to query data as it changes over time yields tremendous business value, where temporal tables make it very easy to perform all sorts of trend analysis against your data.
  • Slowly changing dimensions
    • This feature is also very handy when you’re incrementally building large data warehouses with slowly changing dimensions, because the history table always contains data changes that are timestamped.
  • Auditing
    • Temporal tables also give you an inherent auditing solution, when you need to track what data has changed, and when, although it won’t record who made the change.
  • Accidental data loss
    • You know that heart-dropping moment after an update or delete that you really didn’t mean? Well, rather than panic and scramble to find that backup and restore it, you can much more easily recover from your accident by accessing the lost data from the history table.

Using Temporal

It’s pretty easy to get going with temporal, because there are very few pre-requisites. Any table can become a temporal table as long as It has a primary key (which you have in virtually any table), as well as a pair of datetime2 columns, known as the period columns. Given those minimal requirements, you can turn any table into a system versioned table.

SQL Server creates the history table with a schema to match the main table, except that it does not enforce constraints on the history table. This makes sense if you think about it, because multiple versions of the same row, with the same primary key value, will be written to the history table for every change, and so it’s just not possible to enforce the uniqueness of the primary key in the history table.

After creating the history table, SQL Server automatically populates it and preserves the original version of any row affected by an update statement in the main table, as well as retaining any row that gets deleted from the main table. Now certainly, this is nothing that we couldn’t achieve ourselves by writing triggers to do the same thing. However, the real power of a temporal table comes into play at query time. Simply by including the additional syntax FOR SYSTEM_TIME AS OF with a specific point in time in your SELECT statement, SQL Server automatically executes your query against the table – as it appeared at that point in time.

Once enabled for temporal, you continue treating the table pretty much like an ordinary table. In most cases, you can even ALTER the table’s schema, and SQL Server will automatically reflect the schema change in the history table to keep it in sync. However, there are some types of schema changes that won’t be possible unless you first break the temporal connection between the main table and the history table, make the same schema change to both tables in exactly the same manner, and then re-establish the connection between them. Examples of schema changes that require these extra steps include adding an identity column, or a computed column.

Creating a Temporal Table

Here’s an example of a temporal table.

CREATE TABLE Department (
 DepartmentID    int NOT NULL IDENTITY(1,1) PRIMARY KEY,
  DepartmentName  varchar(50) NOT NULL,
  ManagerID       int NULL,
  ValidFrom       datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
  ValidTo         datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
  PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = DepartmentHistory))

It’s just like any other table, and includes the two period columns ValidFrom and ValidTo, although these columns can be named anything you like; you just need to add GENERATED ALWAYS AS ROW START and ROW END, and then reference the two columns with PERIOD FOR SYSTEM_TIME. That’s it for the table schema; to actually turn on temporal for the table, we add WITH SYSTEM_VERSIONING = ON, and also set the name for the history table that SQL Server should create, and that must include the schema name; although if you leave out the HISTORY_TABLE name, SQL Server will generate one based on the main table’s internal object ID.

And that’s all there is to it. You just continue working with the table as usual, and SQL Server captures data changes to the history table, and also maintains the date and time for each version of every row.

Querying a Temporal Table

And so, as a result, you can query the table as it appeared at any past point in time simply by including the FOR SYSTEM_TIME AS OF clause like you see here in this example, where the Employee table is being queried as it appeared exactly thirty days ago:

DECLARE @ThirtyDaysAgo datetime2 = DATEADD(d, -30, SYSDATETIME())

SELECT *
 FROM Employee
 FOR SYSTEM_TIME AS OF @ThirtyDaysAgo
 ORDER BY EmployeeId

So any rows that have been deleted in the past thirty days, they’ll be returned by this query. Any new rows created in the past thirty days? Those won’t be returned. And any rows older than thirty days that have been modified in the past thirty days are returned as they appeared exactly thirty days ago. And that’s the magic of temporal tables in SQL Server 2016.

If you want to learn more, check out part 2, Working with Temporal Tables in SQL Server 2016.