Adding a User to your Azure Subscription with Resource Group Access


So, you’ve got your Azure subscription in place, and you’re the global administrator. Now you want to let someone else access your subscription, but only a specific resource group within your subscription. In this blog post, I’ll show you how to add a new user to your Azure subscription’s directory, and how to then grant that user permission to manage a specific resource group within your Azure subscription. The new user won’t be able to see or manage any resources in your subscription outside the resource group that you grant them access to.

Step-by-step procedure

Let’s get started. First, log in to the Azure portal and open your subscription’s directory. To do this, search for directory and choose Azure Active Directory, as follows:

Next, take note of the directory name; this is the domain name for the email addresses of the users you can create in this directory. By default, it will be based on your username, followed by .onmicrosoft.com.

Now click on Users:

You will see your username listed. Now click New user:

In the User blade, supply information for the new user. This includes the display name and the username. The username must be in the form of an email address, where the domain name matches the directory name.

Also check Show Password to view the auto-generated password so that you can send it to the new user (the portal will require that they change it the first time they log in).

Here I’m creating a new user for my buddy Andrew Brust:

At this point, I have created a new user for Andrew:

When Andrew logs in for the first time, he will be required to change his password. The login will succeed, but he won’t be able to see anything in the subscription until we grant him access to a specific resource group. Let’s do that next.

Click Resource groups, then select the resource group you want to give the user access to. Here I’m giving Andrew access to the sql-demo-rg resource group:

Next, click Access control (IAM):

We need to add the new user to this resource group. So click Add:

From the Role dropdown, select Owner. Then click on the new user and click Save:

This will make the new user an Owner over the entire resource group so that they can fully manage all the resources inside that group (and they can also create new resources inside the resource group). They will still have no access to any other resources in any other resource groups across your subscription.

You’re done! The new user now has full access to the resource group (and can’t see anything else) on the subscription.

To confirm, go back to the Active Directory blade for the new user and click Azure resources:

Here you can see that Andrew has Owner access to the sql-demo-rg resource group, but no access to anything else in the subscription.
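If you prefer scripting over portal clicks, the same two steps can be sketched with the Azure CLI. This is a sketch, not part of the walkthrough above; the username, display name, and password are placeholders, and you’ll need to be logged in with `az login` first:

```shell
# Create the new user in the subscription's directory
# (the UPN domain must match the directory's domain name).
az ad user create \
  --display-name "Andrew Brust" \
  --user-principal-name andrew@yourdirectory.onmicrosoft.com \
  --password "<initial-password>" \
  --force-change-password-next-sign-in true

# Grant the user Owner access scoped to just one resource group.
az role assignment create \
  --assignee andrew@yourdirectory.onmicrosoft.com \
  --role Owner \
  --resource-group sql-demo-rg
```

As in the portal walkthrough, the role assignment is scoped to the resource group, so the user sees nothing else in the subscription.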


In this blog post, I showed you how to add a new user to your Azure subscription’s directory, and how to grant that user Owner permissions to a specific resource group in the subscription. Hope you all find this useful!


Demystifying the Multi-Model Capabilities in Azure Cosmos DB

When someone casually asks me, “Hey, what is Cosmos DB?,” I casually respond, “Well, that’s Microsoft’s globally distributed, massively scalable, horizontally partitioned, low latency, fully indexed, multi-model NoSQL database, of course.” One of two things happens then. I’ll often get a long weird look, after which the other person politely excuses themselves and moves along. But every now and again, I’ll hear “wow, that sounds awesome, tell me more!” If you’re still reading this, then I’m guessing you’re in the latter category.

If you start to elaborate on each of the bullet points in my soundbite response, there’s a lot to discuss before you get to “multi-model NoSQL” at the tail end. Starting with “globally distributed,” Cosmos DB is – first and foremost – a database designed for modern web and mobile applications, which are (typically) global applications in nature. Simply by clicking the mouse on a map in the portal, your Cosmos DB database is instantly replicated anywhere and everywhere Microsoft hosts a data center (there are nearly 50 of them worldwide, to date). This delivers high availability and low latency to users wherever they’re located.

Cosmos DB also delivers virtually unlimited scale, both in terms of storage (via server-side horizontal partitioning) and throughput (by provisioning a prescribed number of request units, or RUs, per second). This ensures low latency, and is backed by comprehensive SLAs to yield predictable database performance for your applications. And its unique “inverted indexing” scheme enables automatic indexing of every property that you store, with minimal overhead.
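To make the RU model concrete, here’s a back-of-the-envelope throughput estimate in Python. The per-operation RU charges below are illustrative assumptions; the actual charge depends on item size, indexing, and the operation itself:

```python
# Rough RU/s estimate for a workload: each operation type has an RU charge,
# and provisioned throughput must cover the peak operations per second.
def estimate_rus_per_second(workload):
    """workload: list of (ops_per_second, ru_cost_per_op) tuples."""
    return sum(ops * cost for ops, cost in workload)

# Illustrative charges: ~1 RU for a 1 KB point read, ~5 RUs for a 1 KB write.
workload = [
    (500, 1),   # 500 point reads/sec at ~1 RU each
    (100, 5),   # 100 writes/sec at ~5 RUs each
]
print(estimate_rus_per_second(workload))  # → 1000 RU/s to provision
```

If the application exceeds the provisioned RU/s, Cosmos DB throttles the excess requests rather than degrading unpredictably.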

Whew. That’s quite a lot to digest before we even start pondering Cosmos DB’s multi-model support, mentioned there at the end of my lengthy description. In fact, it’s very deliberately placed at the end. Because regardless of which data model you choose, it actually makes no difference to Cosmos DB. The capabilities around global distribution, horizontal partitioning, provisioned throughput, and automatic indexing apply – these are durable concepts that transcend whatever data model you choose. So you get to pick and choose among any of the supported data models, without compromising any of the core features of the Cosmos DB engine.

Which segues right into the topic of this blog post. What exactly is “multi-model”? And specifically, what does it mean for a database platform like Cosmos DB to support multiple data models?

It all boils down to how you’d like to treat your data – and this is where the developer comes in. Because, while massive scale is clearly important (if not critical), developers don’t really care about such details as long as it all “just works.” And it’s Cosmos DB’s job to make sure that this all works. When it comes to actually building applications – well, that’s the developer’s job, and this is where the decision of which data model to choose comes into play.

Depending on the type of application being built, it could be more appropriate to use one data model and not another. For example, if the application focuses more on relationships between entities than the entities themselves, then a graph data model may work better than a document model. In other cases, a developer may want to migrate an existing NoSQL application to Cosmos DB; for example, an existing MongoDB or Cassandra application. In these scenarios, the Cosmos DB data model will be pre-determined; depending on the back-end database dependency of the application being ported, the developer would choose either the MongoDB-compatible or Cassandra-compatible data model. Such a migration would require minimal (to no) changes to the existing application code. And yet, in other “green field” situations, developers who are very opinionated about how data should be modeled are free to choose whichever data model they prefer.

Each data model has an API for developers to work with in Cosmos DB. Put another way, the developer chooses an API for their database, and that determines the data model that is used. So, let’s break it down:


Document Data Model (SQL & MongoDB APIs)

The first thing to point out is that the SQL API is, essentially, the original DocumentDB programming model from the days when Cosmos DB was called DocumentDB. This is arguably the most robust and capable of all the APIs, because it is the only one that exposes a server-side programming model that lets you build fully transactional stored procedures, triggers, and user-defined functions.

So both the SQL and MongoDB APIs give you a document data model, but the two APIs themselves are radically different. Yes, they are similar from a data modeling perspective; you store complete denormalized entities as a hierarchical key-value document model; pure JSON in the case of the SQL API, or BSON in the case of the MongoDB API (BSON is MongoDB’s special binary-encoded version of JSON that extends JSON with additional data types and multi-language support).

The critical difference between the two APIs is the programming interface itself. The SQL API uses Microsoft’s innovative variant of structured query language (SQL) that is tailored for searching across hierarchical JSON documents. It also supports the server-side programming model (for example, stored procedures), which none of the other APIs do.

In contrast, the MongoDB API actually provides wire-level support, which is a compatibility layer that understands the protocol used by the MongoDB driver for sending packets over the network. MongoDB has a built-in find method used for querying documents (unlike the SQL support found in the SQL API). So the MongoDB API appeals to existing MongoDB developers, because they now enjoy the scale-out, throughput, and global distribution capabilities of Cosmos DB without deserting the MongoDB ecosystem: the MongoDB API in Cosmos DB gives you full compatibility with existing MongoDB application code, and lets you continue working with familiar MongoDB tools.
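To illustrate the document model itself, here’s a small pure-Python sketch (no SDK involved): a denormalized customer document, and the kind of hierarchical filter that a Cosmos DB SQL query such as `SELECT c.name FROM c WHERE c.address.city = 'NY'` expresses. The document shape and values are hypothetical:

```python
# Denormalized documents: each customer travels with its nested address,
# rather than being normalized across separate tables.
customers = [
    {"id": "1", "name": "Alice", "address": {"city": "NY"}},
    {"id": "2", "name": "Bob",   "address": {"city": "LA"}},
]

# Pure-Python equivalent of: SELECT c.name FROM c WHERE c.address.city = 'NY'
names = [c["name"] for c in customers if c["address"]["city"] == "NY"]
print(names)  # → ['Alice']
```

The point is that queries reach directly into the hierarchy of each document; there are no joins across tables because the entity is stored whole.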

Key-Value Data Model (Table API)

You can also model your data as a key-value store, using the Table API. This API is actually the evolution of Azure Table Storage – one of the very first NoSQL databases available on Azure. In fact, all existing Azure Table Storage customers will eventually be migrated over to Cosmos DB and the Table API.

With this data model, each entity consists of a key and a value, but the value itself is a set of key-value pairs. So this is nothing like a table in a relational database, where each row has the same columns; with the Table API in Cosmos DB, each entity’s value can have a different set of key-value pairs.
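A minimal sketch of that idea: two entities under the same key scheme, each carrying its own set of key-value pairs. The entity names and properties are made up for illustration:

```python
# Table-style entities: keyed by (partition key, row key), but the value
# is itself a bag of key-value pairs that can differ per entity.
entities = {
    ("customers", "001"): {"name": "Alice", "tier": "gold"},
    ("customers", "002"): {"name": "Bob", "signup_year": 2017},  # different properties
}

# Point lookup by key, then read whatever properties this entity happens to have.
entity = entities[("customers", "002")]
print(sorted(entity.keys()))  # → ['name', 'signup_year']
```

Unlike a relational row, nothing forces the two entities to share the same "columns".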

The Table API appeals primarily to existing Azure Table Storage customers, because it emulates the Azure Table Storage API. Using this API, existing Azure Table Storage apps can be migrated quickly and easily to Cosmos DB. For a new project though, there would be little reason to ever consider using the Table API, when you consider the fact that the SQL API is far more capable than the Table API.

So then, when would you actually choose to use the Table API? Well, again, the primary use case is to migrate an existing Azure Table Storage account over to Cosmos DB, without having to change any code in your applications. Remember that Microsoft is planning to do this for every customer as a long-term migration, but there’s no reason to wait for them if you don’t want to. You can migrate the data yourself now and immediately start enjoying the benefits of Cosmos DB as a back end, without making any changes whatsoever to your existing Azure Table Storage applications. You just change the connection string to point to Cosmos DB, and the application continues to work seamlessly against the Cosmos DB Table API.

Graph Data Model (Gremlin API)

You can also choose the Gremlin API, which gives you a graph database based on the Apache TinkerPop open source project. Graph databases are becoming increasingly popular in the NoSQL world.

What do you put in a graph? One of two things; either a vertex or an edge. Now don’t let these terms intimidate you. They’re just fancy words for entities and relationships, respectively. So a vertex is an entity, and an edge is a one-way relationship between any two vertices. And that’s it – nothing more and nothing less. These are the building blocks of any graph database. And whether you’re storing a vertex or an edge, you can attach any number of arbitrary properties to it; much like the arbitrary key-value pairs you can define for a row using the Table API, or a flat JSON document using the SQL API.

The Gremlin API provides a succinct graph traversal language that enables you to efficiently query across the many relationships that exist in a graph database. For example, in a social networking application, you could easily find a user, then look for all of that user’s posts where the location is NY, and of those, find all the relationships where some other user has commented on or liked those posts.
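The social networking example can be sketched as a toy in-memory graph. In Gremlin itself this would be a traversal along the lines of `g.V('u1').out('posted').has('location', 'NY').in('liked')`; the vertices, edges, and property values below are made up:

```python
# Vertices are entities; edges are one-way, labeled relationships.
vertices = {"u1": {"type": "user"}, "u2": {"type": "user"},
            "p1": {"type": "post", "location": "NY"},
            "p2": {"type": "post", "location": "LA"}}
edges = [("u1", "posted", "p1"), ("u1", "posted", "p2"),
         ("u2", "liked", "p1")]

def out(v, label):
    """Follow outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == v and lbl == label]

def incoming(v, label):
    """Follow edges with the given label back to their source vertex."""
    return [src for src, lbl, dst in edges if dst == v and lbl == label]

# Find u1's posts located in NY, then everyone who liked those posts.
ny_posts = [p for p in out("u1", "posted") if vertices[p]["location"] == "NY"]
likers = [u for p in ny_posts for u in incoming(p, "liked")]
print(likers)  # → ['u2']
```

Notice that the query is all about hopping across relationships, which is exactly where a graph model beats a document model.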

Columnar (Cassandra API)

There’s a fourth option for choosing a data model, and that’s columnar, using the Cassandra API. Columnar is yet another way of modeling your data, where – in a departure from the typical way of dealing with schema-free data in the NoSQL world – you actually do define the schema of your data up-front. However, data is still stored physically in a column-oriented fashion, so it’s still OK to have sparse columns, and it has good support for aggregations. Columnar is somewhat similar to the Key-Value data model with the Table API, except that every item in the container is required to adhere to the schema defined for the container. And in that sense, columnar is really most similar to columnstore in SQL Server, except of course that it is implemented using a NoSQL architecture, so it’s distributed and partitioned to massively scale out big data.
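A sketch of the column-oriented layout described above: each column is stored as its own array, the schema is fixed up-front, sparse values are allowed, and an aggregation scans only the single column it needs. The table and values are hypothetical:

```python
# Column-oriented layout: one array per column; logical row i is spread
# across the arrays. The schema (the column names) is defined up-front,
# but individual values can be sparse (None).
table = {
    "id":    [1, 2, 3],
    "city":  ["NY", "LA", None],      # sparse column
    "sales": [100, 250, 175],
}

# An aggregation touches only the one column it needs.
total_sales = sum(v for v in table["sales"] if v is not None)
print(total_sales)  # → 525
```

This is why columnar stores do well on aggregations: summing `sales` never reads `id` or `city` at all.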

Atom Record Sequence (ARS)

The fact of the matter is, these APIs merely project your data as different data models; whereas internally, your data is always stored as ARS – or Atom Record Sequence – a Microsoft creation that defines the persistence layer for key-value pairs. Now you don’t need to know anything about ARS; you don’t even need to know that it’s there. But it is there, under the covers, storing all your data as key-value pairs in a manner that’s agnostic to the data-model you’ve chosen to work with.

Because at the end of the day, it’s all just keys and values – not just the key-value data model, but all these data models. They’re all some form of keys and values. A JSON or BSON document is a collection of keys and values, where values can either be simple values, or they can contain nested key-value pairs. The key-value model is clearly based on keys and values, but so are graph and columnar. The relationships you define in a graph database are expressed as annotations that are, themselves, key-value pairs, and certainly all the columns defined for a columnar data model can be viewed as key-value pairs as well.
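One way to see the “it’s all keys and values” point: any nested document flattens into a list of key-path/value pairs, which is close in spirit to how a key-value persistence layer can hold any of these models. To be clear, the flattening function below is my own illustration, not the actual ARS format:

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into a list of (key-path, value) pairs."""
    pairs = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten(value, path))  # recurse into nested objects
        else:
            pairs.append((path, value))
    return pairs

doc = {"name": "Alice", "address": {"city": "NY", "zip": "10001"}}
print(flatten(doc))
# → [('name', 'Alice'), ('address.city', 'NY'), ('address.zip', '10001')]
```

A document, a table entity, a graph vertex with its properties – each reduces to a set of pairs like these, which is what lets one storage engine serve every model.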

So, these APIs are here to broaden your choices in terms of how you get to treat your data; they have no bearing on your ability to scale your database. For example, if you want to be able to write SQL queries, you would choose the SQL API, and not the Table API. But if you want MongoDB or Azure Table Storage compatibility, then you’d go with the MongoDB or Table API respectively.

Switching Between Data Models

As you’ve seen, when you choose an API, you are also choosing a data model. And today (since the release of Cosmos DB in May 2017), you choose an API when you create a Cosmos DB account. This means that today, a Cosmos DB account is tied to one API, which ties it to one data model:


But again, each data model is merely a projection of the same underlying ARS format, and so eventually you will be able to create a single account, and then switch freely between different APIs within the account. So that then, you’ll be able to access one database as graph, key-value, document, or columnar, all at once, if you wish:


You can also expect to see additional APIs in the future, as Cosmos DB broadens its compatibility support for other database systems. This will enable an even wider range of developers to stick with their database of choice, while leveraging Cosmos DB as a back end for horizontal partitioning, provisioned throughput, global distribution, and automatic indexing.


Azure Cosmos DB has multiple APIs and supports multiple data models. In this blog post, we explored the multi-API, multi-model capabilities of Cosmos DB, including the document data model with either the SQL or MongoDB APIs, key-value with the Table API, graph with the Gremlin API, and columnar with the Cassandra API.

Regardless of which data model you choose, however, Cosmos DB stores everything in ARS, and merely projects different data models, based on the different APIs. This provides developers with a wide range of choices for how they’d like to model their data, without making any compromises in scale, partitioning, throughput, indexing, or global distribution.

Learning Azure Cosmos DB

Hey everyone!

I’m extremely delighted to announce the launch of my brand new Pluralsight course on Azure Cosmos DB.

This is a 10-module course that targets developers – and that’s you!

The course should be live on Pluralsight in the next few days. If you’re interested in building next-generation global apps for the web and mobile, just join me for about 6 hours, and get up to speed in a heartbeat!

Here’s what the course delivers:

Part 1: Introduction and Overview

We begin by defining NoSQL, explaining big data, and describing the characteristics of a NoSQL Database. Then we introduce Cosmos DB with a high-level discussion around global distribution, server-side partitioning, and its multi-model capabilities (with support for documents, tables, and graphs). We also introduce the local emulator, which lets you develop for Cosmos DB without an internet connection or Azure subscription. This module has several demos showing how to create an Azure Cosmos DB account, how to download and install the local emulator, and how to use the SQL API to create a collection, populate it with documents, and query for documents using Cosmos DB SQL.

Part 2: Tuning for Throughput and Performance

This module explains how to provision throughput in Cosmos DB. We begin by learning about request units, or RUs. With RUs, you don’t worry about hardware concerns, like CPU, memory, or I/O. Instead, Cosmos DB delivers predictable performance by letting you reserve as many RUs per second as your application needs, and will throttle requests if you start to exceed the throughput limits that you reserve. The portal has many customizable charts and graphs that let you monitor the request unit consumption of your database, and you can whiteboard the cost to estimate your throughput needs using the online request unit calculator. The module concludes with an explanation of how Cosmos DB pricing is divided between storage and throughput.

Part 3: Horizontal Partitioning

This module discusses how Cosmos DB achieves elastic scale via server-side horizontal partitioning. We explain how containers are logical resources that Cosmos DB manages by creating and replicating multiple physical partitions to scale both storage and throughput. For best results, we discuss the considerations to keep in mind when choosing a partition key, which groups related items together for queries, and supports updating multiple items in a transaction using stored procedures. The module concludes by showing how to enable cross-partition queries, where Cosmos DB automatically fans out the query to multiple partitions, and aggregates each partition’s results into a single result for the query.

Part 4: Globally Distributing Data

In this module, we explore the geo-replication capabilities in Cosmos DB, also called turnkey global distribution. In addition to local replication within one Azure data center, Cosmos DB can replicate your database to any number of other Azure data centers in regions located throughout the world. This brings your data closer to your users, giving them low latency and high availability wherever they are located. You will see how easy it is to enable global distribution, and configure your application with a failover sequence of preferred regions. We wrap up the module by discussing consistency, and explaining the five consistency levels that let you balance latency and availability with stale reads that occur when querying replicas that are not up to date.

Part 5: Data Modeling for the SQL API

Using the SQL API, you model items in the database as JSON documents. This module starts by calling out some of the key differences in data modeling between a traditional relational database platform (like SQL Server) and a JSON document database (like Cosmos DB with the SQL API). We then examine the special properties, including the resource ID and self-link properties that uniquely identify each resource, and the URI factory which simplifies the task of constructing the self-link to any resource in the database. The module concludes by showing how the Data Migration Tool can be used to transform and import data from SQL Server to Cosmos DB.

Part 6: Querying Documents

This module explores the special version of SQL used by Cosmos DB for querying JSON documents in a collection with the SQL API. We explain how the query language is rooted in JSON and JavaScript semantics, and then dive into numerous demos that show how to filter, sort, iterate arrays, and perform intra-document joins, using the many available operators and built-in math, type checking, string, array, aggregate, and spatial functions provided by Cosmos DB SQL.

Part 7: Programming with the .NET SDK

This module shows how to use the .NET SDK with the SQL API to build client applications for Cosmos DB. We cover how to use the SDK for working with databases and collections, creating and querying documents, indexing, and resource tokens based on users and permissions.

Part 8: Programming the Cosmos DB Server

This module teaches you how to write server-side code that runs within the Cosmos DB service, including stored procedures, triggers, and user-defined functions. We also show you how to cope with throttled requests, and how to implement a continuation model for long-running processes.

Part 9: Using the Table API

This module explains the Table API in Cosmos DB, which provides 100% compatibility with the original Azure Table Storage NoSQL service, but adds predictable reserved throughput, and automatic indexing. We discuss the key-value data model used by the Table API, explain when it makes sense to use this API and when it doesn’t, and show how to migrate an existing Azure Table Storage account and application to Cosmos DB and the Table API.

Part 10: Using the Gremlin API

This module explains the Gremlin API, which provides a graph data model over Cosmos DB. This API is based on the Apache TinkerPop open source project, and defines a set of steps that can be used to populate a graph with vertices (entities) and edges (relationships). After populating the graph, you can run Gremlin queries that traverse the many relationships defined in the database. We’ll create several demos, including a simple graph, airport model, and a multi-model comic book catalog.

I hope this has whetted your appetite to learn more about Cosmos DB!


Always Encrypted = Never Decrypted (Except on the Client)

Security is, was, and always will be, of paramount concern to software developers. Ideally, we build our applications with multiple layers of security, and typically, encryption plays a key role in our efforts to protect sensitive data (passwords, credit card numbers, etc.) against unauthorized access. With the advent of cloud computing, security concerns become even more pressing, because we need to draw a clear separation between those who own our data (us, the client), from those who manage it (our cloud service provider). This is where Always Encrypted – a new security feature in SQL Server 2016 – comes into play.

Older Encryption Features

Before diving in to Always Encrypted, let’s first establish some context by reviewing existing encryption features that have been available in earlier versions of SQL Server. For one, column level encryption has been available since SQL Server 2005. This type of encryption uses either certificates or symmetric keys such that, without the required certificate or key, encrypted data remains hidden from view.

As of SQL Server 2008 (Enterprise Edition), you can also use Transparent Data Encryption (TDE) to encrypt the entire database – again, using a special certificate and database encryption key – such that without the required certificate and key, the entire database remains encrypted and completely inaccessible. A really nice feature of TDE is that any backup files made of the database are also rendered unusable without the TDE certificate and database encryption key.

From a security standpoint, these older features do serve us well. However, they suffer from two significant drawbacks. First, the very certificates and keys used for encryption are themselves stored in the database (or database server), which means that the database is always capable of decrypting the data (as long as the required certificates and keys are present). While this may have been perfectly fine some years back, it becomes a major problem today if you are interested in moving your data to the cloud. By giving up physical ownership of the database, you are handing over the encryption keys and certificates to your cloud provider, empowering them to access your private data.

Another issue is the fact that these older features only encrypt data “at rest” – meaning, data is only encrypted when it is being written to disk. So even if, for example, SSL is used to encrypt a credit card number between the client browser and the web server, it then gets sent in clear text over the wire from the web server to the database server, where it ultimately gets encrypted once more in database storage. And the same is true in the opposite direction, when reading back the credit card. So, between the web server and the database server, a hacker could sniff out your sensitive data. Again, this was perhaps a lesser concern years ago, but nowadays customers also want to move their applications and virtual networks to the cloud as well, not just their databases. Yet migrating to the cloud also means handing over the sensitive network connection between the web and database servers, that used to be in your control when that infrastructure was hosted in your own data centers.

And so now, in SQL Server 2016 (all editions), we have Always Encrypted, where “always” really means all the time. So data is encrypted not just at rest, but also in flight. Furthermore, the encryption keys themselves – which are essential for both encrypting and decrypting – are not stored in the database. Those keys stay with you, the client. And that’s why, as you’ll see, Always Encrypted is a hybrid feature, where all the encryption and decryption occurs exclusively on the client side.

With this approach, you effectively separate the clients who own the data, from the cloud providers who manage it. Because the data is always encrypted, SQL Server is powerless to decrypt it. All it can do is serve it up in its encrypted state, so that the data is also encrypted in flight. Only when it arrives at the client can it be decrypted, on the client, by the client, who possesses the necessary keys. Likewise, when inserting or updating new data, that data gets encrypted immediately on the client, before it ever leaves the client, and remains encrypted in-flight all the way to the database server, where SQL Server can only store it in that encrypted state – it cannot decrypt it. This is classic GIGO (garbage in, garbage out), as far as SQL Server is concerned.

Enabling Always Encrypted is a bit of a process, and one which is made much easier using the Always Encrypted Wizard in SQL Server Management Studio. As you’ll see, the wizard really streamlines the whole process, and makes it virtually painless to take an existing, unencrypted table, and convert it into a table with always encrypted columns in it.

Encryption Types

Always Encrypted supports two types of encryption: randomized and deterministic.

Randomized encryption is the most secure, because it randomly and unpredictably generates completely different results for the same value. However, this randomness means that there’s no equality support, because the same data always has a different encrypted value. So you’ll never be able to query on a column using randomized encryption, nor would you be able to join tables on it, group by it, index it, or perform any other operation that requires a deterministic equality test.

For that, you’ll use deterministic encryption, which always generates the same encrypted result for the same data, repeatedly and predictably. This gives you the equality support you need to query on it, index it, and so on, but it is less secure, because a hacker could start analyzing patterns to figure out how to decrypt your data. The risk increases significantly when working with small sets of values, particularly Boolean (bit) values like True/False or Male/Female, because the hacker will only see two distinct values in a bit column that uses deterministic encryption. Be aware, though, that range searching is not possible even with deterministic encryption, because it’s impossible to determine the sequence of values, even if the encrypted value itself is deterministic.
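The difference between the two types can be illustrated with a short Python sketch. To be clear, this uses stdlib keyed hashing and a random salt as stand-ins; the real Always Encrypted algorithm is AES-based, and this is only meant to show the queryability trade-off:

```python
import hashlib
import hmac
import os

KEY = b"column-encryption-key"  # stands in for a CEK

def deterministic(value: bytes) -> bytes:
    # Same input + same key -> same output, so equality predicates still work.
    return hmac.new(KEY, value, hashlib.sha256).digest()

def randomized(value: bytes) -> bytes:
    # A fresh random salt makes each encryption of the same value different.
    salt = os.urandom(16)
    return salt + hmac.new(KEY, salt + value, hashlib.sha256).digest()

ssn = b"123-45-6789"
print(deterministic(ssn) == deterministic(ssn))  # → True: queryable
print(randomized(ssn) == randomized(ssn))        # → False: not queryable, more secure
```

The second comparison fails precisely because of the property that makes randomized encryption stronger: no two encryptions of the same plaintext look alike.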

Encryption Keys

Always Encrypted uses a key to encrypt one or more columns in a table; and thus, we call this key the column encryption key, or CEK. The CEK is used on the client to encrypt and decrypt values in whichever column – or columns – you apply the CEK to. So if you have multiple columns you want to encrypt, you could either create a single CEK and apply it to all the columns, or you could create one CEK for each column, or you could mix it up, and apply one or more CEKs to one or more columns in any combination you’d like. Remember that the CEK is used on the client – it is never exposed in the database, which is what keeps it undecryptable (if that’s a word) at the database level. However, the database does store encrypted versions of the CEKs.

So the columns get encrypted by the CEKs, and the CEKs themselves get encrypted by the CMK, or column master key. And only the client possesses the CMK, so the database can’t do anything with the encrypted CEKs other than give them to the client, who can then use the CMK to decrypt the CEKs, and then use the decrypted CEKs to encrypt and decrypt data.
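The key hierarchy can be sketched with a toy XOR “cipher” – emphatically not the real cryptography, just a way to show who holds what: the client keeps the CMK, and the database stores only the CEK encrypted under the CMK.

```python
import hashlib

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # XOR against a key-derived keystream; decrypting is the same operation.
    # (A toy illustration only -- never use this for real encryption.)
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

cmk = b"client-master-key"       # stays on the client side
cek = b"column-encryption-key"   # used to encrypt/decrypt column data

# What the database stores: only the CEK encrypted under the CMK.
encrypted_cek = toy_encrypt(cmk, cek)
assert encrypted_cek != cek      # the database never sees the plaintext CEK

# Client round trip: decrypt the CEK with the CMK, then use it on column data.
recovered_cek = toy_encrypt(cmk, encrypted_cek)
print(recovered_cek == cek)  # → True
```

Without the CMK, the database is stuck at the first step: it holds `encrypted_cek` and nothing else, which is exactly the point.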

Always Encrypted requires that the client-side CMK be stored in a trusted location, and supports several options for this, including Azure Key Vault (a secure, cloud-based repository), local certificate stores, and hardware security modules. We’ll be using the local certificate store to contain our CMK in the following walkthrough.

Always Encrypted Workflow

Let’s start with a simple Customer table with a few columns that we’d like to protect with Always Encrypted:


Here you see the table with three columns, and we’d like to encrypt the Name and social security number columns. In SQL Server Management Studio, the Always Encrypted wizard creates a column encryption key (CEK) for us, which will be used to encrypt and decrypt the Name and SSN columns on the client (again, multiple CEKs are supported, but not with the wizard). It also creates a column master key (CMK).


Remember that the CMK must be stored in a trusted location, and the wizard supports either the local certificate store or Azure Key Vault.


Once the CMK is stored away (client-side) the wizard uses it to encrypt the CEK, and then stores the encrypted CEK in the database. To be clear, the database receives only the encrypted CEK, which it can’t do anything with given the fact that it doesn’t also receive the CMK. Instead, it receives the client-side location of the CMK. Thus, the database has all the metadata that the client needs for encryption. The client can therefore connect to the database and discover all the CEKs, as well as the path to where the CMK is stored in either the certificate store or Azure Key Vault:


Next, the wizard begins the encryption process, which is to say that it creates a new temporary table, transfers data into it from the Customer table, and encrypts the data on the fly. Again, the encryption occurs client-side, within SSMS, and when it’s finished, the wizard drops the Customer table, and renames the temporary table to Customer:


At this point, we’re done with the wizard and with SSMS. And this is what we’re left with: a database with encrypted data that is absolutely powerless to decrypt itself. Because you need the CEK to decrypt, but the database only has an encrypted CEK. And you need the CMK to decrypt the CEK, but the database only has a path to the CMK (accessible just to the client), not the CMK itself:


So now it’s time to work with this data from your client application. Say you need to retrieve a customer name by querying on the customer’s social security number. Here’s your query, which references the Name column in the SELECT, and the SSN column in the WHERE clause, and both of these columns are encrypted in the database:


Your application passes this query on to ADO.NET, which would normally just pass it on as-is to SQL Server. But that won’t work now because we need to encrypt the social security number first, on the client, before issuing the query. This is going to involve a bit of a conversation between SQL Server and the client, and this conversation gets initiated by including a special setting in the database connection string – Column Encryption Setting=Enabled. With this setting in place, SQL Server will send the encrypted CEK back to the client, and will also send the path to the CMK back to the client. Then the client retrieves the CMK from the trusted location (the local certificate store, or Azure Key Vault) and uses it to decrypt the CEK.
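In ADO.NET code, that special setting is just another keyword in the connection string. Here’s a minimal sketch of the client side (the server, database, and SSN value are assumptions for illustration, and this requires .NET Framework 4.6 or later plus a live SQL Server with the walkthrough’s encrypted Customer table, so treat it as a template rather than a runnable sample):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class AlwaysEncryptedQueryDemo
{
    static void Main()
    {
        // Column Encryption Setting=Enabled tells ADO.NET to request the
        // encryption metadata from SQL Server, fetch the CMK from the trusted
        // location, decrypt the CEK, and encrypt/decrypt data transparently.
        var connStr =
            "Data Source=.;Initial Catalog=MyEncryptedDB;Integrated Security=true;" +
            "Column Encryption Setting=Enabled";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT Name FROM Customer WHERE SSN = @SSN", conn))
        {
            // The SSN must be supplied as a strongly typed parameter so the
            // driver can encrypt it client-side; the cipher-text it produces
            // matches the deterministic cipher-text stored in the database.
            cmd.Parameters.Add("@SSN", SqlDbType.VarChar, 20).Value = "123-45-6789";

            conn.Open();
            var name = (string)cmd.ExecuteScalar(); // decrypted client-side by ADO.NET
            Console.WriteLine(name);
        }
    }
}
```

Note that nothing in the query text changes; the driver rewrites the parameter value in flight, which is why the setting can often be enabled without touching application code.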

Now the client has a CEK that it can use for encryption and decryption. So it can translate the query and send it out over the wire looking like this:


That is how the query appears “in-flight”, and anyone hacking the wire will only get this encrypted view of the social security number (called the cipher-text). This can work only if we are using deterministic encryption for the social security number, so the same social security number will always encrypt to the same cipher-text value, making it possible to query on it.


What about the Name column? Well, the encrypted name is being returned here in the query result, but we could be using either randomized or deterministic encryption for this column. In this scenario, say we only ever query on the social security number, never on the name; in that case we’d prefer to use randomized encryption for the name, which is less predictable and thus more secure than deterministic encryption.


The cipher-text in the query result, whether it’s been randomly or deterministically encrypted, is what gets sent back to ADO.NET on the client, so that on the return trip back, the name in the result is still encrypted. Only when the result arrives at the client does ADO.NET use the CEK to decrypt it, like so:


Encrypting an Existing Table

Let’s see all this work on a real table. I’ll create the Customer table with a few columns, and then populate the table with a couple of rows.


You can see that nothing is encrypted yet, and all the data is clearly visible:


I’m also able to query on the Name column, which is case-insensitive:



I can also run a case-insensitive query on the SSN column.



And of course, I can also run a range query on the SSN column to get all customers with SSNs that start with 5 or greater; alphabetically, this includes the string n/a.



So now let’s use the wizard to encrypt the name and SSN column with Always Encrypted. I’ll right-click on the database, and choose Tasks > Encrypt Columns… to fire up the Always Encrypted wizard.


Click Next on the introduction page, and now we can select columns for encryption.


First, we’ll encrypt the Name column, and we’ll choose Randomized encryption. We have no requirements to ever query on the name, so we’ll want to choose the more secure encryption type. We’ll also let the wizard create a new column encryption key for the Name column.

We’ll also encrypt the SSN column, but this time we’ll choose Deterministic, because we want to be able to query on a customer’s social security number. So it’s necessary to ensure that a given social security number always encrypts to the same cipher-text.


Now what about those warning symbols that appear under State? When I hover, the wizard explains that the collation for these columns is being changed from the normal, case-insensitive (CI) collation, to a binary collation (BIN2), which is case-sensitive. This impacts sorting and querying against the encrypted column, as you’ll see shortly.


Now I’ll click Next, and we can configure the column master key. As with the column encryption key, we’ll let the wizard create a new CMK, and save it to the local certificate store. When you use the local certificate store, it means that the CMK must be stored in a certificate on the local machine.


We have only two rows in our table, so this operation is going to be very quick. But large tables with many rows may take some time to process, and you might not be ready to run that process at the same time that you are running the wizard. Remember that this process involves copying the data from the unencrypted table to a new, temporary table – encrypting the data on-the-fly – and then swapping out the old table for the new one. You don’t want to interrupt this process while it’s running, so the next wizard page gives you the option of running the process now, or saving a PowerShell script that you can run later, at a time that’s more convenient.


We’ll allow the process to run right now, so I’ll click Next once more to reach the Summary page.


Here, a quick review shows that the wizard will create a new column master key (CMK) that will be used to encrypt a new column encryption key (CEK), and that the new CMK will be saved to the local machine certificate store. We can also see that the CEK will be used to encrypt the Name and SSN columns, with the Name column using randomized encryption, and the SSN column using deterministic encryption.

Finally, I’ll click Finish to let it run.

In the Certificates console, you can see that a new Always Encrypted certificate was created automatically. This is, in essence, our CMK. If I drill into the certificate details, you can see that the hashing algorithm used by the CMK is SHA256.


Back in SSMS, let’s see how our table has been altered by the actions of the wizard. I’ll right-click the Customer table, and script it out to a new query window:


The CustomerId and City columns look exactly as we defined them, but there’s some new syntax here on the Name and SSN columns. Their datatypes haven’t changed – they are still varchar strings – but the collation has been changed to binary, and ENCRYPTED WITH specifies the CEK used to encrypt these columns, the encryption type (randomized or deterministic), and the encryption algorithm itself.

Now let’s query the table, and, as you might have expected, we only see cipher-text for the Name and SSN columns, because they are encrypted.



Furthermore, we can’t run any queries against either of the encrypted columns.



Now let’s query the system catalog views to discover how our database is using Always Encrypted.



First, we see a master key reference, indicating that the CMK can be found in a certificate store via a local machine path. We also see a column encryption key reference, and the value of that CEK, which is encrypted by the CMK. Again, the unencrypted CEK – which is needed to encrypt and decrypt – is not contained anywhere in the database.

If we query on sys.columns joined with sys.column_encryption_keys, we can see the new metadata columns indicating that the Name and SSN columns are both protected by CEK_Auto1, and that Name is using randomized encryption, while SSN is using deterministic encryption:



Querying and Storing Encrypted Data

So how can we decrypt this data and read it?
Remember, it’s the client that does this, and the client will decrypt it for us if we just connect to the database using the special setting Column Encryption Setting=Enabled.
In this case, the client is SSMS, so just right-click in the query window and choose Connection > Change Connection. Then, under Additional Connection Parameters, plug in column encryption setting=enabled:


This action, of course, resets our connection, so I’ll first switch from master back to our encrypted database, and then query the Customer table once more. And now – like magic – we see the data fully decrypted:


Here’s what’s going on. When we reconnected with the special column encryption setting, that initiated a conversation between the client (SSMS in this case) and SQL Server. SQL Server has passed the CEK (encrypted by the CMK) to the client, as well as the location of the CMK, which is our local certificate store in this case. SSMS has then gone ahead and retrieved the CMK from the certificate store, and used it to decrypt the CEK. At that point, SQL Server returned encrypted data as before, but this time the client (again, SSMS) used the CEK to decrypt the results.

However, in SSMS we are still unable to query on the encrypted columns, nor can we insert new data into them.



This is because these actions need to be parameterized by an ADO.NET client. To demonstrate, we’ll execute some ADO.NET client code. In Visual Studio, I’ll run some code where I’ve appended column encryption setting=enabled to our connection string, and then I’ll issue the same simple SELECT statement against the Customer table. And we can clearly see the Name and SSN columns, which have been decrypted.


What about querying directly on the encrypted columns? Well, this should be possible, but with restrictions.

First, because we’re using randomized encryption for the Name column, you can simply never query on it. This is a limitation that we accept in return for the more secure type of encryption, which produces unpredictable values.



The SSN column uses deterministic encryption, making it possible to query against it, but only as a case-sensitive equality query. So when we query for the number of customers where the SSN is n/a, we get a count of 1 for the 1 customer SSN that matches. However, the lowercase n/a SSN will not get matched if we query for N/A in uppercase.



And we’re also restricted to an equality compare; so greater-than, less-than, and BETWEEN queries are not permitted. This makes sense of course, because even though the encryption is deterministic, which produces consistent and predictable results, those encrypted results do not retain the sequence of their unencrypted values.


Now let’s try the INSERT again. This failed in SSMS, but it should work this time, because we’ve set up a parameterized SqlCommand object. So I’ll prepare the command object with the INSERT statement, and then populate the parameters collection with the values for the new name, SSN, and city – where we expect the client to encrypt the Name and SSN values before sending out the command.
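A sketch of that parameterized INSERT might look like the following (the column sizes and values are hypothetical, CustomerId is assumed to be an identity column, and `conn` is assumed to be an open SqlConnection whose connection string includes Column Encryption Setting=Enabled):

```csharp
using System.Data;
using System.Data.SqlClient;

// ...

using (var cmd = new SqlCommand(
    "INSERT INTO Customer (Name, SSN, City) VALUES (@Name, @SSN, @City)", conn))
{
    // Parameters targeting encrypted columns must be strongly typed so that
    // ADO.NET can encrypt the values with the CEK before the command is sent.
    cmd.Parameters.Add("@Name", SqlDbType.VarChar, 20).Value = "Steven Jacobs";
    cmd.Parameters.Add("@SSN", SqlDbType.VarChar, 20).Value = "333-22-4444";

    // City is not an encrypted column, so this value travels as plain text.
    cmd.Parameters.Add("@City", SqlDbType.VarChar, 20).Value = "New York";

    cmd.ExecuteNonQuery();
}
```

This is also why the same INSERT fails in SSMS with literal values: without parameters, the client driver has nothing it can intercept and encrypt.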


Sure enough, we don’t fail this time, and the new customer row gets added successfully:


Returning to SSMS, querying the table now shows the newly inserted row along with the earlier two rows, and the data appears decrypted because we’re still running on the SSMS connection with the column encryption setting:


Limitations and Considerations

Before concluding, it’s worth taking a moment to think about some issues you may run into using Always Encrypted. This is a great new technology, but it does have some limitations.

First, large object types and certain other specialized data types cannot be encrypted. There are also other notable incompatibilities with Always Encrypted to point out. Most of these aren’t too onerous, but notably you can’t use Always Encrypted with the new Temporal or Stretch features in SQL Server 2016.
Entity Framework was certainly not designed to work with Always Encrypted, but it is possible. If you’re building applications with EF6 against Always Encrypted tables, check out this article, which talks you through the various workarounds needed for these two technologies to get along:

Also remember that encryption and decryption always happen on the client, and this translates into additional management to ensure that clients can access the CMK. If you’re using the local machine certificate store, this means the added administrative burden of deploying certificates to all the clients that need access to encrypted data.

And there are several other restrictions, which Aaron Bertrand has done an excellent job of articulating in his blog post here:


This post began by reviewing encryption capabilities available in earlier versions of SQL Server, which store keys and certificates on the server, and which encrypt data only at rest.

Then I introduced Always Encrypted, a new SQL Server 2016 feature in which the client – not the server – handles encryption and decryption. This makes it a very attractive feature for those customers that remain reluctant to move their databases to the cloud, because they don’t want to move their encryption keys to the cloud as well.

We discussed the two encryption types, where randomized provides more secure encryption based on unpredictable values. Such unpredictability means that you can never query on the encrypted data, and that’s why we also have deterministic encryption. This produces the same encrypted values every time, making it more predictable and less secure, but this also makes it possible to query on the encrypted data, with some restrictions. You learned how the CMK is used to encrypt each CEK, which in turn is used to encrypt the actual data in your tables.


Using T4 Templates to Generate C# Enums from SQL Server Database Tables

When building applications with C# and SQL Server, it is often necessary to define codes in the database that correspond with enums in the application. However, it can be burdensome to maintain the enums manually, as codes are added, modified, and deleted in the database.

In this blog post, I’ll share a T4 template that I wrote which does this automatically. I looked online first, and did find a few different solutions for this, but none that worked for me as-is. So, I built this generic T4 template to do the job, and you can use it too.

Let’s say you’ve got a Color table and ErrorType table in the database, populated as follows:

Now you’d like enums in your C# application that correspond to these rows. The T4 template will generate them as follows:
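For example, the generated code might look like this (the member names and values are hypothetical, but the shape – one enum for Color with an extra Undefined member, and two range-split enums for ErrorType with XML comments from the description column – is what the template produces):

```csharp
/// <summary>
/// Supported colors
/// </summary>
public enum Color : int
{
    /// <summary>
    /// Undefined (not mapped in database)
    /// </summary>
    Undefined = 0,
    Red = 1,
    Green = 2,
    Blue = 3,
}

/// <summary>
/// System error types
/// </summary>
public enum SystemErrorType : int
{
    /// <summary>
    /// Network connection lost
    /// </summary>
    NetworkError = 1,
    /// <summary>
    /// Disk is full
    /// </summary>
    DiskFull = 2,
}

/// <summary>
/// Customer error types
/// </summary>
public enum CustomerErrorType : int
{
    /// <summary>
    /// Customer credit limit exceeded
    /// </summary>
    CreditLimitExceeded = 1000,
    /// <summary>
    /// Customer account closed
    /// </summary>
    AccountClosed = 1001,
}
```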

Before showing the code that generates this, let’s point out a few things.

First, notice that the Color table has only an ID and a name column, while the ErrorType table has ID, name, and description columns. Thus, the ErrorType enums include XML comments based on the descriptions in the database, which appear in IntelliSense when referencing the enums in Visual Studio.

Also notice that one enum was generated for the Color table, while two enums were generated for the ErrorType table. This is because we instructed the T4 template to create different enums for different value ranges (0 – 999 for SystemErrorType, and 1000 – 1999 for CustomerErrorType) in the same table.

Finally, the Color enum includes an additional member Undefined with a value of 0 that does not appear in the database, while the other enums don’t include the Undefined member. This is because, in the case of Color, we’d like an enum to represent Undefined, but we don’t want it in the database because we don’t want to allow 0 as a foreign key into the Color table.

This gives you an idea of the flexibility that this T4 template gives you for generating enums from the database. The enums above were generated with the following T4 template:

<#@ include file="EnumsDb.ttinclude" #>
<#
  var configFilePath = "app.config";

  var enums = new[]
  {
    new EnumEntry
      ("Supported colors", "DemoDatabase", "dbo", "Color", "ColorId", "ColorName", "ColorName")
      { GenerateUndefinedMember = true },

    new EnumEntry
      ("System error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "SystemErrorType", ValueRange = new Tuple<long, long>(0, 999) },

    new EnumEntry
      ("Customer error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "CustomerErrorType", ValueRange = new Tuple<long, long>(1000, 1999) },
  };

  var code = this.GenerateEnums(configFilePath, enums);

  return code;
#>

You can see that there is very little code that you need to write in the template, and that’s because all the real work is contained inside the EnumsDb.ttinclude file (referenced at the top with @include), which contains the actual code to generate the enums, and can be shared throughout your application. A complete listing of the EnumsDb.ttinclude file appears at the end of this blog post.

All you need to do is call GenerateEnums with two parameters. The first is the path to your application’s configuration file, which holds the database connection string(s). The second is an array of EnumEntry objects, which drives the creation of enums in C#. Specifically, in this case, there are three EnumEntry objects, which results in the three enums that were generated.

Each enum entry contains the following properties:

  1. Description (e.g., “Supported colors”). This description appears as XML comments for the enum.
  2. Database connection string key (e.g., “DemoDatabase”). It is expected that a connection string named by this key is defined in the application’s configuration file. Since this property exists for each enum entry, it is entirely possible to generate different enums from different databases.
  3. Schema name (e.g., “dbo”). This is the schema that the table is defined in.
  4. Table name (e.g., “Color”). This is the table that contains rows of data for each enum member.
  5. ID column name (e.g., “ColorId”). This is the column that contains the numeric value for each enum member. It must be an integer type, but it can be an integer of any size. The generator automatically maps tinyint to byte, int to int, smallint to short, and bigint to long.
  6. Member column name (e.g., “ColorName”). This is the column that contains the actual name of the enum member.
  7. Description column name. This should be the same as the Member column name (e.g., “ColorName”) if there is no description, or it can reference another column containing the description if one exists. If there is a description, then it is generated as XML comments above each enum member.
  8. Optional:
    1. EnumName. If not specified, the enum is named after the table. Otherwise, you can choose another name if you don’t want the enum named after the table name.
    2. GenerateUndefinedMember. If specified, then the template will generate an Undefined member with a value of 0.
    3. ValueRange. If specified, then enum members are generated only for rows in the database that fall within the specified range of values. The ErrorType table uses this technique to generate two enums from the table; one named SystemErrorType with values 0 – 999, and another named CustomerErrorType with values 1000 – 1999.

This design offers a great deal of flexibility, while ensuring that the enums defined in your C# application are always in sync with the values defined in the database. And of course, you can modify the template as desired to suit any additional needs that are unique to your application.

The full template code is listed below. I hope you enjoy using it as much as I enjoyed writing it. Happy coding! 😊

<#@ template hostspecific="true" language="C#" debug="false" #>
<#@ assembly name="System.Configuration" #>
<#@ assembly name="System.Data" #>
<#@ import namespace="System.Configuration" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.Text" #>
<#+
  private string GenerateEnums(string configFilePath, EnumEntry[] entries)
  {
    if (entries == null)
    {
      return string.Empty;
    }

    var ns = System.Runtime.Remoting.Messaging.CallContext.LogicalGetData("NamespaceHint");
    var sb = new StringBuilder();
    sb.Append(this.GenerateHeaderComments());
    sb.AppendLine($"namespace {ns}");
    sb.AppendLine("{");
    foreach (var entry in entries)
    {
      try
      {
        sb.Append(this.GenerateEnumMembers(configFilePath, entry));
      }
      catch (Exception ex)
      {
        sb.AppendLine($"#warning Error generating enums for {entry.EnumDescription}");
        sb.AppendLine($"  // Message: {ex.Message}");
      }
    }
    sb.AppendLine("}");

    return sb.ToString();
  }

  private string GenerateHeaderComments()
  {
    var comments = $@"
// ------------------------------------------------------------------------------------------------
// <auto-generated>
//  This code was generated by a C# code generator
//  Generated at {DateTime.Now}
//  Warning: Do not make changes directly to this file; they will get overwritten on the next
//  code generation.
// </auto-generated>
// ------------------------------------------------------------------------------------------------
";
    return comments;
  }

  private string GenerateEnumMembers(string configFilePath, EnumEntry entry)
  {
    var connStr = this.GetConnectionString(configFilePath, entry.ConnectionStringKey);
    var enumDataType = default(string);
    var innerCode = new StringBuilder();
    using (var conn = new SqlConnection(connStr))
    {
      conn.Open();
      using (var cmd = conn.CreateCommand())
      {
        // Discover the data type of the ID column, which determines the enum's underlying type
        cmd.CommandText = @"
          SELECT DATA_TYPE
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName AND COLUMN_NAME = @IdColumnName";

        cmd.Parameters.AddWithValue("@SchemaName", entry.SchemaName);
        cmd.Parameters.AddWithValue("@TableName", entry.TableName);
        cmd.Parameters.AddWithValue("@IdColumnName", entry.IdColumnName);

        var sqlDataType = cmd.ExecuteScalar();
        if (sqlDataType == null)
        {
          throw new Exception($"Could not discover ID column [{entry.IdColumnName}] data type for enum table [{entry.SchemaName}].[{entry.TableName}]");
        }
        enumDataType = this.GetEnumDataType(sqlDataType.ToString());

        var whereClause = string.Empty;
        if (entry.ValueRange != null)
        {
          whereClause = $"WHERE [{entry.IdColumnName}] BETWEEN {entry.ValueRange.Item1} AND {entry.ValueRange.Item2}";
        }

        // Retrieve the rows that become the enum members
        cmd.CommandText = $@"
          SELECT
            Id = [{entry.IdColumnName}],
            Name = [{entry.NameColumnName}],
            Description = [{entry.DescriptionColumnName}]
          FROM
            [{entry.SchemaName}].[{entry.TableName}]
          {whereClause}
          ORDER BY
            [{entry.IdColumnName}]";

        var hasUndefinedMember = false;
        using (var rdr = cmd.ExecuteReader())
        {
          while (rdr.Read())
          {
            if (rdr["Name"].ToString() == "Undefined" || rdr["Id"].ToString() == "0")
            {
              hasUndefinedMember = true;
            }
            if (entry.NameColumnName != entry.DescriptionColumnName)
            {
              innerCode.AppendLine("\t\t/// <summary>");
              innerCode.AppendLine($"\t\t/// {rdr["Description"]}");
              innerCode.AppendLine("\t\t/// </summary>");
            }
            innerCode.AppendLine($"\t\t{rdr["Name"].ToString().Replace("-", "_")} = {rdr["Id"]},");
          }
        }
        if ((entry.GenerateUndefinedMember) && (!hasUndefinedMember))
        {
          var undefined = new StringBuilder();

          undefined.AppendLine("\t\t/// <summary>");
          undefined.AppendLine("\t\t/// Undefined (not mapped in database)");
          undefined.AppendLine("\t\t/// </summary>");
          undefined.AppendLine("\t\tUndefined = 0,");

          innerCode.Insert(0, undefined);
        }
      }
    }

    var final = $@"
  /// <summary>
  /// {entry.EnumDescription}
  /// </summary>
  public enum {entry.EnumName} : {enumDataType}
  {{
    // Database information:
    //  {entry.DbInfo}
    //  ConnectionString: {connStr}

{innerCode}  }}
";

    return final;
  }

  private string GetConnectionString(string configFilePath, string key)
  {
    var map = new ExeConfigurationFileMap();
    map.ExeConfigFilename = this.Host.ResolvePath(configFilePath);
    var config = ConfigurationManager.OpenMappedExeConfiguration(map, ConfigurationUserLevel.None);
    var connectionString = config.ConnectionStrings.ConnectionStrings[key].ConnectionString;

    return connectionString;
  }

  private string GetEnumDataType(string sqlDataType)
  {
    switch (sqlDataType)
    {
      case "tinyint"  : return "byte";
      case "smallint" : return "short";
      case "int"      : return "int";
      case "bigint"   : return "long";
      default :
        throw new Exception($"SQL data type {sqlDataType} is not valid as an enum ID column");
    }
  }

  private class EnumEntry
  {
    public string EnumDescription { get; }
    public string ConnectionStringKey { get; }
    public string SchemaName { get; }
    public string TableName { get; }
    public string IdColumnName { get; }
    public string NameColumnName { get; }
    public string DescriptionColumnName { get; }

    public string EnumName { get; set; }
    public Tuple<long, long> ValueRange { get; set; }
    public bool GenerateUndefinedMember { get; set; }

    public string DbInfo
    {
      get
      {
        var info = $"ConnectionStringKey: {this.ConnectionStringKey}; Table: [{this.SchemaName}].[{this.TableName}]; IdColumn: [{this.IdColumnName}]; NameColumn: [{this.NameColumnName}]; DescriptionColumn: [{this.DescriptionColumnName}]";
        if (this.ValueRange != null)
        {
          info = $"{info}; Range: {this.ValueRange.Item1} to {this.ValueRange.Item2}";
        }
        return info;
      }
    }

    public EnumEntry(
      string enumDescription,
      string connectionStringKey,
      string schemaName,
      string tableName,
      string idColumnName = null,
      string nameColumnName = null,
      string descriptionColumnName = null)
    {
      this.EnumDescription = enumDescription;
      this.ConnectionStringKey = connectionStringKey;
      this.SchemaName = schemaName;
      this.TableName = tableName;
      this.IdColumnName = idColumnName ?? tableName + "Id";
      this.NameColumnName = nameColumnName ?? "Name";
      this.DescriptionColumnName = descriptionColumnName ?? "Description";
      this.EnumName = tableName;
    }
  }
#>

Making the Case for Entity Framework in the Enterprise

Recently, I was met with some friction from the IT department at a client, where, they asserted, a decision had been made years ago to ban Entity Framework. Like many enterprise environments, this client was understandably concerned with the potential pitfalls of embracing Entity Framework. That meant that my job was to convince them otherwise – not to discount their apprehension, but quite the contrary – to demonstrate how EF can be leveraged for its advantages, and avoided for its shortcomings.

Entity Framework (EF) is a broad framework with many optional parts. There are several aspects of EF that provide great benefit, while others are a source of great consternation – particularly from the perspective of the database purist. As the cliché goes, “with great power comes great responsibility,” and so this blog post explores different aspects of EF, and identifies how to best leverage the framework, and not abuse it.

There are two distinct developer experiences with EF.

  • Design time
  • Run time

EF at Design Time

At design time, there are many different ways that developers can choose to use EF. All of them typically involve code generators, which could be any combination of automatically generated C# code (POCO classes, stored procedure wrappers) and T-SQL code (DDL).

The exact type of code generated depends on the methodology employed by the application developer. These are the set of “xxx-first” approaches to database design, which include:

  • Database-first
  • Model-first
  • Code-first

Database-first is the only acceptable methodology for SQL Server purists. It essentially means that the database comes first, and that EF has no influence on the database design. EF does not generate DDL to create the database, and there is no EF “smell” in the design. Instead, the SQL Server expert implements the database by hand, applying all their experience, and using every best practice there is. They create the database with properly normalized tables, data types, foreign key constraints, unique constraints, check constraints, stored procedures, naming conventions, and so on. And, ideally, they build and maintain the database design using SQL Server Data Tools (SSDT) in Visual Studio with source control (TFS, Git, etc.), rather than by managing scripts in SSMS.

With the other methodologies (model-first and code-first), the application developer focuses on their logical model (either as a visual .edmx file, or as POCO classes in C# code), and lets the EF design time “figure out the database” from the model. They let EF generate the DDL, which in turn creates the database. All SQL Server data types are inferred from their closest .NET equivalent. Then, as the model evolves, EF generates code migrations to rev-up and rev-down the database.

This approach is typically rejected by database purists, who rightfully object to the database being “reverse-engineered” from a model (.edmx file), or from C# POCO classes. With model-first and code-first, all direct control over the database design is lost. Thus, the only way to control the database design is indirectly, by coercing EF’s design-time DDL generator through the use of naming conventions or [attributes]. Essentially, it means that the application developer, who often lacks the experience and skills needed to properly design a database, gets to design the database. This is a terrible way to build and evolve an enterprise database model.

Our client’s application is strictly database-first, and uses SSDT to centrally store the database design, with full version history in TFS. Furthermore, it leverages a unique open-source T4 template code generator that enables database-first with an extensible object model, and zero friction for migrations. I have personally made several contributions to this utility, which is called the EntityFramework Reverse POCO Generator.

This generator produces C# POCO classes that are always in sync with the tables in the database, and it creates these as partial classes, so that the C# developer can easily build extensions that survive code regeneration. It also generates strongly typed stored procedure wrapper methods, which promotes and simplifies the use of stored procedures at run time.

EF at Run Time

While every developer’s design-time experience may vary, the run-time experience is always the same.

  1. Connect to the database with a context
  2. Retrieve objects by querying the context
  3. Modify objects in memory, while the context tracks changes
  4. Save modified objects back to the database
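In EF6 code, those four steps look something like this (the context and entity names here are hypothetical, standing in for whatever model your application defines):

```csharp
using System.Linq;

// ...

using (var db = new MyDbContext())              // 1. connect to the database with a context
{
    var customer = db.Customers                  // 2. retrieve an object by querying the context
        .Single(c => c.CustomerId == 1);

    customer.City = "Chicago";                   // 3. modify the object; the context tracks the change

    db.SaveChanges();                            // 4. save it back; EF issues the UPDATE for the changed row
}
```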

Here, database purists have additional concerns, particularly with item #2. Specifically, how is the database being queried? It could be either direct SQL or a stored procedure. When EF is used to call a stored procedure, it’s no different than calling that stored procedure using conventional ADO.NET or Enterprise Library. Whether it’s an EF DbContext object or an old-fashioned SqlCommand object, the query is prewritten and embedded in the stored procedure.

If, on the other hand, the developer uses EF to execute a LINQ query, then the EF runtime dynamically generates direct SQL. This can potentially (and all too often does) result in poor queries being sent to SQL Server by the application. I stress potentially because, a) the dynamically generated SQL queries have in fact improved greatly over the many versions of EF, and b) just because a stored procedure is being used instead doesn’t necessarily mean better performance; for example, if the stored procedure has a badly written query inside of it.

It’s similar to the design-time concern; an application developer who lacks experience in SQL Server writes a LINQ query in C# that results in generated SQL that performs poorly. All too often, these developers don’t understand deferred execution in EF, and they can repeatedly execute the same query over and over inside a loop (rather than just once) without realizing it. They are also prone to just append one .Include after another in their LINQ query, resulting in a great number of JOINs that return a lot of duplicate parent data in the resultset.

EF has excellent run time support for stored procedures, and, as mentioned, the T4 template code generator provides the C# developer with ready-to-use methods that wrap every stored procedure. The vast majority of data access against the database should occur by calling stored procedures. Like the database modeling at design time, these stored procedures should be written by hand with careful attention paid to SQL Server best practices. However, there may be some lightweight cases (for example, simple single-row, single-table SELECTs) where you could allow EF to generate the query without a stored procedure, where the generated query is not objectionable.

Furthermore, you can implement a SelectEntity stored procedure that returns an entire entity as XML. This not only means using a stored procedure rather than EF-generated SQL to query the database, but also that the stored procedure produces the most lightweight resultset possible, without all the duplicate parent data that would result from traditional JOINs in the stored procedure.
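As a sketch of this idea, a hypothetical SelectEntity-style procedure for an Order entity might use FOR XML with a correlated subquery, so the parent data appears only once (all table and column names here are illustrative, not from the original post):

```sql
-- Hypothetical example: returns a single Order entity, with its child
-- OrderDetail rows nested as XML, so no parent data is duplicated the
-- way it would be with a conventional JOIN resultset.
CREATE PROCEDURE SelectOrderEntity
    @OrderId int
AS
BEGIN
    SELECT
        o.OrderId,
        o.CustomerId,
        o.OrderDate,
        -- Correlated subquery nests the child rows beneath the parent
        (SELECT d.ProductId, d.Quantity, d.UnitPrice
           FROM OrderDetail AS d
          WHERE d.OrderId = o.OrderId
            FOR XML PATH('OrderDetail'), TYPE) AS OrderDetails
    FROM [Order] AS o
    WHERE o.OrderId = @OrderId
    FOR XML PATH('Order')
END
```

The client then deserializes the XML into the entity object graph, after which it can be attached to an EF context for change tracking.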

One of the biggest client-side advantages that EF provides is change tracking. After retrieving and deserializing a complete entity via the aforementioned stored procedure, you can attach it to the EF context so that EF can track changes to it, just the same as it would if EF queried the database without the stored procedure. Then, as you manipulate the object in memory, the EF context keeps track of what rows in which tables need to be INSERTed, UPDATEd, or DELETEd. These are complex details that are very difficult to implement by hand, while allowing the context to figure that all out dramatically streamlines your development process.


In conclusion, the philosophy is that best SQL Server practices are followed, both at design time with respect to data modeling, and at run time with respect to querying. You should take a “pick-and-choose” approach to EF, embracing those aspects that provide compelling developer benefits, and rejecting those that run contrary to proper database design and query principles.

Thus, your application should take a database-first approach to data modeling, and use stored procedures exclusively to query the database, ensuring no EF “smell” at either design time or run time.

Working with Temporal Tables in SQL Server 2016 (Part 2)

In my previous post, I introduced the concept of temporal data, and explained at a high level how SQL Server 2016 implements temporal tables. This post dives into the details of exactly how you create and query temporal tables.

Let’s start with an ordinary table, and convert it into a temporal table. So I’ll create the Employee table, and load it up with some data.
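The original code listing isn’t reproduced here, but a minimal sketch might look like the following (the schema and sample rows are assumptions, based on the names and values referenced later in this post):

```sql
-- A plain (non-temporal) Employee table; column names and sample data
-- are assumed from the values that appear later in the walkthrough.
CREATE TABLE dbo.Employee (
    EmployeeId     int PRIMARY KEY,
    FirstName      varchar(20) NOT NULL,
    LastName       varchar(20) NOT NULL,
    DepartmentName varchar(50) NOT NULL
)

INSERT INTO dbo.Employee
 (EmployeeId, FirstName, LastName, DepartmentName) VALUES
 (1, 'Ken',   'Sanchez',  'Executive'),
 (5, 'Gail',  'Erickson', 'Engineering'),
 (8, 'Diane', 'Margheim', 'Engineering')
```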



To convert this into a temporal table, first I’ll add the two period columns and then I’ll enable temporal and set dbo.EmployeeHistory as the name of the history table.
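The two ALTER TABLE statements might look roughly like this (the period column names StartDate and EndDate match the columns discussed below; the constraint names are illustrative):

```sql
-- Step 1: add the two period columns. Because the table already has
-- rows, DEFAULT constraints initialize them with beginning-of-time
-- and end-of-time values.
ALTER TABLE dbo.Employee ADD
    StartDate datetime2 GENERATED ALWAYS AS ROW START NOT NULL
        CONSTRAINT DF_Employee_StartDate DEFAULT '1900-01-01 00:00:00.0000000',
    EndDate   datetime2 GENERATED ALWAYS AS ROW END NOT NULL
        CONSTRAINT DF_Employee_EndDate   DEFAULT '9999-12-31 23:59:59.9999999',
    PERIOD FOR SYSTEM_TIME (StartDate, EndDate)

-- Step 2: enable temporal, naming the history table.
ALTER TABLE dbo.Employee
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory))
```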


Note that because we’re converting an existing table, this must be done in two separate ALTER TABLE statements. For a new temporal table, you can create it and enable it with a single CREATE TABLE statement. Also, because this is an existing table with existing data, it’s necessary to set DEFAULT values that initialize the period columns with beginning-of-time (1900-01-01 00:00:00.0000000) through end-of-time (9999-12-31 23:59:59.9999999) values; this is not necessary when creating a new table, or when altering an existing table with no rows.

Now when we expand the Employee table in the Object Explorer, you can see that it is designated as a System-Versioned table, which is the official name for temporal tables in SQL Server 2016. It also has a special icon showing a clock over the table, which tells you that you’re looking at a temporal table. Nested directly beneath it is the history table, dbo.EmployeeHistory, which SQL Server has created with an identical schema to match dbo.Employee.


To effectively demonstrate how to query temporal tables, we’ll need to change some data first. You can see the current state of the data in the Employee table just as we inserted it, while the history table is completely empty because we haven’t made any changes yet:
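Examining both tables takes nothing more than two ordinary SELECT statements:

```sql
-- Current state of the data
SELECT * FROM dbo.Employee

-- History table (empty until rows are updated or deleted)
SELECT * FROM dbo.EmployeeHistory
```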



So let’s update and delete a few rows. I’ll update the same row for EmployeeID number 5 three times, with a 5 or 6 second pause in between each one. First I’ll change the employee name to Gabriel, then I’ll change the department name to Support, and then I’ll change the department name once more to Executive. And, I’ll also delete EmployeeID number 8:
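A sketch of those changes follows, using WAITFOR DELAY to simulate the pause between each update (the exact statements are assumed from the description above):

```sql
-- Three updates to the same row, a few seconds apart
UPDATE dbo.Employee SET FirstName = 'Gabriel' WHERE EmployeeId = 5
WAITFOR DELAY '00:00:05'

UPDATE dbo.Employee SET DepartmentName = 'Support' WHERE EmployeeId = 5
WAITFOR DELAY '00:00:05'

UPDATE dbo.Employee SET DepartmentName = 'Executive' WHERE EmployeeId = 5

-- And delete one row
DELETE FROM dbo.Employee WHERE EmployeeId = 8
```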


Now I’ll query the tables again:


The Employee table shows the current state just as you’d expect, with EmployeeID number 5 showing the result of the latest UPDATE, and EmployeeID number 8 being deleted. But if you examine the history table, you can see the three earlier versions of employee ID number 5, corresponding with each of the three updates we ran against that row. And we also see EmployeeID number 8, which we deleted.

Let’s have a closer look at the period columns. In the main table, you can see that every row except for EmployeeID number 5 has period column values that span from the beginning of time until the end of time. But EmployeeID number 5 has a StartDate indicating that this version of the row began on August 14 at about 5:35pm. This tells you that there are earlier versions of this row available in the history table.


And indeed, the earlier versions are visible here in the history table. Specifically, the previous version of this row is the one that ends at the same time that the current version begins, August 14th at 5:35 and 25 seconds, which is when we changed the DepartmentName from Support to Executive. This ending datetime2 value of 25 seconds past the minute matches exactly with the starting datetime2 value in the current row:


That previous version had, in turn, replaced an even earlier version about five seconds before, at 20 seconds past the minute, when we changed the department name from Engineering to Support:


And there’s one earlier version of the row from about 6 seconds earlier, at 14 seconds past the minute, which is when we changed the FirstName from Gail to Gabriel (this is clearly the very first version of the row, as indicated by its start date of 1900-01-01 00:00:00.0000000):


We made our changes only 5 or 6 seconds apart, but to effectively demonstrate temporal, we’ll need to tweak the period columns to simulate longer periods of time between changes. A word of caution here: you normally wouldn’t ever touch these columns, and in fact, as long as temporal is enabled, SQL Server won’t permit you to change them. So, for demonstration purposes only, I’ll disable temporal, adjust the period columns, and then re-enable temporal. This must be done very carefully so that the start and end dates of each version continue to match up exactly as we’ve seen:
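The adjustment described in the next paragraph might be sketched like this (the WHERE predicates identifying each version are assumptions; a single captured timestamp keeps the start and end dates of adjacent versions aligned exactly):

```sql
-- Temporarily disable temporal so the period columns can be edited
ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = OFF)

-- Capture one fixed timestamp so adjacent versions stay aligned
DECLARE @Now datetime2 = SYSDATETIME()

-- Version 1 (FirstName was Gail): pretend it ended 35 days ago
UPDATE dbo.EmployeeHistory
   SET EndDate = DATEADD(DAY, -35, @Now)
 WHERE EmployeeId = 5 AND FirstName = 'Gail'

-- Version 2 (Gabriel / Engineering): 35 days ago through 25 days ago
UPDATE dbo.EmployeeHistory
   SET StartDate = DATEADD(DAY, -35, @Now),
       EndDate   = DATEADD(DAY, -25, @Now)
 WHERE EmployeeId = 5 AND DepartmentName = 'Engineering'

-- Version 3 (Gabriel / Support): started 25 days ago; EndDate (the
-- moment of the final update) is left alone
UPDATE dbo.EmployeeHistory
   SET StartDate = DATEADD(DAY, -25, @Now)
 WHERE EmployeeId = 5 AND DepartmentName = 'Support'

-- Make the deleted row for EmployeeID 8 appear deleted 25 days ago
UPDATE dbo.EmployeeHistory
   SET EndDate = DATEADD(DAY, -25, @Now)
 WHERE EmployeeId = 8

-- Re-enable temporal
ALTER TABLE dbo.Employee
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory))
```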


In this code, we first alter the table to disable temporal. Next we update the history table period columns so that the first change, when the FirstName was Gail, actually occurred 35 days ago, and we adjust the EndDate accordingly. The second change, when the FirstName was Gabriel and the DepartmentName was Engineering, let’s say actually occurred 25 days ago. So we adjust the EndDate accordingly, but also the StartDate so that it aligns with the first update from 35 days ago. And the third change, when the FirstName was Gabriel and the DepartmentName was Support occurred just now, so we leave its EndDate alone and only adjust the StartDate so that it aligns with the second update from 25 days ago. We also tweak the deleted row for EmployeeID number 8 so that it appears this row was deleted 25 days ago. Finally, we re-enable temporal on the Employee table.

When we look at the data now, it really does look like this table has been around for a while. I’m running this demo today on August 14th, but it’s showing changes from as far back as 35 days ago – where it appears that the first change was made on July 10th. You can also see that our updates have maintained the seamless connection of start and end dates across multiple versions of the same row:


Now we have a table with a history of changes over the past 35 days. And the beauty of temporal is that we can instantly query this table as it appeared at any time during that period. All you do is issue an ordinary SELECT statement, and include the special syntax FOR SYSTEM_TIME AS OF.

So I’ll run four queries. The first one is an ordinary SELECT that just queries over the current version of the table. But the other three include FOR SYSTEM_TIME AS OF to query the table as it appeared two minutes ago, thirty days ago, and forty days ago:
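The four queries might look like this; FOR SYSTEM_TIME AS OF accepts either a datetime2 literal or, as here, a variable:

```sql
DECLARE
    @TwoMinutesAgo datetime2 = DATEADD(MINUTE, -2, SYSDATETIME()),
    @ThirtyDaysAgo datetime2 = DATEADD(DAY, -30, SYSDATETIME()),
    @FortyDaysAgo  datetime2 = DATEADD(DAY, -40, SYSDATETIME())

-- 1: current data
SELECT * FROM dbo.Employee

-- 2: as the table appeared two minutes ago
SELECT * FROM dbo.Employee FOR SYSTEM_TIME AS OF @TwoMinutesAgo

-- 3: as the table appeared thirty days ago
SELECT * FROM dbo.Employee FOR SYSTEM_TIME AS OF @ThirtyDaysAgo

-- 4: as the table appeared forty days ago
SELECT * FROM dbo.Employee FOR SYSTEM_TIME AS OF @FortyDaysAgo
```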


Let’s inspect the results.

The first query shows current data only, so we see the latest version of EmployeeID number 5, with FirstName Gabriel, and the Department is Executive. We’re also missing EmployeeID number 8, because it’s been deleted:


The second query from two minutes ago shows the same row for Employee ID number 5, but the Department is Support, and this is because we only just recently changed the Department from Support to Executive. But EmployeeID number 8 is still missing, because it was deleted more than just two minutes ago:


The third query is from thirty days ago, so we get an even earlier version of EmployeeID number 5, when the Department was still Engineering. And we also see the deleted row for EmployeeID number 8 miraculously reappear, because it was deleted 25 days ago, and these query results are from 30 days ago:


And finally, the fourth query from 40 days ago shows the very original version of the table before any updates or deletes. We see the original name Gail for EmployeeID number 5, as well as the original row for EmployeeID number 8 that got deleted later on:


As you can see, time travel with temporal tables is pretty awesome. But it doesn’t end here. Stay tuned for a future post that shows how to use the new stretch database feature to transparently migrate part or all of the temporal history table to the cloud on Azure SQL Database.

Until then, happy coding!