Sharing State in SQL Server 2016 with SESSION_CONTEXT

If you’ve ever wanted to share session state across all stored procedures and batches throughout the lifetime of a database connection, you’re going to love SESSION_CONTEXT. When you connect to SQL Server 2016, you get a stateful dictionary, or what’s often referred to as a state bag, some place where you can store values, like strings and numbers, and then retrieve it by a key that you assign. In the case of SESSION_CONTEXT, the key is any string, and the value is a sql_variant, meaning it can accommodate a variety of types.

Once you store something in SESSION_CONTEXT, it stays there until the connection closes. It is not stored in any table in the database, it just lives in memory as long as the connection remains alive. And any and all T-SQL code that’s running inside stored procedures, triggers, functions, or whatever, can share whatever you shove into SESSION_CONTEXT.

The closest thing like this we’ve had until now has been CONTEXT_INFO, which allows you to store and share a single binary value up to 128 bytes long, which is far less flexible than the dictionary you get with SESSION_CONTEXT, which supports multiple values of different data types.

SESSION_CONTEXT is easy to use, just call sp_set_session_context to store the value by a desired key. When you do that, you supply the key and value of course, but you can also set the read_only parameter to true. This is locks the value in session context, so that it can’t be changed for the rest of the lifetime of the connection. So, for example, it’s easy for a client application to call this stored procedure to set some session context values right after it establishes the database connection. If the application sets the read_only parameter when it does this, then the stored procedures and other T-SQL code that then executes on the server can only read the value, they can’t change what was set by the application running on the client.

How do you extract a value out of session context? Well, by using the SESSION_CONTEXT function of course. You supply the key, and the function returns the value. But remember, it returns this as a sql_variant, so you’ll usually need to cast or convert the value into whatever data type you need, like a varchar, int, or date.

Let’s demonstrate with a quick example.

First I’ll create this stored procedure that does something. And to do it, the stored procedure needs to know the region, but it doesn’t get the region using a parameter. Instead, it calls the new SESSION_CONTEXT function, requesting the value keyed as UsRegion, which we cast to a varchar(20) using CONVERT.

CREATE PROCEDURE DoThis AS
BEGIN
	DECLARE @UsRegion varchar(20) = CONVERT(varchar(20), SESSION_CONTEXT(N'UsRegion'))
	SELECT DoThis = @UsRegion
END

And here’s another stored procedure that also takes no parameters, and gets the region from session_context.

CREATE PROCEDURE DoThat AS
BEGIN
	DECLARE @UsRegion varchar(20)
	SET @UsRegion = CONVERT(varchar(20), SESSION_CONTEXT(N'UsRegion'))
	SELECT DoThat = @UsRegion
END

Both these procedures are expecting some earlier code to store the desired region into session context. Until that happens, SESSION_CONTEXT simply returns NULL.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
NULL

DoThat
--------------------
NULL

So Let’s call sp_set_session_context, and set the region to Southwest.

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Southwest'

Now when we call the procedures, they both get Southwest from the session context.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Southwest

DoThat
--------------------
Southwest

Now before moving on to change the value, let’s first see that the value persists only for the lifetime of the connection. Run it a few more times, and you can see it’s still returning Southwest, but then close the connection and open a new one. Now running the stored procs again, the region is NULL, because session context is empty.

EXEC DoThis
EXEC DoThat

DoThis
--------------------
NULL

DoThat
--------------------
NULL

Now call sp_set_session_context again to change the region to Northeast,

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Northeast'

And the procedures show Northeast, just as expected:

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Northeast

DoThat
--------------------
Northeast

Change it once more to Southeast, only this time, also set the readonly parameter to true. This prevents the value from being changed again during this session:

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Southeast', @read_only = 1

And the change is reflected when we run the procedures again:

EXEC DoThis
EXEC DoThat

DoThis
--------------------
Southeast

DoThat
--------------------
Southeast

Finally, try to change the region to Northwest:

EXEC sp_set_session_context @key = N'UsRegion', @value = N'Northwest'

You see that we can’t. because it’s locked in session context:

Msg 15664, Level 16, State 1, Procedure sp_set_session_context, Line 27
Cannot set key 'UsRegion' in the session context. The key has been set as read_only for this session.

There’s absolutely no way the value can be changed unless you kill the connection like we saw earlier, in which case of course you lose all the values in session context.

Advertisements

Just DIE Please! Introducing Drop If Exists (DIE) in SQL Server 2016

In SQL Server 2016, Drop If Exists (DIE) is a handy new T-SQL language enhancement that eliminates the need to test before you drop.

So, if you need to delete a table, or a stored procedure, but you don’t know if it exists or not, then you’re used to writing code that says, “If the table exists, then drop it,” or “if the stored procedure exists, then drop it.” We’ve all been writing code like this for years, but that doesn’t mean it’s been fun:

• IF OBJECT_ID('dbo.Product', 'U') IS NOT NULL
   DROP TABLE dbo.Product

• IF EXISTS (SELECT * FROM sys.triggers WHERE name = 'trProductInsert')
   DROP TRIGGER trProductInsert

So now, thankfully, we can leave the testing to SQL Server:

• DROP TABLE IF EXISTS dbo.Product

• DROP TRIGGER IF EXISTS trProductInsert

It doesn’t get much simpler than this. It’s really just an ordinary DROP statement; you just inject the new syntax IF EXISTS in the middle, and you’re done.

This new feature is available for just about anything you need to drop, so not just tables and triggers, but all of the following:

  • AGGREGATE
  • ASSEMBLY
  • DATABASE
  • DEFAULT
  • INDEX
  • PROCEDURE
  • ROLE
  • RULE
  • SCHEMA
  • SECURITY POLICY
  • SEQUENCE
  • SYNONYM
  • TABLE
  • TRIGGER
  • TYPE
  • VIEW

So no, DIE isn’t really any major new big feature, but if you like neat code – as I do – then it’s welcome just the same.

Overview of New SQL Server 2016 Developer Features

Every new version of SQL Server is packed with new features, and SQL Server 2016 is no exception. In this blog post, I briefly describe the major new developer focused features introduced in SQL Server 2016. I’ll cover many of these features in greater depth, in upcoming posts.

Drop If Exists is a small but convenient language enhancement which helps you write neater T-SQL code, because you no longer need to test if an object exists before deleting it.

SESSION_CONTEXT gives you a dictionary object that maintains its state across the lifetime of the database connection, so it’s a new easy way to share state across stored procedures running on the server, and even to share state between the client and the server.

• With Dynamic Data Masking, or DDM, you can shield sensitive information in your tables from unauthorized users by masking them, and this works purely with permissions, without ever modifying the actual data in the table.

Row-Level Security, or RLS, lets you hide different rows for different users, based on your own custom criteria. The hidden rows are automatically filtered out of all queries that get issued against the table, and you can also block users from inserting, updating, or deleting rows according to your needs.

Always Encrypted lets you encrypt data that is never decrypted in any location other than the client. By using client-side certificates to perform client-side encryption and decryption, the data is always encrypted – not just on disk, but in-flight, as it traverses the network.

• With Stretch DB, you can keep using your own data centers and SQL Servers to host and manage your data, but still allow tables you designate for remote data archive to be migrated to the cloud on Azure SQL Database. So you keep your hot data on-premises, but let your cold data stretch out to the cloud, completely transparently.

Temporal data is an exciting new feature that automatically tracks changes made to a table, and records those changes to a history table. Then, the query engine integrates with the history table and gives you this almost magical experience of time-travel, where you can run a query against a table as it appeared at any point in time of the past.

JSON support. XML support first appeared back in SQL Server 2000, and then got a major boost in 2005 with the native XML data type. Today, JSON is the new XML, and SQL Server 2016 provides JSON support very similar to what’s possible with XML. You can shred, store, query, and manipulate JSON documents in just about any way you need to, in SQL Server 2016.

Hekaton improvements – QL Server 2014 introduced In-Memory OLTP, which many still call by its code name, “Hekaton.” Hekaton can dramatically boost performance by migrating from traditional disk-based tables to newer memory-optimized tables. The technology is compelling, but the initial release in 2014 carried a lot of limitations, and the most egregious ones have been removed in 2016.

PolyBase – We’re living in a world of big data, where increasingly, massive amounts of information is being stored in large No-SQL stores such as Hadoop and Azure Blob Storage. PolyBase is a new feature in that lets you integrate with both Hadoop and Azure Blob Storage. By defining external tables that map to these No-SQL environments, you can write T-SQL queries that seamlessly retrieves data from them, and can even push portions of the query down to execute on Hadoop as a compute job.

• The new QueryStore feature will cache execution plans, and capture the performance of the same query over time. This is a great tool when you’re trying to troubleshoot the performance of a query that once had a good execution plan, but no longer does because of some environmental change, say some change in table statistics. With query store, you can much more easily identify that change, and make the necessary adjustments to ensure that your SQL Server continues to devise good execution plans.

R Integration – R is an analytic programming language that has grown very popular over recent years, and now SQL Server 2016 introduces R services. This lets you write code in R and run it right inside the database engine itself. This is a huge win for data scientists who will no longer need to first extract their data out of SQL Server before they can analyze it with R; instead, they can bring their R code right to the data, and let it run there.

Stay tuned for upcoming posts for more detailed coverage on these awesome new SQL Server features!

Introducing Azure DocumentDB

On April 8, 2015, Microsoft officially launched Azure DocumentDB, and it certainly can be characterized as a typical NoSQL document database. It is a massively scalable NoSQL document database that works with schema-free JSON documents. Beyond this, however, DocumentDB stands out with some very unique capabilities.

SQL Queries Over Schema-Free JSON

Of course, DocumentDB works with schema-free JSON. But unlike some platforms that require you to define index paths in advance of being able to query on specific properties, DocumentDB automatically indexes every property in a document as soon as the document is added to the database. Simply put, every document is instantly queryable the moment it’s created, and you can search on any property anywhere within the document hierarchy. Furthermore, documents are queryable using SQL, or I should say, using a special flavor of SQL that anyone with SQL experience should immediately find intuitive.

ACID Transactions Updating Multiple Documents

DocumentDB provides a server-side environment inside which you can write JavaScript code to update multiple documents with full transactional processing. This is an easy and powerful way to ensure data consistency across multiple documents, because DocumentDB ensures that all updates made on the server are committed together, or will roll everything back together in the event of an error.

Tunable Performance

There are many ways to tune DocumentDB for the performance needed by your application. For example, throughput can be scaled up or down instantly across three different performance tiers. And although DocumentDB indexes every property on every document, you can take control and fine-tune an indexing policy that reduces storage and processing overhead for specific documents and/or properties that never need to be indexed. And while DocumentDB supports both strong and eventual consistency, it also has two additional options to give you even greater control over the tradeoffs between performance and consistency.

Runs on Azure

Finally, DocumentDB is available as a fully managed, cloud-based, Platform As A Service, running on Azure. There’s just nothing for you to install or manage. No servers, cables, operating systems, or updates to deal with, no replicas to setup – Microsoft does all that work, and keeps the service running. Azure guarantees availability as well as predictable performance based on the service tier that you sign up for. Within literally minutes, you get started working with DocumentDB using just a browser and an Azure subscription.

Stay tuned for upcoming posts, where I’ll dig into all of these exciting capabilities in greater detail.

Relational Databases vs. NoSQL Document Databases

In this post, we’ll take a close look at some of the differences between a traditional relational store and a NoSQL document store.

Rows vs. Documents

To begin with, a document database stores entities as documents – meaning JSON documents, and this is very different to the way relational databases store data as rows in a table.

i1

Columns vs. Properties

While rows in the relational world are made of up columns, documents contain properties.

i2

Schema vs. Schema-Free

In the relational world, every table has a schema that defines the columns and data types that every row in the table must conform to. In contrast, a document database has no defined schema, and every document can be structured differently.

i3

In this example, there are four columns defined for a table, and it would be necessary to alter the table schema if we wanted a fifth column, or if we wanted to change the maximum length of the name column, or if we wanted to allow nulls in date-of-birth, you get the idea. But because document databases as schema-free, they aren’t subject to these constraints. This makes them ideal when you have a rapidly evolving schema, as is usually the case in software development today.

i4

Here, the first document has several properties, one of which is called name. Yet this doesn’t prevent us from adding a second document that uses the property fullName, rather than name, because there are no rules on schema. Similarly, we can add a third document that stores the fullName property an object that has distinct first and last name properties embedded within it. The database is more than happy to accept all these variations. Of course, this is just an example, and it would normally make no sense to mix up property names and shapes like this, just because you can. In this case, we can speculate that – at some point, it was determined that fullName was a better choice than name, and then at some later point, the fullName property was enhanced to distinguish between first and last names. This is a big selling point of document databases, because they let you evolve your schema as your needs dictate, as compared to relational database, where keeping up with an evolving schema is far more disruptive. Of course, when you are supporting varying schemas like this, it falls on you to present a unified view of the data. In this scenario, you could easily devise a query that returns the full name, first, by testing for the presence of a property called name, or a property called fullName if there is no name property, and then by testing if the fullName is an object or not; if it is, then you’d concatenate the embedded first and last name properties, otherwise, you’d just return the fullName property.

Normalized vs. Denormalized

Another significant difference between relational databases and document databases has to do with data normalization. In the relational world, we strive to normalize data as much as possible. This means avoiding data duplication, and maintaining separate tables for each related entity that can be joined together to produce a complete view of the data.

i5

In this example, we have a User table with a row for John, and we have a separate Holdings table with multiple rows for all of John’s stock holding. The two tables are related on User ID, which is 1 for John, so that all individual users are stored in the user table, and each user’s related holdings are stored in the holdings table. Not only must the tables be joined when retrieving data for an application, but the application must know how to take a single modified object with a user and their holdings, and persist changes to the database by updating the user and holding tables separately. This takes significant effort, even if you leverage assistance from an Object-Relational Mapping framework (ORM) like Entity Framework.

In a document database, we typically do the opposite.

i6

Here we see a single document that contains both the user and their holdings. Suddenly, there is no need to join when running a query, nor is there any need to shred the object into different places when saving changes, in fact, no need for an ORM layer at all. This, together with a schema-free model, eliminates a tremendous amount of friction that you normally have with relational database when designing schemas, joining tables, and maintaining an ORM layer on top of your database.

Of course, this isn’t a silver bullet that works well in every scenario. Here we see the one-to-many relationship implemented in a JSON document using an array of embedded objects. This is fine when you expect a reasonable maximum number of child objects, but what about so-called “unbounded” data. For example, imagine a blog post document with an infinite number of related comments. It would not be possible to store a single blog post with all of its comments inside a single document, and this is an example of where you actually might implement a model similar to a relational database, with each blog post in their own document, and each comment in a separate related document that is tied logically on the blog post ID. You can certainly do this, but it will fall on you and your application to store and retrieve blog posts and comments separately, because the document database won’t join them for you. You’re also free to model things however you’d like, so you could create blog post documents that include the first 100 comments, and then related documents that contain the next hundred, and the next hundred, with 100 comments per related document. There’s no one approach to data modeling that works in every scenario, so it’s up to you to experiment with what works best.

Data duplication is another aspect to this. With document databases, it’s not uncommon to duplicate data across multiple documents so that each document has the data it needs without having to locate other documents. Of course, if this is data that frequently changes, then you face another question as to whether it’s better to update multiple documents when a single piece of duplicated data is changed, or to extract the duplicate data out of each document and maintain it in a single shared document. Once again, how you model your data is all up to you. But at the end of the day, it’s important to understand that document databases work best when dealing with rich hierarchical documents that are entirely, or almost entirely, self-contained. Yes, you can model related documents when you need to, but if you find yourself modeling a database that contains many related documents, and/or, your documents have mostly flat structures, then this is a clear sign that a document database may not be the right tool for the job.

Strong Consistency vs. Eventual Consistency

Relational databases also enforce strong consistency on write operations. After updating the balance of a bank account, for example, we must be guaranteed that queries immediately show the updated balance – it’s entirely unacceptable to continue showing the old balance any time after it’s been changed.

One reason that document databases perform as well as they do is because when you write to the database, the changes are propagated to multiple replicas in the background. Then, read requests can be satisfied by any replica, making it possible to satisfy a high volume of client queries just by maintaining enough replicas. However, because not all replicas may be up-to-date at the point in time that a client issues a query, it is possible to receive inconsistent query results. Eventually, of course, all the replicas will be updated, and queries will return consistent results, which is why this behavior is called “eventual consistency.”

Simple vs. Complex

As I had started explaining, NoSQL databases are simple by design, which is the primary reason that they are able to achieve scale and performance that surpasses relational databases. This is also the reason why they do not and cannot replace relational databases that are often better at handling more complex requirements that don’t necessarily need to achieve massive scale.

Scale-Up vs. Scale-Out

And finally, speaking of scale, relational databases simply don’t scale out easily. You can do it, but it’s hard and expensive, whereas scale-out is a fundamental design goal with document databases.

What is a NoSQL Document Database?

In this blog post, we’ll discuss the major concepts around NoSQL document databases. In future posts, I’ll introduced Azure DocumentDB, Microsoft’s newest NoSQL document database, and discuss the major differences between relational databases and document databases.

What is NoSQL?

The best way to start is to clear up some terminology, where the industry has unfortunately adopted a couple of terms that are arguably misleading. SQL means Structured Query Language – meaning it’s just a language; a way of expressing a request to “go find something from someplace, where some condition is true, and give me back the result in the shape that I want it.” And so, SQL per-se doesn’t really define a specific technology. Again, it’s just a language, a dialect, but because SQL is the traditional query language of relational databases, the terms are often equated. So it’s really more helpful to think of a NoSQL database as a “non-relational” database, where – however you go about querying this database – it’s a database that abandons may of the concepts of relational databases. And so we wound up with the term “NoSQL,” where by now many NoSQL databases have emerged, and Azure DocumentDB is the latest NoSQL contender from Microsoft. But unlike most other NoSQL databases, the primary way to query DocumentDB is – oddly enough – by using SQL, or at least, a version of SQL that’s been adapted to the non-relational world of NoSQL databases. I’ll be talking a lot about DocumentDB in upcoming posts.

OK, so NoSQL really means non-relational. Now that’s a really broad definition. Saying it isn’t relational is like saying it’s anything else. And that’s true, which is why in fact there are different types of NoSQL databases. These include key-value stores, such as Azure Table Storage, column based stores like Cassandra, graph databases like Neo4, and document databases like MongoDB and Azure DocumentDB. While there are key differences between these types, all NoSQL database platforms share several common characteristics.

Huge amounts of data

First, they are designed to scale out, not just up. Meaning that while relational databases scale up easily enough, simply by adding more hardware, it’s much more difficult to scale them out horizontally – that is, to spread relational data across multiple partititions – once you hit the ceiling on CPU, disk, and memory, and can no longer scale up. In contrast, NoSQL databases are designed to scale out – infinitely, in fact—making it much easier to achieve internet scale for modern applications.

Schema-free data

Another common characteristic among NoSQL databases is the concept of schema-free data. That is, unlike relational databases, a NoSQL database does not enforce any schema. Every item in the database is free to store information that may or may not be structured the same as other items – even other items of the same type. This means that you can simply introduce new elements in your data as they become pertinent, without requiring any design changes in the database, such as adding or dropping columns, or changing data types. Similarly, you can stop including elements in new data as they start becoming irrelevant, again, without maintaining a schema in the database.

Simplicity Rules

By design, NoSQL databases are simple. They are not nearly as robust as traditional relational database platforms, like SQL Server and Oracle, and there are two reasons for this. For one, NoSQL databases don’t try to provide the complete functionality that is currently available in a relational database. That is, they are specifically designed to be simpler than relational databases, which is how they are able to out-perform relational databases on a large scale. In other cases, NoSQL databases lack features simply because they are much younger than their relational counterparts, which achieved maturity a long time ago. So while you can expect to see improvements in areas of missing functionality as NoSQL databases evolve, these platforms won’t make an attempt to replace full feature set available in relational databases. Despite the negative connotation in the name “NoSQL,” eliminating relational databases in favor of NoSQL is certainly not a stated goal. But at the same time, a scalable, schema-free, and easy-to-use database platform is rather compelling, and gives us more choices for our applications than we had before. Relational databases are definitely here to stay, but they no longer enjoy the monopoly they once had as the back-end platform of choice for new applications, now that a variety of NoSQL alternatives are here.

Document Database

I mentioned that Azure DocumentDB is a NoSQL database, and that it’s a NoSQL document database specifically. This is yet another unfortunate term, because when most people think of a document, they think of a file – like a Word document, spreadsheet, PDF file, and this is definitely not the type of document we mean when we say “document database.” In NoSQL terms, a document is more like an object graph; a complete representation of an entity and its related entities. While object graphs can be serialized and deserialized in any format, NoSQL databases typically leverage JSON, or JavaScript object notation, as the format for storing, projecting, and transporting data. Given the pervasiveness of JSON in today’s world – particularly among web applications – it should really take noone by surprise that NoSQL document databases have generally embraced JSON as their native data format. It’s a simple, lightweight format, yet expressive enough to support many different data modeling scenarios.

Integrating Document BLOB Storage with SQL Server

NoSQL platforms can support highly scalable databases with BLOB attachments (images, documents, and other files), but if you think you need to embrace a NoSQL solution in lieu of SQL Server just because you have a high volume of BLOBs in your database, then think again. Sure, if you have good reasons to go with NoSQL anyway – for example, if you want the flexibility of schema-free tables, and you can accept the compromises of eventual transactional consistency – then NoSQL can fit the bill nicely.

But critical line-of-business applications often can’t afford the relaxed constraints of NoSQL databases, and usually require schemas that are strongly typed, with full transactional integrity; that is, a full-fledged relational database system (RDBMS). However, relational database platforms like SQL Server were originally designed and optimized to work primarily with structured data, not BLOBs. And so historically, it’s never been feasible to store large amounts of BLOB data directly in the database. That is, until FILESTREAM.

With FILESTREAM, Microsoft addresses the dilemma of storing BLOBs within the relational database. My new Pluralsight course, SQL Server 2012-2014 Native File Streaming, explains this innovative feature in detail, and in this blog post, I’ll discuss how FILESTREAM (and its related technologies) can be used to implement a highly-scalable BLOB storage solution that’s fully integrated with a relational SQL Server database. You’ll also find live demos on everything covered by this post in the course.

Introducing FILESTREAM

Although SQL Server was never originally intended to handle BLOBs in large scale, this is no longer true as of FILESTREAM (introduced in SQL Server 2008). Before FILESTREAM, SQL Server was forced to shove BLOBs into the standard database filegroups, which are really optimized for storing structured row data in 8k pages. Because BLOBs don’t fit naturally within this structure, they must be pushed into off-row storage, which bloats the structured filegroups, and ultimately kills performance.

FS1b

FILESTREAM changes all that. First, to be clear, FILESTREAM is not actually a data type. Rather, it’s an attribute that you apply to the varbinary(max) data type, the same data type that you would use to store BLOBs directly inside the row. But by merely appending the FILESTREAM attribute to the varbinary(max) data type, SQL Server takes a radically different approach to physical BLOB storage. Rather than inundating the standard database filegroups with BLOBs, SQL Server stores BLOB content as files in the file system – where they belong; the file system being a native environment optimized for storing and streaming unstructured binary content. At the same time, it establishes and maintains reference pointers between the rows in the standard filegroups and the files in the file system that are tied to varbinary(max) columns in those rows. All this magic occurs behind the scenes, and is totally transparent to any existing code that works with ordinary varbinary(max) columns.

FS2

In this manner, the BLOB data is physically stored separately from structured row data, but it is logically an integral part of the database. So for example, backing up the database includes the BLOB data, with the option of performing a partial backup that excludes the FILESTREAM filegroup when you want to create smaller backups that don’t include BLOB data.

Furthermore, this solution provides full transactional consistency – because FILESTREAM integrates with the NTFS file system, and NTFS is a transactional file system. So when you start a database transaction and insert a row, and that row includes BLOB data stored in a varbinary(max) FILESTREAM column, then SQL Server automatically initiates an NTFS file system transaction over that BLOB data. Then, the fate of the file system transaction hinges on the fate of the database transaction. If and when the database transaction commits, then SQL Server will also commit the NTFS file system transaction; similarly, rolling back the database transaction automatically rolls back the NTFS transaction.

Accessing BLOBs with T-SQL

With FILESTREAM, you can treat BLOBs as ordinary varbinary(max) columns in T-SQL. For example, you can use the OPENROWSET function with the BULK provider to import an external file into a varbinary(max) column, and if that column is decorated with the FILESTREAM attribute, then SQL Server will automatically store a copy of that file as a BLOB in the NTFS file system behind the scenes, rather than force-fitting it into the standard database filegroups.

For example:

INSERT INTO PhotoAlbum(PhotoId, PhotoDescription, Photo)
  VALUES(
    3,
    'Mountains',
    (SELECT BulkColumn FROM OPENROWSET(BULK 'C:\Demo\Ascent.jpg', SINGLE_BLOB) AS x))

To retrieve BLOBs, it’s a simple SELECT:

SELECT * FROM PhotoAlbum

FS3

Using SqlFileStream and the Streaming API

Although FILESTREAM delivers scalable storage by leveraging the NTFS file system behind the scenes, BLOB access needs to scale as well. It’s great that you can enjoy total transparency by just using T-SQL access, but stop for a moment and think about what SQL Server needs to do when retrieving BLOBs with T-SQL. In order to serve up the Photo column in the resultset shown above for the SELECT statement, for example, SQL Server needed to read the entire contents of each BLOB from the NTFS file system that it’s managing internally, and this can easily and suddenly place a great deal of memory pressure on the server.

To address this concern, FILESTREAM exposes the streaming API. When you use this API, SQL Server still manages the file system behind the scenes, only it shifts the burden and memory requirements of actually reading and writing BLOBs in the file system off of itself and onto the client application. This keeps the memory requirements on SQL Server very lean, regardless of how large your BLOBs may be.

The SqlFileStream class is a managed code wrapper around the streaming API, which makes it extremely easy to use from .NET. In C# or VB .NET, you start a database transaction and issue an INSERT statement, but you don’t actually include the BLOB content with the INSERT statement. Instead, SQL Server passes you back the information you need to create a SqlFileStream object. This object inherits from the base System.IO.Stream class, meaning that it supports all the standard read/write methods of standard .NET stream classes, including memory streams, HTTP request/response streams, and local file streams. So it’s easy to then stream your BLOBs in and out, using buffers in memory allocated to your application – not SQL Server. Then, you just commit the database transaction, and SQL Server automatically commits the NTFS file system transaction at the same time.

In my course, I show you SqlFileStream up close, and demonstrate how to program against the streaming API from a thick client application, a client/server (Web) application, and in an n-tier (WCF) scenario as well.

Introducing FileTable

The FILESTREAM story only gets better with FileTable, added in SQL Server 2012. While FILESTREAM revolutionizes BLOB storage in SQL Server, it’s only accessible to developers and administrators. What about ordinary users? They’re certainly not going to write T-SQL or streaming API code to access BLOBs. And there’s also no way for ordinary client applications to access FILESTREAM data.

The solution is FileTable, which combines FILESTREAM with the hierarchyid data type to furnish an “emulated” file system; that is, a file system that users and applications can work with, but which is really a FileTable in a SQL Server database. A FileTable is just a regular table except that it has a fixed schema; specifically, it has these pre-determined columns for the metadata of the emulated file system:

FS4

Every row in a FileTable represents either a file or a folder (depending on the is_directory column), and the hierarchyid value in the path_locator column is what implements the folder structure of the emulated file system. The hierarchyid data type has methods that you can use to query and manipulate the structure; for example, you can programmatically move entire subtrees from one parent to another.

For rows that represent files, the file_stream column holds the actual BLOB, and this is a varbinary(max) FILESTREAM column. So behind the scenes, it is stored in the NTFS file system just like a varbinary(max) FILESTREAM column of an ordinary table (a non-FileTable) would be.

And so, in addition to being able to use T-SQL or the streaming API with a FileTable, the emulated file system that a FileTable represents also gets exposed to users and client applications via a Windows file share. As a result, changes made to the table in the database are reflected in the emulated file system, and conversely, changes made to the emulated file system by users or client applications are reflected automatically in the database, which ultimately pushes down into the physical NTFS file system being used for BLOB storage behind the scenes.

FS5

Summary

This blog post explained FILESTREAM, and its related feature, FileTable. We first saw how FILESTREAM supports scalable BLOB storage using the NTFS file system behind the scenes, and provides transparent T-SQL access using the varbinary(max) data type. We also learned about the streaming API and SqlFileStream, which shifts the burden and memory requirements for streaming BLOBs off of SQL Server and onto client applications, providing scalable BLOB access. And we finally saw how FileTable combines FILESTREAM with the hierarchyid data type to furnish an emulated file system on the front end that users and client applications can interact with, but which in actuality is just a table in the database.

And so, with FILESTREAM, line-of-business applications can embrace scalable BLOB integration without being forced to consider a NoSQL alternative to SQL Server.