It’s a File System… It’s a Database Table… It’s SQL Server Denali FileTable!

SQL Server 2008 introduced FILESTREAM, an innovative feature that integrates the relational database engine with the NTFS file system to provide highly efficient storage and management of BLOBs. Now FileTable, a new feature coming in the next version of SQL Server (code-named “Denali”), builds on FILESTREAM and takes database BLOB management to the next level. In this post, I’ll explain FileTable and show you how it works. I also encourage you to experiment with FileTable yourself by downloading and installing SQL Server Denali CTP3 from http://bit.ly/DenaliCTP3 (but be sure to install it on a virtual machine; beta software can be difficult or impossible to uninstall).

Note: FileTable relies on FILESTREAM, so you need to understand FILESTREAM to fully understand FileTable. If you’re new to FILESTREAM, you can get the necessary background by reading my three earlier blog posts Introducing FILESTREAM, Enabling and Using FILESTREAM, and Using SqlFileStream.

FileTable combines FILESTREAM with hierarchyid (another SQL Server 2008 feature) and the Windows file system API to deliver new and exciting BLOB capabilities in SQL Server. (I just love seeing reusability leveraged like this.) Like the two words joined in its name, one FileTable functions as two distinct things simultaneously:

  1. A FileTable is an Ordinary Table
  2. A FileTable is an Ordinary File System

First and foremost, a FileTable is a regular SQL Server database table in every respect, with one exception: The schema of a FileTable is fixed. The columns of a FileTable and their data types are pre-determined by SQL Server. Specifically, every FileTable contains these columns:

Column Name Data Type Description
stream_id uniqueidentifier ROWGUIDCOL Unique row identifier
file_stream varbinary(max) FILESTREAM BLOB content (NULL if directory)
name nvarchar(255) Name of file or directory
path_locator hierarchyid Location of file or directory within the logical file system
creation_time datetimeoffset(7) Created
last_write_time datetimeoffset(7) Last Modified
last_access_time datetimeoffset(7) Last Accessed
is_directory bit 0=file, 1=directory
is_offline bit Storage attributes
is_hidden bit
is_readonly bit
is_archive bit
is_system bit
is_temporary bit

Every FileTable implements a logical folder structure using the path_locator column. This is a hierarchyid value that denotes the location of each file and folder (row) within the logical file system (table). The hierarchyid data type was introduced in SQL Server 2008 as a binary value that, relative to other hierarchyid values in the same tree structure, identifies a unique node position (reminder to self: blog on hierarchyid!). It is implemented as a system CLR type, which means that it’s a .NET framework class, and has a set of methods you can use (e.g., GetAncestor, GetDescendant, GetReparentedValue, IsDescendantOf, etc.) to traverse and manipulate the hierarchy. Thus, it’s perfect for casting the hierarchical structure of a file system over a relational table, as FileTable does.

The path_locator column is defined as the table’s primary key with a non-clustered index. A separate key value is also stored in the stream_id column with a non-clustered unique index. This is the uniqueidentifier ROWGUIDCOL value required by any table with varbinary(max) FILESTREAM columns, so FileTable is no exception. And unlike path_locator, this unique value will never change once it is assigned to a new FileTable row, even if the row is later “reparented” (i.e., moved to another location in the hierarchy). Thus, you should treat stream_id as each row’s “ID,” even though it isn’t the table’s primary key.

Every row in a FileTable corresponds precisely to either a single file or folder, as determined by the bit value in the is_directory column. The file or folder name is stored in the name column as an nvarchar(255) string. All of the other column names are self-describing, and are used to store typical file system information such as various timestamps and storage attributes. The actual file content (the BLOB itself) is stored in the file_stream column, which is a varbinary(max) data type decorated with the FILESTREAM attribute. This means that the binary content in the file_stream column is stored in the NTFS file system that SQL Server is managing behind the scenes, rather than the structured file groups where all the other table data is stored (standard FILESTREAM behavior in SQL Server 2008).

In addition to these fourteen columns, each FileTable includes the following three computed (read-only) columns:

Column Name Data Type Description
parent_path_locator hierarchyid Parent node derived from path_locator
file_type nvarchar(255) Extension derived from name
cached_file_size bigint BLOB byte length derived from file_stream

The parent_path_locator column returns the result of calling GetAncestor(1) on path_locator to obtain the path_locator to the parent folder of any file or folder in the table. The file_type column returns the extension of the filename parsed from the string value in the name column. And the cached_file_size column returns the number of bytes stored in the file_stream column (these are BLOBs stored in SQL Server’s internally-managed NTFS file system behind the scenes).

With this fixed schema in place, every FileTable has what it needs to represent a logical file system. Thus, SQL Server is able to fabricate a Windows file share over any FileTable. This magically exposes the FileTable to any user or application who can then view and update the table using standard file I/O semantics (e.g., drag-and-drop with Windows Explorer, or read/write with System.IO.FileStream). So:

  • Creating a file or directory in the logical file system adds a row to the table
  • Adding a row to the table creates a file or directory in the logical file system

Here’s the total FileTable picture:

Take a moment to digest what’s happening here. In addition to the programmatic FILESTREAM access using either T-SQL or SqlFileStream, Denali now offers a third interface to FILESTREAM: A logical file system. In a sense, this fills the SQL Server 2008 FILESTREAM gap in which the file system itself is completely inaccessible. This is not to say that FileTable lets you directly access SQL Server’s internally managed NTFS file system; certainly not. That remains obfuscated and private, as it continues to function in standard FILESTREAM fashion against BLOB data that just happens to be in a FileTable instead of a regular table. What we do get is an abstraction layer over the FileTable that functions as a standard file system. In this logical file system, everything about each file and folder—except the BLOB content of the files themselves—is stored in the FileTable’s structured file group, while the BLOBs themselves are physically stored in the NTFS file system.

So you can see that there’s really nothing new beneath the FileTable layer; as before, SQL Server synchronizes transactional access between the row in the FileTable and its corresponding BLOB content in the NTFS file system to ensure that integrity is properly maintained between them. As with T-SQL access, this synchronization occurs implicitly when manipulating the FileTable via the exposed Windows file share. And, being an ordinary table in virtually every respect, you also have the option to use SqlFileStream with explicit transaction synchronization for the fastest possible streaming of BLOBs into and out of a FileTable.

I don’t know about you, but I find all of this extremely appealing. We now have total flexibility for BLOB storage in the database. With FileTable, you can easily migrate existing applications that work against physical file systems without writing any custom T-SQL or fancy SqlFileStream code. Just use a FileTable, let the existing applications continue working without modification, and enjoy the benefits of your files becoming an integral part of the SQL Server database.

That’s the whole FileTable story. The rest of this post just walks you through the steps and syntax for getting a FileTable up and running. It’s quite simple and straightforward. Specifically, you’ll need to:

  • Enable FILESTREAM for file system access
  • Create a FileTable-enabled database
  • Create a FileTable

Enable FILESTREAM for File System Access

FILESTREAM must be enabled for file system access at both the service level (either during setup, or later via the SQL Server Configuration Manager) and at the instance level. Complete details can be found in my Enabling and Using FILESTREAM post. Once enabled, SQL Server exposes a file share for all the FileTable-enabled databases you create under the SQL Server instance (by default, the share name is MSSQLSERVER).

The following statement enables FILESTREAM at the instance level for file system access (level 2).

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

Create a FileTable-Enabled Database

Naturally, a FileTable-enabled database must include the FILEGROUP…CONTAINS FILESTREAM clause expected of any FILESTREAM-enabled database. In addition, you must also specify two parameters in the SET FILESTREAM clause of the the CREATE DATABASE (or ALTER DATABASE) statement. The DIRECTORY_NAME specifies the name of the folder that will be created for this database in the root file share associated with the instance. And enabling full non-transacted access with NON_TRANSACTED_ACCESS=FULL exposes every FileTable within the database as a subfolder beneath the database folder of the instance’s file share.

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 SET FILESTREAM(
  (DIRECTORY_NAME='PhotoLibrary',
   NON_TRANSACTED_ACCESS=FULL)
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Create a FileTable

Not surprisingly, this is the easiest part. Since SQL Server controls the schema of every FileTable, you just use a CREATE TABLE statement with the new AS FileTable clause and don’t include any columns:

CREATE TABLE PhotoFiles AS FileTable

Your FileTable is ready to use. You will find a root PhotoFiles folder for the FileTable beneath the PhotoLibrary folder created for the database in the Windows file share for the instance. You can interact with the FileTable using T-SQL, SqlFileStream, or the logical file system exposed by the Windows file share.

Summary

FILESTREAM + hierarchyid + Windows File Share = FileTable

Genius! Enjoy…

Advertisements

Using SqlFileStream in C# to Access SQL Server FILESTREAM Data

FILESTREAM is a powerful feature in SQL Server that stores varbinary(max) column data (BLOBs) in the file system (where BLOBs belong) rather than in the database’s structured file groups (where BLOBs kill performance). This feature was first introduced in SQL Server 2008, and is now being expanded with the new FileTable feature coming in SQL Server 2012 (code-named “Denali”), which I’ll cover in a near-future post (as well as in my Programming Microsoft SQL Server 2012 book, which I’m writing now).

If you’re not already familiar with FILESTREAM, you can get the necessary background by reading my two earlier blog posts: Introducing FILESTREAM and Enabling and Using FILESTREAM. In this post, I’ll show you how to use the SqlFileStream class to achieve high-performance streaming of SQL Server FILESTREAM data in your C# applications. This is a simplified version of the OpenSqlFilestream code I presented in my Using the OpenSqlFilestream API blog post.

What Is SqlFileStream?

SqlFileStream is a class in the .NET Framework (.NET 3.5 SP1 and higher) that wraps the OpenSqlFilestream function exposed by the SQL Server Native Client API. This lets you stream BLOBs directly between SQL Server and your .NET application written in C#. SqlFileStream is always used within a transaction, just at the point that you want to store and retrieve BLOBs to and from varbinary(max) FILESTREAM columns. At that point in time, SQL Server will “step aside” and let you stream directly against the server’s NTFS file system—a native environment optimized for streaming. This provides you with a streaming “tunnel” between your application and SQL Server’s internally-managed file system. Using SqlFileStream will give your application lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in Enabling and Using FILESTREAM). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. (Refer to Enabling and Using FILESTREAM for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns.) Notice the default value we’ve established for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use SqlFileStream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Writing SqlFileStream Code

Start Visual Studio, create a new Class Library project, and add a PhotoData class as follows:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Drawing;
using System.IO;
using System.Transactions;

namespace PhotoLibraryApp
{
  public class PhotoData
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

    public static void InsertPhoto
      (int photoId, string desc, string filename)
    {
      const string InsertTSql = @"
        INSERT INTO PhotoAlbum(PhotoId, Description)
          VALUES(@PhotoId, @Description);
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(InsertTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
            cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }
          SavePhotoFile(filename, serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private static void SavePhotoFile
      (string clientPath, string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;

      using (FileStream source =
        new FileStream(clientPath, FileMode.Open, FileAccess.Read))
      {
        using (SqlFileStream dest =
          new SqlFileStream(serverPath, serverTxn, FileAccess.Write))
        {
          byte[] buffer = new byte[BlockSize];
          int bytesRead;
          while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
          {
            dest.Write(buffer, 0, bytesRead);
            dest.Flush();
          }
          dest.Close();
        }
        source.Close();
      }
    }

    public static Image SelectPhoto(int photoId, out string desc)
    {
      const string SelectTSql = @"
        SELECT
            Description,
            Photo.PathName(),
            GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      Image photo;
      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              desc = rdr.GetSqlString(0).Value;
              serverPath = rdr.GetSqlString(1).Value;
              serverTxn = rdr.GetSqlBinary(2).Value;
              rdr.Close();
            }
          }
          photo = LoadPhotoImage(serverPath, serverTxn);
        }

        ts.Complete();
      }

      return photo;
    }

    private static Image LoadPhotoImage(string filePath, byte[] txnToken)
    {
      Image photo;

      using (SqlFileStream sfs =
        new SqlFileStream(filePath, txnToken, FileAccess.Read))
      {
        photo = Image.FromStream(sfs);
        sfs.Close();
      }

      return photo;
    }

  }
}

Let’s explain the code in detail. We’ll start at the top with some required namespace inclusions. The two using statements to take notice of are System.Data.SqlTypes and System.Transactions. The System.Data.SqlTypes namespace defines the SqlFileStream class that we’ll be using to stream BLOBs. No special assembly reference is required to use this class, because it is provided by the System.Data.dll assembly that our project is already referencing (Visual Studio set this reference automatically when it created our project). The System.Transactions namespace defines the TransactionScope class that lets us code implicit transactions against the database. This class is provided by the System.Transactions.dll assembly, which is not referenced automatically. You’ll need to add a reference to it now, or the code will not compile. Right-click the project in Solution Explorer and choose Add Reference. In the Add Reference dialog, click the .NET tab, and scroll to find the System.Transactionscomponent. Then double-click it to add the reference.

At the top of the class, we define a connection string as a hard-coded constant named ConnStr. This is just for demonstration purposes; a real-world application would store the connection string elsewhere (such as in a configuration file, possibly encrypted), but we’re keeping our example simple.

Streaming Into SQL Server

The first method defined in the class is InsertPhoto, which accepts a new photo integer ID, string description, and full path to an image file to be saved to the database. Notice that the InsertTSql string constant defined at the top of the method specifies an INSERT statement that includes the PhotoId and Description columns, but not the actual Photo BLOB column itself. Instead, the INSERT statement is followed immediately by a SELECT statement that retrieves two pieces of information we’ll use to stream the BLOB into the Photo column much more efficiently than using ordinary T-SQL—namely, a logical UNC path name to the file and the transactional context token. These are the two values needed to use SqlFileStream, and you’re about to see how exactly. But all we’ve done so far is define a constant holding two T-SQL statements. The constant is followed by two variables declarations serverPath and serverTxn that will receive the two special values when we later execute those T-SQL statements.

The method then creates and enters a new TransactionScope block. This does not actually begin the database transaction (we’ve not even connected to the database yet), but rather declares that all data access within the block (and in any code called from within the block) must participate in a database transaction. Inside the TransactionScope block, the code creates and opens a new SqlConnection. Being the first data access code inside the TransactionScope block, this also implicitly begins the database transaction. Next, it creates a SqlCommand object associated with the open connection and prepares its command text to contain our T-SQL statements (the INSERT followed by the SELECT).

Invoking the ExecuteReader method executes the T-SQL statements and returns a reader from which we can retrieve the values returned by the SELECT statement. The transaction is still pending at this time. Our INSERT statement does not provide a value for RowId and instead allows SQL Server to automatically generate and assign a new uniqueidentifier ROWGUID value by default. I’ve also pointed out that no value is provided for the Photo column—and this is exactly how the default 0x value that we defined earlier for the Photo column comes into play (I said we’d come back to it, and here we are).

Although the row has been added by the INSERT statement, it will rollback (disappear) if a problem occurs before the transaction is committed. Because we didn’t provide a BLOB value for the Photo column in the new row, SQL Server honors the default value 0x that we established for it in the CREATE TABLE statement for PhotoAlbum. This represents a zero-length binary stream, which is completely different than NULL. Being a varbinary(max) column decorated with the FILESTREAM attribute, an empty file gets created in the file system that SQL Server associates with the new row. At the same time, SQL Server initiates an NTFS file system transaction over this new empty file and synchronizes it with the database transaction. So just like the new row, the new file will disappear if the database transaction does not commit successfully.

Immediately following the INSERT statement, the SELECT statement returns Photo.PathName and GET_FILESTREAM_TRANSACTION_CONTEXT. What we’re essentially doing with the WHERE clause in this SELECT statement is reading back the same row we have just added (but not yet committed) to the PhotoAlbum table in order to reference the BLOB stored in the new file that was just created (also not yet committed) in the file system.

The value returned by Photo.PathName is a fabricated path to the BLOB for the selected PhotoId. The path is expressed in UNC format, and points to the network share name established for the server instance when you first enabled FILESTREAM (by default, this is MSSQLSERVER). It is not a path the file’s physical location on the server, but rather contains information SQL Server can use to derive the file’s physical location. For example, you’ll notice that it always contains the GUID value in the uniqueidentifier ROWGUIDCOL column of the BLOB’s corresponding row. We retrieve the path value from the reader’s first column and store it in the serverPath string variable.

We just explained how SQL Server initiated an NTFS file system transaction over the FILESTREAM data in the new row’s Photo column when we started our database transaction. The GET_FILESTREAM_TRANSACTION_CONTEXT function returns a handle to that NTFS transaction (if you’re not inside a transaction, this function will return NULL and your code won’t work). We obtain the transaction context, which is returned by the reader’s second column as a SqlBinary value, and store it in the byte array named serverTxn.

Armed with the BLOB path reference in serverPath and the transaction context in serverTxn, we have what we need to create a SqlFileStream object and perform direct file access to stream our image into the Photo column. We close the reader, terminate its using block, then terminate the enclosing using block for the SqlConnection as well. This would normally close the database connection implicitly, but that gets deferred in this case because the code is still nested inside the outer using block for the TransactionScope object. So the connection is still open at this time, and the transaction is still pending. It is precisely at this point that we call the SavePhotoFile method to stream the specified image file into the Photo column of the newly inserted PhotoAlbum row, overwriting the empty file just created by default. When control returns from SavePhotoFile, the TransactionScope object’s Complete method is invoked and its using block is terminated, signaling the transaction management API that everything worked as expected. This implicitly commits the database transaction (which in turn commits the NTFS file system transaction) and closes the database connection.

The SavePhotoFile method reads from the source file and writes to the database FILESTREAM storage in 512 KB chunks at a time using ordinary .NET streaming techniques. The method begins by defining a BlockSize integer constant that is set to a reasonable value of 512 KB. Picture files larger than this will be streamed to the server in 512 KB blocks at a time. The local source image file (in clientPath) is then opened on an ordinary read-only FileStream object.

Then the destination file is opened by passing the two special values (serverPath and serverTxn), along with a FileAccess.Write enumeration requesting write access, into the SqlFileStream constructor. Like the source FileStream object, SqlFileStream inherits from System.IO.Stream, so it can be treated just like any ordinary stream. Thus, you attain write access to the destination BLOB on the database server’s NTFS file system. Remember that this output file is enlisted in an NTFS transaction and nothing you stream to it will be permanently saved until the database transaction is committed by the terminating TransactionScope block, after SavePhotoFile completes. The rest of the SavePhotoFile method implements a simple loop that reads from the source FileStream and writes to the destination SqlFileStream, one 512 KB block at a time until the entire source file is processed, and then it closes both streams.

Streaming Out From SQL Server

The rest of the code contains methods to retrieve existing photos and stream their content from the file system into an Image object for display. You’ll find that this code follows the same pattern as the last, only now we’re performing read access.

The SelectPhoto method accepts a photo ID and returns the string description from the database in an output parameter. The actual BLOB itself is returned as the method’s return value in a System.Drawing.Image object. We populate the Image object with the BLOB by streaming into it from the database server’s NTFS file system using SqlFileStream. Once again, we start things off by entering a TransactionScope block and opening a connection. We then execute a simple SELECT statement that queries the PhotoAlbum table for the record specified by the photo ID and returns the description and full path to the image BLOB, as well as the FILESTREAM transactional context token. And once again we use the path name and transactional context with SqlFileStream to tie into the server’s file system in the LoadPhotoImage method.

Just as when we were inserting new photos (only this time using FileAccess.Read instead of FileAccess.ReadWrite), we create a new SqlFileStream object from the logical path name and transaction context. We then pull the BLOB content directly from the NTFS file system on the server into a new System.Drawing.Image object using the static Image.FromStream method against the SqlFileStream object. The populated image can then be passed back up to a Windows Forms application, where it can be displayed using the Image property of a PictureBox control.

Or, to stream a photo over HTTP from a simple ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Transactions;
using System.Web.UI;

namespace PhotoLibraryHttpService
{
  public partial class PhotoService : Page
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary";

    protected void Page_Load(object sender, EventArgs e)
    {
      int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
      if (photoId == 0)
      {
        return;
      }

      const string SelectTSql = @"
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
         FROM PhotoAlbum
         WHERE PhotoId = @PhotoId";

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          string serverPath;
          byte[] serverTxn;

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }

          this.StreamPhotoImage(serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private void StreamPhotoImage(string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;
      const string JpegContentType = "image/jpeg";

      using (SqlFileStream sfs =
        new SqlFileStream(serverPath, serverTxn, FileAccess.Read))
      {
        byte[] buffer = new byte[BlockSize];
        int bytesRead;
        Response.BufferOutput = false;
        Response.ContentType = JpegContentType;
        while ((bytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
        {
          Response.OutputStream.Write(buffer, 0, bytesRead);
          Response.Flush();
        }
        sfs.Close();
      }
    }
  }
}

Conclusion

The OpenSqlFilestream function provides native file streaming capabilities between FILESTREAM storage in the file system managed by SQL Server and any native-code (e.g., C++) application. SqlFileStream provide a managed code wrapper around OpenSqlFilestream that simplifies direct FILESTREAM access from .NET applications (e.g., C# and VB .NET). This post explained how this API works in detail, and showed the complete data access code that uses SqlFileStream for both reading and writing BLOBs to and from SQL Server. That’s everything you need to know to get the most out of FILESTREAM. I hope you enjoyed it!

SQL Server 2008 FILESTREAM Part 3 of 3: Using the OpenSqlFilestream API

This is the final installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and part 2 showed you the step-by-step procedure for enabling and using this awesome new feature. In this post, I’ll show you how to use the OpenSqlFilestream function exposed by the SQL Server Native Client API to achieve the maximum FILESTREAM performance possible in your .NET applications.

IMPORTANT UPDATE: I’ve just blogged on SqlFileStream, a managed code wrapper around OpenSqlFilestream that offers a simpler implementation of the mechanism described in this post. Read it here: https://lennilobel.wordpress.com/2011/08/22/using-sqlfilestream-in-c-to-access-sql-server-filestream-data/

What Is OpenSqlFilestream?

OpenSqlFilestream is a function provided by the SQL Server Native Client API, sqlncli10.dll, and gets installed on your development machine when you install the SQL Server 2008 client tools. This function can be called at the point in time that you want to store and retrieve BLOBs from varbinary(max) FILESTREAM columns, where SQL Server will “step aside,” and allow you to call OpenSqlFilestream to obtain a file handle. With the file handle returned by OpenSqlFilestream, you can stream directly against the file system—a native environment optimized for streaming. Contrast such direct file system access with storing and retrieving BLOBs against varbinary(max) FILESTREAM columns the “old-fashioned” way (either by embedding/extracting byte arrays, or in-lining binary streams as hexadecimal values in T-SQL as I demonstrated in part 2), which carries the additional overhead of the FILESTREAM abstraction layer. Using OpenSqlFilestream instead will give you lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in part 2). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. This is virtually the same CREATE TABLE statement as shown in my last post (refer to part 2 for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns). The only thing different here is the default value for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use OpenSqlFilestream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Storing BLOBs Using OpenSqlFilestream

I’ll walk you through the complete steps for building a C# data access class in Visual Studio that demonstrates how to use OpenSqlFilestream. In order to keep things simple for demonstration purposes, we won’t be following best practices; namely, we’ll be executing direct T-SQL statements instead of using stored procedures (as you should be using in production code).

Start Visual Studio and create a new Class Library project named PhotoLibraryDAL with a single class file named PhotoData.cs, and add the following public static method to the class named InsertPhoto:

public static void InsertPhoto(int photoId, string desc, string filename)
{
  const string InsertCmd =
   "INSERT INTO PhotoAlbum(PhotoId, Description)" +
   " VALUES(@PhotoId, @Description)";

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      using(SqlCommand cmd = new SqlCommand(InsertCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
        cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
        cmd.ExecuteNonQuery();
      }

      SavePhotoFile(photoId, filename, txn);
      txn.Commit();
    }

    conn.Close();
  }
}

Client applications call the InsertPhoto method to insert a new photo into the PhotoAlbum table by passing in a photo ID, description, and filename pointing to a local file containing the actual photo image. It then opens a connection, begins a transaction, and executes an INSERT statement. Notice that the INSERT statement supplies values only for the PhotoId and Description columns. What about the RowId and Photo columns? Because we’ve omitted them, the defaults we established in the table definition are applied. The RowId column gets assigned the next available GUID by the GETSEQUENTIALID function. And for the PhotoId BLOB column, the default 0x (zero-length binary stream) applies.

The result of supplying a zero-length binary stream value for the Photo column is that SQL Server creates an empty file in the file system that is linked to the Photo column of the row just inserted. However, because we’ve begun a transaction, SQL Server has automatically initiated an NTFS file system transaction over the empty file added to the file system. So now we have a new row inserted in the table—but not committed, and we have a new empty file created in the file system—but not committed. This only works because we specified (albeit by default) a zero-length binary stream (0x) rather than NULL when inserting the varbinary(max) FILESTREAM value. If an exception occurs, or if the database transaction rolls back, or if any other condition occurs in which the database transaction doesn’t commit successfully, the NTFS file system transaction will get rolled back automatically as well. In that case, both the inserted row in the table and the empty file in the file system go away.

Now is that “point in time” that we’d like SQL Server to “step aside” so we can call OpenSqlFilestream to store the BLOB for the photo image. To do that, we call the SavePhotoFile method, which is coded as follows:

private static void SavePhotoFile(int photoId, string filename, SqlTransaction txn)
{
  const int BlockSize = 1024 * 512;
  FileStream source = new FileStream(filename, FileMode.Open, FileAccess.Read);
  SafeFileHandle handle = GetOutputFileHandle(photoId, txn);
  using(FileStream dest = new FileStream(handle, FileAccess.Write))
  {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
      dest.Write(buffer, 0, bytesRead);
    }
    dest.Close();
  }
  source.Close();
}

There’s actually nothing magical about this method. You can see that it simply streams, in 512K chunks at a time, from a source stream to a destination stream. This is just the way you’d implement a simple, conventional, file-copy routine. Getting a handle on the source stream is also a no-brainer; since the source file is local, you simply invoke the System.IO.FileStream constructor that accepts a filename and enumerations specifying that the file should be opened for read access. So the question then becomes, how do we get a handle on the destination stream, which is a direct channel to the empty and not-yet-committed file on the server’s NTFS file system that SQL Server associates with the Photo column? The answer lies in the GetOutputFileHandle method which we call to obtain a SafeFileHandle object to the destination file:

private static SafeFileHandle GetOutputFileHandle(int photoId, SqlTransaction txn)
{
  const string GetOutputFileInfoCmd =
    "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT(), Photo.PathName()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  SqlCommand cmd = new SqlCommand(GetOutputFileInfoCmd, txn.Connection, txn);
  cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

  string filePath;
  byte[] txnToken;

  using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
  {
    rdr.Read();
    txnToken = rdr.GetSqlBinary(0).Value;
    filePath = rdr.GetSqlString(1).Value;
    rdr.Close();
  }

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
    (filePath, NativeSqlClient.DesiredAccess.ReadWrite, txnToken);

  return handle;
}

This code is at the heart of the matter. In order to call OpenSqlFilestream, we need to obtain two key pieces of information. First, we need a logical path name to the destination file. We obtain that by selecting back the not-yet-committed row we just inserted, and invoking the PathName method on the Photo column. Nobody else can access this row because it hasn’t been committed yet (that would be a dirty read); it will never actually come into existence if the transaction rolls back. But we can read it because we’re inside the transaction, and so we can easily get the path name. No, the path name returned by PathName is not a UNC path to the physical file system on the server, so just lay that security concern to rest. It’s just a bogus path that has meaning only in the context of this transaction which SQL Server can use to map to the real file in the file system. Secondly, we need a token that identifies the NTFS file system transaction that SQL Server initiated behind the scenes, which we obtain with the GET_FILESTREAM_TRANSACTION_CONTEXT function.

Armed with these two key pieces of information, we can call OpenSqlFilestream. Because OpenSqlFilestream is a native code function, I’ve place it in a separate GetSqlFilestreamHandle method in a separate NativeSqlClient class to keep the DllImport and other native code details isolated from our .NET data access class (yes, I’m a neat freak when it comes to code, and you should be too):

using System;
using System.Runtime.InteropServices;

using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryFilestreamDemo
{
  public class NativeSqlClient
  {
    public enum DesiredAccess : uint
    {
      Read,
      Write,
      ReadWrite,
    }

    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle OpenSqlFilestream(
      string path,
      uint access,
      uint options,
      byte[] txnToken,
      uint txnTokenLength,
      Sql64 allocationSize);

    [StructLayout(LayoutKind.Sequential)]
    private struct Sql64
    {
      public Int64 QuadPart;
      public Sql64(Int64 quadPart)
      {
        this.QuadPart = quadPart;
      }
    }

    public static SafeFileHandle GetSqlFilestreamHandle
     (string filePath, DesiredAccess access, byte[] txnToken)
    {
      SafeFileHandle handle = OpenSqlFilestream(
        filePath,
        (uint)access,
        0,
        txnToken,
        (uint)txnToken.Length,
        new Sql64(0));

      return handle;
    }
  }
}

As you can see, the GetSqlFilestreamHandle method accepts the transaction context token and the path obtained by the GET_FILESTREAM_TRANSACTION_CONTEXT function and PathName method respectively. It also accepts an enumeration that specifies the desired access mode, which can be Read, Write, or ReadWrite. The OpenSqlFilestream function requires other parameters that are not generally applicable for standard FILESTREAM usage, such as the unsigned 32-bit options and 64-bit allocation size arguments. These simply get passed in as 0. The SafeFileHandle returned by OpenSqlFilestream is defined by the .NET framework in the core library assembly mscorlib.dll, so no special reference needs to be set to access this class. Control then gets passed back up the call stack, to the SavePhotoFile method, which calls an overloaded version of the System.IO.Filestream constructor that accepts a SafeFileHandle object. After the source stream is copied entirely to the destination stream (overwriting the empty file on the server), control returns to the InsertPhoto method and the database transaction is finally committed. At that point, both the inserted row and the file in the file system are permanently saved, and the connection is closed. And that’s the way to stream BLOBs into varbinary(max) FILESTREAM columns in SQL Server 2008!

Retrieving BLOBs Using OpenSqlFilestream

Reading FILESTREAM data back out from the database to your application follows a very similar pattern. You open a connection and start a transaction. Note that this is normally not be considered best practice, as you should always try to have your read operations execute outside the context of a transaction. But to implement OpenSqlFilestream for read operations, this is exactly what you do. Then you use OpenSqlFilestream in the very same was we did for writing the BLOB to the database. The exact implementation depends on where you want to stream the content to.

For example, to stream a photo into an Image object for display in a Windows Forms PictureBox control, you can implement a SelectPhoto method like this:

public static Image SelectPhoto(int photoId, out string desc)
{
  const string SelectCmd =
    "SELECT Description, Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  Image photo;

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      string filePath;
      byte[] txnToken;

      using(SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

        using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
        {
          rdr.Read();
          desc = rdr.GetSqlString(0).Value;
          filePath = rdr.GetSqlString(1).Value;
          txnToken = rdr.GetSqlBinary(2).Value;
          rdr.Close();
        }
      }

      photo = LoadPhotoImage(filePath, txnToken);

      txn.Commit();
    }

    conn.Close();
  }

  return photo;
}

private static Image LoadPhotoImage(string filePath, byte[] txnToken)
{
  Image photo;

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
     (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

  using(FileStream fs = new FileStream(handle, FileAccess.Read))
  {
    photo = Image.FromStream(fs);
    fs.Close();
  }

  return photo;
}

Because the Image class has a static FromStream method that consumes the stream passed as a parameter to the method, this code essentially firehouses the stream at the fastest possible speed directly from the file system on SQL Server into the Image object.

Or, to stream a photo over HTTP from an ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryHttpService
{
 public partial class PhotoService : System.Web.UI.Page
 {
  private const string ConnStr =
    "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

  protected void Page_Load(object sender, EventArgs e)
  {
   int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
   if (photoId == 0)
   {
    return;
   }

   const string SelectCmd =
    "SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

   using (SqlConnection conn = new SqlConnection(ConnStr))
   {
    conn.Open();

    using (SqlTransaction txn = conn.BeginTransaction())
    {
     string filePath;
     byte[] txnToken;

     using (SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
     {
      cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

      using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
      {
       rdr.Read();
       filePath = rdr.GetSqlString(0).Value;
       txnToken = rdr.GetSqlBinary(1).Value;
       rdr.Close();
      }
     }

     this.StreamPhotoImage(filePath, txnToken);

     txn.Commit();
    }

    conn.Close();
   }
  }

  private void StreamPhotoImage(string filePath, byte[] txnToken)
  {
   const int BlockSize = 1024 * 512;
   const string JpegContentType = "image/jpeg";

   SafeFileHandle handle =
     NativeSqlClient.GetSqlFilestreamHandle
      (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

   using (FileStream source = new FileStream(handle, FileAccess.Read))
   {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    Response.BufferOutput = false;
    Response.ContentType = JpegContentType;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
     Response.OutputStream.Write(buffer, 0, bytesRead);
     Response.Flush();
    }
    source.Close();
   }
  }
  }
}

Have Fun with FILESTREAM!

This concludes my 3-part series on using FILESTREAM in SQL Server 2008, which explains everything you need to know to get the most out of this truly awesome new feature. I hope you enjoyed it, and look forward to hearing your FILESTREAM success stories!

SQL Server 2008 FILESTREAM Part 2 of 3: Enabling and Using FILESTREAM

This is the second installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and in this post I’ll show you the step-by-step procedure for enabling and using this awesome new feature.

Enabling FILESTREAM

You need to enable FILESTREAM before you can use it, and that’s true of most “extra” features—which are disabled until you need them, as part of the overall “secure by default” strategy in SQL Server. Enabling FILESTREAM is slightly more involved than enabling other features because it involves two steps:

Enabling FILESTREAM for the Windows Service

First you need to enable FILESTREAM at the Windows service level. You can perform this step during the SQL Server 2008 setup process, or any time after SQL Server 2008 is installed by using the SQL Server Configuration Manager (click the Start menu and choose All Programs > Microsoft SQL Server 2008 > Configuration Tools > SQL Server Configuration Manager). The Properties dialog box for the MSSQLSERVER process has a FILESTREAM tab that lets you choose between four FILESTREAM access levels.

 Enabling FILESTREAM for Windows Service

1) Disabled: With all three checkboxes cleared, FILESTREAM is completely disabled. This is the default setting.

2) Enabled for T-SQL only: Checking the first checkbox enables FILESTREAM in “total transparency” mode. This means that you’ll be able to treat varbinary(max) columns decorated with the FILESTREAM attribute as conventional varbinary(max) columns that appear to be “in-line” columns; either by in-lining binary streams using T-SQL, or by embedding and extracting BLOBs as byte arrays to and from your tables.

3) Enabled for T-SQL and file streaming: Checking the second checkbox also enables direct streaming between the file system and your application running on the same machine as SQL Server. With this option selected, a share name is also specified to enable direct streaming using the OpenSqlFilestream function exposed by the SQL Server 2008 Native Client API, which I’ll cover in my third and final FILESTREAM post.

4) Enabled for T-SQL and file streaming to remote clients: This is the same as the previous option, but allows applications running on remote machines to use OpenSqlFilestream. Since applications rarely run on the same box as SQL Server, you’ll typically always choose this option if you’ll be using OpenSqlFilestream (which, again, I’ll cover in my next post).

Enabling FILESTREAM for the SQL Server instance

Next you need to enable FILESTREAM at the SQL Server instance level. Basically, there is a separation of security concerns here between the Windows and database administrators, and the same access level set for the Windows service needs to be set for the SQL Server instance. That’s easy if one person wears both hats, but otherwise, these guys need to get along and they need to agree.

You can use SQL Server Management Studio (SSMS) perform this step from the Advanced page in the Server Properties dialog:

Enabling FILESTREAM for SQL Server Instance

Although there are only three options in this dropdown, they are essentially the same options I described for the Windows service configuration; where “Full access enabled” means direct streamed access to the file system from local clients only, or both local and remote clients, depending on the corresponding setting made at the Windows service level.

As an alternative to using SSMS, you can programmatically set this access level by using the following T-SQL statements (specify n as 0, 1, or 2 for disabled, T-SQL access enabled, full access enabled, respectively):

EXEC sp_configure filestream_access_level, n
RECONFIGURE

Creating a FILESTREAM-Enabled Database

Once you’ve enabled FILESTREAM, you can create a FILESTREAM-enabled database by using the new FILEGROUP…CONTAINS FILESTREAM clause, such as the one in this otherwise ordinary CREATE DATABASE statement:

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

When you execute this statement, SQL Server will create the PhotoLibrary_data.mdf and PhotoLibrary_log.ldf files as usual, and will also create a Photos folder inside of which it will transparently store individual files for each BLOB stored in varbinary(max) FILESTREAM columns in the PhotoLibrary database.

Creating and Using FILESTREAM Columns

As I already mentioned, you only need to append the FILESTREAM attribute to the varbinary(max) data type on the column(s) you want SQL Server to store in the file system. But as it turns out, there’s an additional requirement to include a single column of type uniqueidentifier (GUID) decorated with the ROWGUIDCOL attribute. This column must not allow nulls, and needs to either be the primary key or have a unique constraint enforced on it. SQL Server absolutely requires this ROWGUIDCOL column in order to transparently link BLOBs in the file system with their corresponding rows in the database tables.

Only one ROWGUIDCOL-attributed column is permitted in any table, but once that column is defined, you can then define any number of varbinary(max) FILESTREAM columns for that table that you’d like. Furthermore, by establishing a default value for that column to automatically generate GUID values using either the GETID or GETSEQUENTIALID functions, you can satisfy the requirement to include this column without incurring any additional burden of maintaining it.

The following statement creates a table in the PhotoLibrary database that uses FILESTREAM to store BLOBs of images in a varbinary(max) FILESTREAM column named Photo:

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM)

At this point, you can treat the Photo column as an ordinary varbinary(max) column for BLOB storage. This means you can embed or extract byte arrays directly to and from this column from your client application exactly as you did before. Or, for BLOBs that aren’t too large, you can even express the binary stream in T-SQL as follows:

INSERT INTO PhotoAlbum(PhotoId, Description, Photo)
 VALUES(2, 'Document icon', 0x4749463839610C000E00B30000FFFFFFC6DEC6C0C0C000008
  0000000D3121200000000000000000000000000000000000000000000000000000000000021F9
  0401000002002C000000000C000E0000042C90C8398525206B202F1820C80584806D1975A29AF
  48530870D2CEDC2B1CBB6332EDE35D9CB27DCA554484204003B)

This statement appears to insert a BLOB directly into the table’s Photo column, in-line with the other columns (PhotoId and Description). And it certainly appears the same way when retrieving the row back out of the table:

SELECT PhotoId, Description, Photo
 FROM PhotoAlbum
 WHERE PhotoId = 2

FILESTREAM Query Result

Peeking Behind the Scenes

As is often the case with powerful abstraction layers, the result is somewhat anti-climactic, since everything appears to work the same as it did in earlier versions of SQL Server using the varbinary(max) data type. However, if we start exploring under the covers, we can see the effect that FILESTREAM has on the file system behind the scenes.

But first, understand that normally “thou shalt not” do what we’re about to do; that is, interact directly with the file system. The manner in which SQL Server decides to create and organize subdirectories and files in the folder designated as the FILESTREAM file group is none of our business, just as the internal structure and content of .mdf and .ldf files are none of our business—it’s all managed exclusively by SQL Server. Accepting that, we can still dig into the file system to prove that things are working behind the scenes as expected.

Using Windows Explorer, navigate to the C:\DB\Photos folder created by SQL Server for FILESTREAM storage. The directory names and structure are certainly not intuitive, nor will they necessarily match the ones on my system. But if you start peeking around, you’ll encounter a file whose date/time coincides with the date/time that we inserted the row:

FILESTREAM in the File System

Knowing that the BLOB we inserted is an image, right-click on the file, choose Open, and then select Windows Picture and Fax Viewer to display the image (which I’ve magnified in the following screenshot):

FILESTREAM File

If you really want to prove the point (and let me warn you again, this is a real no-no), you can even replace this file with some other image and verify that the SELECT statement returns a different BLOB. It should also be fairly obvious that you’d want to lock down the folder designated for FILESTREAM storage just as you lock down the .mdf and .ldf files, so that only SQL Server itself has direct access to the file system.

As explained in part 1, FILESTREAM delivers full transactional integrity over BLOB data in the file system. If you wrap the above INSERT statement inside a transaction, then SQL Server will automatically initiate an NTFS file system transaction over the file added to the file system behind the scenes. In this case, rolling back the database transaction (or any other condition in which the database transaction does not commit successfully) automatically rolls back the NTFS file system transaction so that both the row and the file disappear.

Direct File System Access

Hopefully, you’re reasonably impressed with FILESTREAM thus far. But there’s a much better way to stream content into and out of varbinary(max) FILESTREAM columns. The fact remains that the varbinary(max) data type is not optimized for streaming, even though FILESTREAM uses the file system instead of your relational tables to persist BLOBs. How are you going to achieve the maximum BLOB access performance? By streaming directly into and out of the file system, which is a native environment optimized for streaming. You can do that by using the OpenSqlFilestream function exposed by the SQL Server Native Client API, which I’ll cover in the third and last post. So stay tuned!

SQL Server 2008 FILESTREAM Part 1 of 3: Introducing FILESTREAM

I’ve just returned from VSLive! Orlando and gave a great full day workshop on SQL Server 2008 R2 with Andrew Brust. Once again, I find that FILESTREAM stands out as one of my favorite new features in the relational database engine. So, I’ve decided to post a 3-part series on this awesome new feature. In this first post, I’ll explain FILESTREAM and how it works. In part two, I’ll walk you through the steps to enable and use FILESTREAM. In my third and last post of this series, I’ll show you how to maximize FILESTREAM performance using the OpenSqlFilestream feature exposed by the SQL Server Native Client API.

The Need To Stream

Storing unstructured BLOB (large binary object) data that is somehow associated with structured data in a relational database is certainly not a new problem. But these days, the proliferation of unstructured content is accelerating rapidly, and increasingly, we find ourselves dealing with more and more Word documents, Excel spreadsheets, PDF files, emails with attachments, audio/video files, and the list goes on and on. There has been no native solution in SQL Server for handling BLOB storage, until now.

Before FILESTREAM in SQL Server 2008, we’ve had basically two choices for handling BLOBs: Store them in the database as varbinary(max) columns (or, in older versions of SQL Server, using the image data type), or store them outside the database (typically, in the file system or dedicated BLOB store). Both of these approaches have advantages and disadvantages, making neither of them ideal.

Pros and Cons of Traditional BLOB Storage Techniques

Storing BLOBs in the database is fine if your BLOB requirements are modest. Just by using the varbinary(max) data type, you’ll be able to easily embed and extract byte arrays representing BLOBs directly into and out of your database tables. You’ll also enjoy transactional integrity over BLOB updates, as well as integrated management benefits (for example, a single database backup includes the BLOB data). However, if you have many BLOBs, and/or they are very large, then a disproportionate amount of disk space allocated to your structured relational tables will be consumed by BLOB data. Query performance will suffer as a result, and in most cases the performance hit will be so bad that you’ll need to consider a different option.

To maximize database performance, you can instead store the BLOBs outside the database. Files belong in the file system, right? Structured relational data belongs in tables, right? Right! And if you’ve ever had to build a system that associates files in the file system with rows in database tables, you know the pain involved in implementing this solution. Because SQL Server has no awareness of the coupling between the table data and the BLOB data in the file system, it’s your job to establish and maintain those references. When a row is deleted from the table, your application needs to determine which files in the file system store data for that deleted row, and then delete those files as well. You also lose transactional scope, since rolling back a database transaction will in no way undo changes made to the file system. It’s now also another administrative burden to back up your data, since a database backup is no longer a “complete” backup of your database, without you also performing a separate file system backup for the BLOB data.

Enter: FILESTREAM

In SQL Server 2008, the new FILESTREAM feature gives us the best of both worlds. The beauty of FILESTREAM is its transparency. Contrary to what you may have heard or read, FILESTREAM is not a new data type in SQL Server 2008. Rather, it is an attribute that you apply to the varbinary(max) data type–which (as mentioned) is the same data type traditionally used to store BLOBs directly in your tables. From your perspective, a varbinary(max) FILESTREAM column is no different than a conventional varbinary(max) column. So you can embed/extract byte arrays to/from the varbinary(max) column, or even inline small binary streams in T-SQL, just as you did before. Updates made to BLOB data in a varbinary(max) FILESTREAM column while inside a database transaction will roll back when the database transaction rolls back. And a single database backup protects the entire database, including its BLOB data.

The magic? Behind the scenes, simply applying the FILESTREAM attribute to the varbinary(max) data type for defining a BLOB columns enables a highly efficient storage abstraction layer over that varbinary(max) column.

Specifically, SQL Server will store and retrieve BLOB data assigned to varbinary(max) FILESTREAM columns to and from the file system behind the scenes automatically.

How Does FILESTREAM Work?

FILESTREAM works by integrating with the NTFS file system. As you’ll see in part 2 of this blog post series, FILESTREAM storage for a database is defined in the CREATE DATABASE statement simply by declaring another file group that points to a physical location in the file system. Because it’s just another file group, SQL Server now considers the data in the file system to be an integral part of the overall database. So, unless you explicitly exclude the file group when backing up the database, the BLOB data in the file system will be included in your normal full and incremental database backups.

Because SQL Server is now intimately involved with the file system behind the scenes, you enjoy the performance benefit of keeping BLOB storage outside your structured table file groups, without worrying about programmatically maintaining the link references between rows in tables and corresponding files in the file system. And because the NTFS file system is a transactional file system, SQL Server is able to transactionalize BLOB updates made to the file system. When you initiate a database transaction and then proceed to update BLOB data stored in varbinary(max) FILESTREAM columns, SQL Server will initiate an NTFS transaction over the file system. When you commit or roll back your database transaction, SQL Server will then commit or roll back the NTFS file system transaction for you automatically.

If all of this is starting to get you excited, then that’s the point! Be sure to stay tuned for part 2, which will show you how to enable and use FILESTREAM, and part 3, which explores the OpenSqlFilestream feature that delivers ultra-high streaming performance against varbinary(max) FILESTREAM columns.