Using T4 Templates to Generate C# Enums from SQL Server Database Tables

When building applications with C# and SQL Server, it is often necessary to define codes in the database that correspond with enums in the application. However, it can be burdensome to maintain the enums manually, as codes are added, modified, and deleted in the database.

In this blog post, I’ll share a T4 template that I wrote which does this automatically. I looked online first, and did find a few different solutions for this, but none that worked for me as-is. So, I built this generic T4 template to do the job, and you can use it too.

Let’s say you’ve got a Color table and ErrorType table in the database, populated as follows:

Now you’d like enums in your C# application that correspond to these rows. The T4 template will generate them as follows:
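
The database tables and the generated code are not reproduced here, so the listing below is only an illustrative sketch of the kind of output the template produces; the namespace, member names, and values are made up for demonstration, but the overall shape matches the behavior described next.

// Illustrative output only; actual names and values come from your database
namespace DemoApp
{
  /// <summary>
  /// Supported colors
  /// </summary>
  public enum Color : int
  {
    /// <summary>
    /// Undefined (not mapped in database)
    /// </summary>
    Undefined = 0,
    Red = 1,
    Green = 2,
    Blue = 3,
  }

  /// <summary>
  /// System error types
  /// </summary>
  public enum SystemErrorType : int
  {
    /// <summary>
    /// The operation completed successfully
    /// </summary>
    None = 0,
    /// <summary>
    /// An unexpected internal error occurred
    /// </summary>
    Internal = 1,
  }

  /// <summary>
  /// Customer error types
  /// </summary>
  public enum CustomerErrorType : int
  {
    /// <summary>
    /// The customer account was not found
    /// </summary>
    AccountNotFound = 1000,
    /// <summary>
    /// The customer account is past due
    /// </summary>
    PastDue = 1001,
  }
}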

Before showing the code that generates this, let’s point out a few things.

First, notice that the Color table has only an ID and a name column, while the ErrorType table has ID, name, and description columns. As a result, the ErrorType enums include XML comments based on the descriptions in the database, and those comments appear in IntelliSense when you reference the enums in Visual Studio.

Also notice that one enum was generated for the Color table, while two enums were generated for the ErrorType table. This is because we instructed the T4 template to create different enums for different value ranges (0 – 999 for SystemErrorType, and 1000 – 1999 for CustomerErrorType) in the same table.

Finally, the Color enum includes an additional member Undefined with a value of 0 that does not appear in the database, while the other enums don’t include an Undefined member. This is because, in the case of Color, we’d like an enum member to represent Undefined, but we don’t want it in the database because we don’t want to allow 0 as a foreign key into the Color table.

This gives you an idea of the flexibility that this T4 template gives you for generating enums from the database. The enums above were generated with the following T4 template:

<#@include file="EnumsDb.ttinclude" #>
<#
  var configFilePath = "app.config";

  var enums = new []
  {
    new EnumEntry
      ("Supported colors", "DemoDatabase", "dbo", "Color", "ColorId", "ColorName", "ColorName")
      { GenerateUndefinedMember = true },

    new EnumEntry
      ("System error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "SystemErrorType", ValueRange = new Tuple<long, long>(0, 999) },

    new EnumEntry
      ("Customer error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "CustomerErrorType", ValueRange = new Tuple<long, long>(1000, 1999) },
  };

  var code = this.GenerateEnums(configFilePath, enums);

  return code;
#>

You can see that there is very little code to write in the template itself, because all the real work happens inside the EnumsDb.ttinclude file (referenced at the top with @include), which contains the actual enum-generation code and can be shared throughout your application. A complete listing of the EnumsDb.ttinclude file appears at the end of this blog post.

All you need to do is call GenerateEnums with two parameters. The first is the path to your application’s configuration file, which holds the database connection string(s). The second is an array of EnumEntry objects, which drives the creation of enums in C#. Specifically, in this case, there are three EnumEntry objects, which results in the three enums that were generated.

Each enum entry contains the following properties:

  1. Description (e.g., “Supported colors”). This description appears as XML comments for the enum.
  2. Database connection string key (e.g., “DemoDatabase”). It is expected that a connection string named by this key is defined in the application’s configuration file. Since this property exists for each enum entry, it is entirely possible to generate different enums from different databases.
  3. Schema name (e.g., “dbo”). This is the schema that the table is defined in.
  4. Table name (e.g., “Color”). This is the table that contains rows of data for each enum member.
  5. ID column name (e.g., “ColorId”). This is the column that contains the numeric value for each enum member. It must be an integer type, but it can be an integer of any size. The generator automatically maps tinyint to byte, smallint to short, int to int, and bigint to long.
  6. Member column name (e.g., “ColorName”). This is the column that contains the actual name of the enum member.
  7. Description column name. This should be the same as the Member column name (e.g., “ColorName”) if there is no description, or it can reference another column containing the description if one exists. If there is a description, then it is generated as XML comments above each enum member.
  8. Optional:
    1. EnumName. If not specified, the enum is named after the table; otherwise, the enum gets whatever name you specify here.
    2. GenerateUndefinedMember. If specified, then the template will generate an Undefined member with a value of 0 (unless the table already contains a row with an ID of 0 or a member named Undefined).
    3. ValueRange. If specified, then enum members are generated only for rows in the database that fall within the specified range of values. The ErrorType table uses this technique to generate two enums from the table; one named SystemErrorType with values 0 – 999, and another named CustomerErrorType with values 1000 – 1999.

This design offers a great deal of flexibility, while ensuring that the enums defined in your C# application always stay in sync with the values defined in the database. And of course, you can modify the template as desired to suit any additional needs that are unique to your application.

The full template code is listed below. I hope you enjoy using it as much as I enjoyed writing it. Happy coding! 😊

<#@ template hostspecific="true" language="C#" debug="false" #>
<#@ assembly name="System.Configuration" #>
<#@ assembly name="System.Data" #>
<#@ import namespace="System.Configuration" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.Text" #>
<#+
  private string GenerateEnums(string configFilePath, EnumEntry[] entries)
  {
    if (entries == null)
    {
      return string.Empty;
    }

    var ns = System.Runtime.Remoting.Messaging.CallContext.LogicalGetData("NamespaceHint");
    var sb = new StringBuilder();
    sb.AppendLine(this.GenerateHeaderComments());
    sb.AppendLine($"namespace {ns}");
    sb.AppendLine("{");
    foreach(var entry in entries)
    {
      try
      {
        sb.Append(this.GenerateEnumMembers(configFilePath, entry));
      }
      catch (Exception ex)
      {
        sb.AppendLine($"#warning Error generating enums for {entry.EnumDescription}");
        sb.AppendLine($"  // Message: {ex.Message}");
      }
    }
    sb.AppendLine();
    sb.AppendLine("}");
    return sb.ToString();
  }

  private string GenerateHeaderComments()
  {
    var comments = $@"
// ------------------------------------------------------------------------------------------------
// <auto-generated>
//  This code was generated by a C# code generator
//  Generated at {DateTime.Now}
// 
//  Warning: Do not make changes directly to this file; they will get overwritten on the next
//  code generation.
// </auto-generated>
// ------------------------------------------------------------------------------------------------
    ";

    return comments;
  }

  private string GenerateEnumMembers(string configFilePath, EnumEntry entry)
  {
    var code = new StringBuilder();
    var connStr = this.GetConnectionString(configFilePath, entry.ConnectionStringKey);
    var enumDataType = default(string);
    using (var conn = new SqlConnection(connStr))
    {
      conn.Open();
      using (var cmd = conn.CreateCommand())
      {
        cmd.CommandText = @"
          SELECT DATA_TYPE
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName AND COLUMN_NAME = @IdColumnName
        ";

        cmd.Parameters.AddWithValue("@SchemaName", entry.SchemaName);
        cmd.Parameters.AddWithValue("@TableName", entry.TableName);
        cmd.Parameters.AddWithValue("@IdColumnName", entry.IdColumnName);

        var sqlDataType = cmd.ExecuteScalar();
        if(sqlDataType == null)
        {
          throw new Exception($"Could not discover ID column [{entry.IdColumnName}] data type for enum table [{entry.SchemaName}].[{entry.TableName}]");
        }
        enumDataType = this.GetEnumDataType(sqlDataType.ToString());

        var whereClause = string.Empty;
        if (entry.ValueRange != null)
        {
          whereClause = $"WHERE [{entry.IdColumnName}] BETWEEN {entry.ValueRange.Item1} AND {entry.ValueRange.Item2}";
        }

        cmd.CommandText = $@"
          SELECT
            Id = [{entry.IdColumnName}],
            Name = [{entry.NameColumnName}],
            Description = [{entry.DescriptionColumnName}]
          FROM
            [{entry.SchemaName}].[{entry.TableName}]
          {whereClause}
          ORDER BY
            [{entry.IdColumnName}]
        ";
        cmd.Parameters.Clear();

        var innerCode = new StringBuilder();
        var hasUndefinedMember = false;
        using (var rdr = cmd.ExecuteReader())
        {
          while (rdr.Read())
          {
            if (rdr["Name"].ToString() == "Undefined" || rdr["Id"].ToString() == "0")
            {
              hasUndefinedMember = true;
            }
            if (entry.NameColumnName != entry.DescriptionColumnName)
            {
              innerCode.AppendLine("\t\t/// <summary>");
              innerCode.AppendLine($"\t\t/// {rdr["Description"]}");
              innerCode.AppendLine("\t\t/// </summary>");
            }
            innerCode.AppendLine($"\t\t{rdr["Name"].ToString().Replace("-", "_")} = {rdr["Id"]},");
          }
          rdr.Close();
        }
        if ((entry.GenerateUndefinedMember) && (!hasUndefinedMember))
        {
          var undefined = new StringBuilder();

          undefined.AppendLine("\t\t/// <summary>");
          undefined.AppendLine("\t\t/// Undefined (not mapped in database)");
          undefined.AppendLine("\t\t/// </summary>");
          undefined.AppendLine("\t\tUndefined = 0,");

          innerCode.Insert(0, undefined);
        }
        code.Append(innerCode.ToString());
      }
      conn.Close();
    }

    var final = $@"
  /// <summary>
  /// {entry.EnumDescription}
  /// </summary>
  public enum {entry.EnumName} : {enumDataType}
  {{
    // Database information:
    //  {entry.DbInfo}
    //  ConnectionString: {connStr}

{code}
  }}
    ";

    return final;
  }

  private string GetConnectionString(string configFilePath, string key)
  {
    var map = new ExeConfigurationFileMap();
    map.ExeConfigFilename = this.Host.ResolvePath(configFilePath);
    var config = ConfigurationManager.OpenMappedExeConfiguration(map, ConfigurationUserLevel.None);
    var connectionString = config.ConnectionStrings.ConnectionStrings[key].ConnectionString;

    return connectionString;
  }

  private string GetEnumDataType(string sqlDataType)
  {
    switch (sqlDataType)
    {
      case "tinyint"  : return "byte";
      case "smallint" : return "short";
      case "int"      : return "int";
      case "bigint"   : return "long";
      default :
        throw new Exception($"SQL data type {sqlDataType} is not valid as an enum ID column");
    }
  }

  private class EnumEntry
  {
    public string EnumDescription { get; }
    public string ConnectionStringKey { get; }
    public string SchemaName { get; }
    public string TableName { get; }
    public string IdColumnName { get; }
    public string NameColumnName { get; }
    public string DescriptionColumnName { get; }

    public string EnumName { get; set; }
    public Tuple<long, long> ValueRange { get; set; }
    public bool GenerateUndefinedMember { get; set; }

    public string DbInfo
    {
      get
      {
        var info = $"ConnectionStringKey: {this.ConnectionStringKey}; Table: [{this.SchemaName}].[{this.TableName}]; IdColumn: [{this.IdColumnName}]; NameColumn: [{this.NameColumnName}]; DescriptionColumn: [{this.DescriptionColumnName}]";
        if (this.ValueRange != null)
        {
          info = $"{info}; Range: {this.ValueRange.Item1} to {this.ValueRange.Item2}";
        }
        return info;
      }
    }

    public EnumEntry(
      string enumDescription,
      string connectionStringKey,
      string schemaName,
      string tableName,
      string idColumnName = null,
      string nameColumnName = null,
      string descriptionColumnName = null
    )
    {
      this.EnumDescription = enumDescription;
      this.ConnectionStringKey = connectionStringKey;
      this.SchemaName = schemaName;
      this.TableName = tableName;
      this.IdColumnName = idColumnName ?? tableName + "Id";
      this.NameColumnName = nameColumnName ?? "Name";
      this.DescriptionColumnName = descriptionColumnName ?? "Description";
      this.EnumName = tableName;
    }

  }
#>

Creating a Helpful Extension Method for XAML Visibility

Used recklessly, extension methods are potentially evil. But judiciously applied, they can be extremely helpful. In this short post (well, short for me), I’ll show you how to create an extension method for the bool class that will simplify your .NET code in XAML-based apps (either WPF or Silverlight). In particular, this extension method addresses the fact that the Visibility property of UI controls in XAML is not a Boolean (true/false) value, as it traditionally has been in both Windows Forms and ASP.NET. Instead, it’s one of three possible enumerated values: Hidden, Collapsed, and Visible.

Visible obviously means “visible,” while both Hidden and Collapsed mean “invisible.” The difference between Hidden and Collapsed is merely whether or not the invisible control occupies blank space on the screen that it would occupy if it were visible. Collapsed consumes no space, while Hidden does.

It’s nice to have the three options, but in most cases you’ll find that you need only two of them: Visible and Collapsed. If you’re setting the visibility of XAML controls using these two enum values, you’ve probably noticed that it’s just not as clean as it is with Windows Forms or ASP.NET controls. You typically already have a Boolean value (either a simple variable or an expression) that determines whether the control should be visible. You could use that value to enable or disable the control, since the IsEnabled property is also a Boolean, but you can’t assign it to the Visibility property, because that property expects one of the three Visibility values rather than a Boolean. That’s frustrating, because from a UX (user experience) perspective, hiding/showing a control versus enabling/disabling it is just a subtle variation on conveying “availability” to the user. Developers should be able to think (and code) freely in terms of Booleans for both IsEnabled and Visibility.

Consider the C# code to manipulate the Visibility and IsEnabled properties of a button control:

OpenCustomerButton.IsEnabled = false;
OpenCustomerButton.IsEnabled = true;
OpenCustomerButton.Visibility = Visibility.Collapsed;
OpenCustomerButton.Visibility = Visibility.Visible;

bool isAvailable = true;
OpenCustomerButton.IsEnabled = isAvailable;
OpenCustomerButton.Visibility = (isAvailable ? Visibility.Visible : Visibility.Collapsed);

// Set the button's visibility according to another button's enabled/disabled state
OpenCustomerButton.Visibility = (ManageCustomersButton.IsEnabled ? Visibility.Visible : Visibility.Collapsed);

You can see that because Visibility takes the enum rather than a bool, it’s harder to work with than IsEnabled, and just isn’t as neat. But we can do something about this. How? By writing a simple (one-line!) extension method that adds a ToVisibility method to the bool class:

public static class BoolExtensions
{
  public static Visibility ToVisibility(this bool isVisible)
  {
    return (isVisible ? Visibility.Visible : Visibility.Collapsed);
  }
}

Now the same UI code is much easier to read and write:

OpenCustomerButton.IsEnabled = false;
OpenCustomerButton.IsEnabled = true;
OpenCustomerButton.Visibility = false.ToVisibility();
OpenCustomerButton.Visibility = true.ToVisibility();

bool isAvailable = true;
OpenCustomerButton.IsEnabled = isAvailable;
OpenCustomerButton.Visibility = isAvailable.ToVisibility();

// Set the button's visibility according to another button's enabled/disabled state
OpenCustomerButton.Visibility = ManageCustomersButton.IsEnabled.ToVisibility();

Like? Me too. Neatness counts!

It would be even better if we could instead extend every UIElement with a Boolean property called IsVisible, and embed the conversion logic bi-directionally in the property’s getter and setter. But, unfortunately, you cannot create extension properties in .NET, only extension methods. So extending bool with a ToVisibility method is the next best thing.
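
If you ever need the conversion in the opposite direction as well (say, to derive a Boolean from another control’s Visibility), a complementary extension method on the Visibility enum does the trick. This isn’t part of the original example, just a small sketch along the same lines:

public static class VisibilityExtensions
{
  public static bool ToBool(this Visibility visibility)
  {
    // Treat Hidden and Collapsed alike: anything other than Visible means "not visible"
    return (visibility == Visibility.Visible);
  }
}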

One more tip: Put the BoolExtensions class inside a common code namespace that all your XAML classes import, so ToVisibility is always available without needing to add an extra using statement.

Remember, abusing extension methods quickly leads to messy, buggy code. Instead, find appropriate uses for extension methods that yield cleaner, tighter code. That is all. 🙂

Using SqlFileStream in C# to Access SQL Server FILESTREAM Data

FILESTREAM is a powerful feature in SQL Server that stores varbinary(max) column data (BLOBs) in the file system (where BLOBs belong) rather than in the database’s structured file groups (where BLOBs kill performance). This feature was first introduced in SQL Server 2008, and is now being expanded with the new FileTable feature coming in SQL Server 2012 (code-named “Denali”), which I’ll cover in a near-future post (as well as in my Programming Microsoft SQL Server 2012 book, which I’m writing now).

If you’re not already familiar with FILESTREAM, you can get the necessary background by reading my two earlier blog posts: Introducing FILESTREAM and Enabling and Using FILESTREAM. In this post, I’ll show you how to use the SqlFileStream class to achieve high-performance streaming of SQL Server FILESTREAM data in your C# applications. This is a simplified version of the OpenSqlFilestream code I presented in my Using the OpenSqlFilestream API blog post.

What Is SqlFileStream?

SqlFileStream is a class in the .NET Framework (.NET 3.5 SP1 and higher) that wraps the OpenSqlFilestream function exposed by the SQL Server Native Client API. This lets you stream BLOBs directly between SQL Server and your .NET application written in C#. SqlFileStream is always used within a transaction, just at the point that you want to store and retrieve BLOBs to and from varbinary(max) FILESTREAM columns. At that point in time, SQL Server will “step aside” and let you stream directly against the server’s NTFS file system—a native environment optimized for streaming. This provides you with a streaming “tunnel” between your application and SQL Server’s internally-managed file system. Using SqlFileStream will give your application lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in Enabling and Using FILESTREAM). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. (Refer to Enabling and Using FILESTREAM for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns.) Notice the default value we’ve established for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use SqlFileStream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Writing SqlFileStream Code

Start Visual Studio, create a new Class Library project, and add a PhotoData class as follows:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Drawing;
using System.IO;
using System.Transactions;

namespace PhotoLibraryApp
{
  public class PhotoData
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

    public static void InsertPhoto
      (int photoId, string desc, string filename)
    {
      const string InsertTSql = @"
        INSERT INTO PhotoAlbum(PhotoId, Description)
          VALUES(@PhotoId, @Description);
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(InsertTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
            cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }
          SavePhotoFile(filename, serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private static void SavePhotoFile
      (string clientPath, string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;

      using (FileStream source =
        new FileStream(clientPath, FileMode.Open, FileAccess.Read))
      {
        using (SqlFileStream dest =
          new SqlFileStream(serverPath, serverTxn, FileAccess.Write))
        {
          byte[] buffer = new byte[BlockSize];
          int bytesRead;
          while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
          {
            dest.Write(buffer, 0, bytesRead);
            dest.Flush();
          }
          dest.Close();
        }
        source.Close();
      }
    }

    public static Image SelectPhoto(int photoId, out string desc)
    {
      const string SelectTSql = @"
        SELECT
            Description,
            Photo.PathName(),
            GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      Image photo;
      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              desc = rdr.GetSqlString(0).Value;
              serverPath = rdr.GetSqlString(1).Value;
              serverTxn = rdr.GetSqlBinary(2).Value;
              rdr.Close();
            }
          }
          photo = LoadPhotoImage(serverPath, serverTxn);
        }

        ts.Complete();
      }

      return photo;
    }

    private static Image LoadPhotoImage(string filePath, byte[] txnToken)
    {
      Image photo;

      using (SqlFileStream sfs =
        new SqlFileStream(filePath, txnToken, FileAccess.Read))
      {
        photo = Image.FromStream(sfs);
        sfs.Close();
      }

      return photo;
    }

  }
}

Let’s explain the code in detail. We’ll start at the top with some required namespace inclusions. The two using statements to take notice of are System.Data.SqlTypes and System.Transactions. The System.Data.SqlTypes namespace defines the SqlFileStream class that we’ll be using to stream BLOBs. No special assembly reference is required to use this class, because it is provided by the System.Data.dll assembly that our project is already referencing (Visual Studio set this reference automatically when it created our project). The System.Transactions namespace defines the TransactionScope class that lets us code implicit transactions against the database. This class is provided by the System.Transactions.dll assembly, which is not referenced automatically. You’ll need to add a reference to it now, or the code will not compile. Right-click the project in Solution Explorer and choose Add Reference. In the Add Reference dialog, click the .NET tab, and scroll to find the System.Transactions component. Then double-click it to add the reference.

At the top of the class, we define a connection string as a hard-coded constant named ConnStr. This is just for demonstration purposes; a real-world application would store the connection string elsewhere (such as in a configuration file, possibly encrypted), but we’re keeping our example simple.
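
For reference, here’s a minimal sketch of what that might look like instead; it assumes a connection string entry named PhotoLibrary in the application’s configuration file and a project reference to System.Configuration:

// Hypothetical alternative: read the connection string from the config file rather than hard-coding it
private static readonly string ConnStr =
  System.Configuration.ConfigurationManager
    .ConnectionStrings["PhotoLibrary"].ConnectionString;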

Streaming Into SQL Server

The first method defined in the class is InsertPhoto, which accepts a new photo integer ID, string description, and full path to an image file to be saved to the database. Notice that the InsertTSql string constant defined at the top of the method specifies an INSERT statement that includes the PhotoId and Description columns, but not the actual Photo BLOB column itself. Instead, the INSERT statement is followed immediately by a SELECT statement that retrieves two pieces of information we’ll use to stream the BLOB into the Photo column much more efficiently than using ordinary T-SQL: a logical UNC path name to the file and the transactional context token. These are the two values needed to use SqlFileStream, and you’re about to see exactly how. But all we’ve done so far is define a constant holding two T-SQL statements. The constant is followed by two variable declarations, serverPath and serverTxn, which will receive the two special values when we later execute those T-SQL statements.

The method then creates and enters a new TransactionScope block. This does not actually begin the database transaction (we’ve not even connected to the database yet), but rather declares that all data access within the block (and in any code called from within the block) must participate in a database transaction. Inside the TransactionScope block, the code creates and opens a new SqlConnection. Being the first data access code inside the TransactionScope block, this also implicitly begins the database transaction. Next, it creates a SqlCommand object associated with the open connection and prepares its command text to contain our T-SQL statements (the INSERT followed by the SELECT).

Invoking the ExecuteReader method executes the T-SQL statements and returns a reader from which we can retrieve the values returned by the SELECT statement. The transaction is still pending at this time. Our INSERT statement does not provide a value for RowId and instead allows SQL Server to automatically generate and assign a new uniqueidentifier ROWGUID value by default. I’ve also pointed out that no value is provided for the Photo column—and this is exactly how the default 0x value that we defined earlier for the Photo column comes into play (I said we’d come back to it, and here we are).

Although the row has been added by the INSERT statement, it will roll back (disappear) if a problem occurs before the transaction is committed. Because we didn’t provide a BLOB value for the Photo column in the new row, SQL Server honors the default value 0x that we established for it in the CREATE TABLE statement for PhotoAlbum. This represents a zero-length binary stream, which is completely different from NULL. Because Photo is a varbinary(max) column decorated with the FILESTREAM attribute, an empty file gets created in the file system that SQL Server associates with the new row. At the same time, SQL Server initiates an NTFS file system transaction over this new empty file and synchronizes it with the database transaction. So just like the new row, the new file will disappear if the database transaction does not commit successfully.

Immediately following the INSERT statement, the SELECT statement returns Photo.PathName and GET_FILESTREAM_TRANSACTION_CONTEXT. What we’re essentially doing with the WHERE clause in this SELECT statement is reading back the same row we have just added (but not yet committed) to the PhotoAlbum table in order to reference the BLOB stored in the new file that was just created (also not yet committed) in the file system.

The value returned by Photo.PathName is a fabricated path to the BLOB for the selected PhotoId. The path is expressed in UNC format, and points to the network share name established for the server instance when you first enabled FILESTREAM (by default, this is MSSQLSERVER). It is not a path to the file’s physical location on the server, but rather contains information SQL Server can use to derive the file’s physical location. For example, you’ll notice that it always contains the GUID value in the uniqueidentifier ROWGUIDCOL column of the BLOB’s corresponding row. We retrieve the path value from the reader’s first column and store it in the serverPath string variable.

We just explained how SQL Server initiated an NTFS file system transaction over the FILESTREAM data in the new row’s Photo column when we started our database transaction. The GET_FILESTREAM_TRANSACTION_CONTEXT function returns a handle to that NTFS transaction (if you’re not inside a transaction, this function will return NULL and your code won’t work). We obtain the transaction context, which is returned by the reader’s second column as a SqlBinary value, and store it in the byte array named serverTxn.

Armed with the BLOB path reference in serverPath and the transaction context in serverTxn, we have what we need to create a SqlFileStream object and perform direct file access to stream our image into the Photo column. We close the reader, terminate its using block, then terminate the enclosing using block for the SqlConnection as well. This would normally close the database connection implicitly, but that gets deferred in this case because the code is still nested inside the outer using block for the TransactionScope object. So the connection is still open at this time, and the transaction is still pending. It is precisely at this point that we call the SavePhotoFile method to stream the specified image file into the Photo column of the newly inserted PhotoAlbum row, overwriting the empty file just created by default. When control returns from SavePhotoFile, the TransactionScope object’s Complete method is invoked and its using block is terminated, signaling the transaction management API that everything worked as expected. This implicitly commits the database transaction (which in turn commits the NTFS file system transaction) and closes the database connection.

The SavePhotoFile method reads from the source file and writes to the database FILESTREAM storage in 512 KB chunks at a time using ordinary .NET streaming techniques. The method begins by defining a BlockSize integer constant that is set to a reasonable value of 512 KB. Picture files larger than this will be streamed to the server in 512 KB blocks at a time. The local source image file (in clientPath) is then opened on an ordinary read-only FileStream object.

Then the destination file is opened by passing the two special values (serverPath and serverTxn), along with a FileAccess.Write enumeration requesting write access, into the SqlFileStream constructor. Like the source FileStream object, SqlFileStream inherits from System.IO.Stream, so it can be treated just like any ordinary stream. Thus, you attain write access to the destination BLOB on the database server’s NTFS file system. Remember that this output file is enlisted in an NTFS transaction and nothing you stream to it will be permanently saved until the database transaction is committed by the terminating TransactionScope block, after SavePhotoFile completes. The rest of the SavePhotoFile method implements a simple loop that reads from the source FileStream and writes to the destination SqlFileStream, one 512 KB block at a time until the entire source file is processed, and then it closes both streams.

Streaming Out From SQL Server

The rest of the code contains methods to retrieve existing photos and stream their content from the file system into an Image object for display. You’ll find that this code follows the same pattern as the last, only now we’re performing read access.

The SelectPhoto method accepts a photo ID and returns the string description from the database in an output parameter. The actual BLOB itself is returned as the method’s return value in a System.Drawing.Image object. We populate the Image object with the BLOB by streaming into it from the database server’s NTFS file system using SqlFileStream. Once again, we start things off by entering a TransactionScope block and opening a connection. We then execute a simple SELECT statement that queries the PhotoAlbum table for the record specified by the photo ID and returns the description and full path to the image BLOB, as well as the FILESTREAM transactional context token. And once again we use the path name and transactional context with SqlFileStream to tie into the server’s file system in the LoadPhotoImage method.

Just as when we were inserting new photos (only this time using FileAccess.Read instead of FileAccess.Write), we create a new SqlFileStream object from the logical path name and transaction context. We then pull the BLOB content directly from the NTFS file system on the server into a new System.Drawing.Image object using the static Image.FromStream method against the SqlFileStream object. The populated image can then be passed back up to a Windows Forms application, where it can be displayed using the Image property of a PictureBox control.
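
For example, calling code in a Windows Forms application might look something like this (the control names here are hypothetical):

// Retrieve photo 3 and display it along with its description
string description;
photoPictureBox.Image = PhotoData.SelectPhoto(3, out description);
descriptionLabel.Text = description;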

Or, to stream a photo over HTTP from a simple ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Transactions;
using System.Web.UI;

namespace PhotoLibraryHttpService
{
  public partial class PhotoService : Page
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary";

    protected void Page_Load(object sender, EventArgs e)
    {
      int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
      if (photoId == 0)
      {
        return;
      }

      const string SelectTSql = @"
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
         FROM PhotoAlbum
         WHERE PhotoId = @PhotoId";

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          string serverPath;
          byte[] serverTxn;

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }

          this.StreamPhotoImage(serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private void StreamPhotoImage(string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;
      const string JpegContentType = "image/jpeg";

      using (SqlFileStream sfs =
        new SqlFileStream(serverPath, serverTxn, FileAccess.Read))
      {
        byte[] buffer = new byte[BlockSize];
        int bytesRead;
        Response.BufferOutput = false;
        Response.ContentType = JpegContentType;
        while ((bytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
        {
          Response.OutputStream.Write(buffer, 0, bytesRead);
          Response.Flush();
        }
        sfs.Close();
      }
    }
  }
}

Conclusion

The OpenSqlFilestream function provides native file streaming capabilities between FILESTREAM storage in the file system managed by SQL Server and any native-code (e.g., C++) application. SqlFileStream provides a managed code wrapper around OpenSqlFilestream that simplifies direct FILESTREAM access from .NET applications (e.g., C# and VB .NET). This post explained how this API works in detail, and showed the complete data access code that uses SqlFileStream for both reading and writing BLOBs to and from SQL Server. That’s everything you need to know to get the most out of FILESTREAM. I hope you enjoyed it!

SQL Server 2008 FILESTREAM Part 3 of 3: Using the OpenSqlFilestream API

This is the final installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and part 2 showed you the step-by-step procedure for enabling and using this awesome new feature. In this post, I’ll show you how to use the OpenSqlFilestream function exposed by the SQL Server Native Client API to achieve the maximum FILESTREAM performance possible in your .NET applications.

IMPORTANT UPDATE: I’ve just blogged on SqlFileStream, a managed code wrapper around OpenSqlFilestream that offers a simpler implementation of the mechanism described in this post. Read it here: https://lennilobel.wordpress.com/2011/08/22/using-sqlfilestream-in-c-to-access-sql-server-filestream-data/

What Is OpenSqlFilestream?

OpenSqlFilestream is a function provided by the SQL Server Native Client API, sqlncli10.dll, and gets installed on your development machine when you install the SQL Server 2008 client tools. This function can be called at the point in time that you want to store and retrieve BLOBs from varbinary(max) FILESTREAM columns, where SQL Server will “step aside,” and allow you to call OpenSqlFilestream to obtain a file handle. With the file handle returned by OpenSqlFilestream, you can stream directly against the file system—a native environment optimized for streaming. Contrast such direct file system access with storing and retrieving BLOBs against varbinary(max) FILESTREAM columns the “old-fashioned” way (either by embedding/extracting byte arrays, or in-lining binary streams as hexadecimal values in T-SQL as I demonstrated in part 2), which carries the additional overhead of the FILESTREAM abstraction layer. Using OpenSqlFilestream instead will give you lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in part 2). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. This is virtually the same CREATE TABLE statement as shown in my last post (refer to part 2 for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns). The only thing different here is the default value for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use OpenSqlFilestream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Storing BLOBs Using OpenSqlFilestream

I’ll walk you through the complete steps for building a C# data access class in Visual Studio that demonstrates how to use OpenSqlFilestream. In order to keep things simple for demonstration purposes, we won’t be following best practices; namely, we’ll be executing direct T-SQL statements instead of using stored procedures (as you should be using in production code).

Start Visual Studio and create a new Class Library project named PhotoLibraryDAL with a single class file named PhotoData.cs, and add the following public static method named InsertPhoto to the class:

public static void InsertPhoto(int photoId, string desc, string filename)
{
  const string InsertCmd =
   "INSERT INTO PhotoAlbum(PhotoId, Description)" +
   " VALUES(@PhotoId, @Description)";

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      using(SqlCommand cmd = new SqlCommand(InsertCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
        cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
        cmd.ExecuteNonQuery();
      }

      SavePhotoFile(photoId, filename, txn);
      txn.Commit();
    }

    conn.Close();
  }
}

Client applications call the InsertPhoto method to insert a new photo into the PhotoAlbum table by passing in a photo ID, description, and filename pointing to a local file containing the actual photo image. It then opens a connection, begins a transaction, and executes an INSERT statement. Notice that the INSERT statement supplies values only for the PhotoId and Description columns. What about the RowId and Photo columns? Because we’ve omitted them, the defaults we established in the table definition are applied. The RowId column gets assigned the next sequential GUID by the NEWSEQUENTIALID function. And for the Photo BLOB column, the default 0x (zero-length binary stream) applies.

The result of supplying a zero-length binary stream value for the Photo column is that SQL Server creates an empty file in the file system that is linked to the Photo column of the row just inserted. However, because we’ve begun a transaction, SQL Server has automatically initiated an NTFS file system transaction over the empty file added to the file system. So now we have a new row inserted in the table—but not committed, and we have a new empty file created in the file system—but not committed. This only works because we specified (albeit by default) a zero-length binary stream (0x) rather than NULL when inserting the varbinary(max) FILESTREAM value. If an exception occurs, or if the database transaction rolls back, or if any other condition occurs in which the database transaction doesn’t commit successfully, the NTFS file system transaction will get rolled back automatically as well. In that case, both the inserted row in the table and the empty file in the file system go away.

Now is that “point in time” that we’d like SQL Server to “step aside” so we can call OpenSqlFilestream to store the BLOB for the photo image. To do that, we call the SavePhotoFile method, which is coded as follows:

private static void SavePhotoFile(int photoId, string filename, SqlTransaction txn)
{
  const int BlockSize = 1024 * 512;
  FileStream source = new FileStream(filename, FileMode.Open, FileAccess.Read);
  SafeFileHandle handle = GetOutputFileHandle(photoId, txn);
  using(FileStream dest = new FileStream(handle, FileAccess.Write))
  {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
      dest.Write(buffer, 0, bytesRead);
    }
    dest.Close();
  }
  source.Close();
}

There’s actually nothing magical about this method. You can see that it simply streams, in 512K chunks at a time, from a source stream to a destination stream. This is just the way you’d implement a simple, conventional, file-copy routine. Getting a handle on the source stream is also a no-brainer; since the source file is local, you simply invoke the System.IO.FileStream constructor that accepts a filename and enumerations specifying that the file should be opened for read access. So the question then becomes, how do we get a handle on the destination stream, which is a direct channel to the empty and not-yet-committed file on the server’s NTFS file system that SQL Server associates with the Photo column? The answer lies in the GetOutputFileHandle method which we call to obtain a SafeFileHandle object to the destination file:

private static SafeFileHandle GetOutputFileHandle(int photoId, SqlTransaction txn)
{
  const string GetOutputFileInfoCmd =
    "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT(), Photo.PathName()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  SqlCommand cmd = new SqlCommand(GetOutputFileInfoCmd, txn.Connection, txn);
  cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

  string filePath;
  byte[] txnToken;

  using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
  {
    rdr.Read();
    txnToken = rdr.GetSqlBinary(0).Value;
    filePath = rdr.GetSqlString(1).Value;
    rdr.Close();
  }

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
    (filePath, NativeSqlClient.DesiredAccess.ReadWrite, txnToken);

  return handle;
}

This code is at the heart of the matter. In order to call OpenSqlFilestream, we need to obtain two key pieces of information. First, we need a logical path name to the destination file. We obtain that by selecting back the not-yet-committed row we just inserted, and invoking the PathName method on the Photo column. Nobody else can access this row because it hasn’t been committed yet (that would be a dirty read); it will never actually come into existence if the transaction rolls back. But we can read it because we’re inside the transaction, and so we can easily get the path name. No, the path name returned by PathName is not a UNC path to the physical file system on the server, so just lay that security concern to rest. It’s just a bogus path that has meaning only in the context of this transaction which SQL Server can use to map to the real file in the file system. Secondly, we need a token that identifies the NTFS file system transaction that SQL Server initiated behind the scenes, which we obtain with the GET_FILESTREAM_TRANSACTION_CONTEXT function.

Armed with these two key pieces of information, we can call OpenSqlFilestream. Because OpenSqlFilestream is a native code function, I’ve placed it in a GetSqlFilestreamHandle method in a separate NativeSqlClient class to keep the DllImport and other native code details isolated from our .NET data access class (yes, I’m a neat freak when it comes to code, and you should be too):

using System;
using System.Runtime.InteropServices;

using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryFilestreamDemo
{
  public class NativeSqlClient
  {
    public enum DesiredAccess : uint
    {
      Read,
      Write,
      ReadWrite,
    }

    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle OpenSqlFilestream(
      string path,
      uint access,
      uint options,
      byte[] txnToken,
      uint txnTokenLength,
      Sql64 allocationSize);

    [StructLayout(LayoutKind.Sequential)]
    private struct Sql64
    {
      public Int64 QuadPart;
      public Sql64(Int64 quadPart)
      {
        this.QuadPart = quadPart;
      }
    }

    public static SafeFileHandle GetSqlFilestreamHandle
     (string filePath, DesiredAccess access, byte[] txnToken)
    {
      SafeFileHandle handle = OpenSqlFilestream(
        filePath,
        (uint)access,
        0,
        txnToken,
        (uint)txnToken.Length,
        new Sql64(0));

      return handle;
    }
  }
}

As you can see, the GetSqlFilestreamHandle method accepts the transaction context token and the path obtained by the GET_FILESTREAM_TRANSACTION_CONTEXT function and PathName method respectively. It also accepts an enumeration that specifies the desired access mode, which can be Read, Write, or ReadWrite. The OpenSqlFilestream function requires other parameters that are not generally applicable for standard FILESTREAM usage, such as the unsigned 32-bit options and 64-bit allocation size arguments. These simply get passed in as 0. The SafeFileHandle returned by OpenSqlFilestream is defined by the .NET framework in the core library assembly mscorlib.dll, so no special reference needs to be set to access this class. Control then gets passed back up the call stack, to the SavePhotoFile method, which calls an overloaded version of the System.IO.Filestream constructor that accepts a SafeFileHandle object. After the source stream is copied entirely to the destination stream (overwriting the empty file on the server), control returns to the InsertPhoto method and the database transaction is finally committed. At that point, both the inserted row and the file in the file system are permanently saved, and the connection is closed. And that’s the way to stream BLOBs into varbinary(max) FILESTREAM columns in SQL Server 2008!

Retrieving BLOBs Using OpenSqlFilestream

Reading FILESTREAM data back out from the database to your application follows a very similar pattern. You open a connection and start a transaction. Note that this is not normally considered best practice, as you should generally try to have your read operations execute outside the context of a transaction. But to use OpenSqlFilestream for read operations, this is exactly what you do. Then you use OpenSqlFilestream in the very same way we did for writing the BLOB to the database. The exact implementation depends on where you want to stream the content to.

For example, to stream a photo into an Image object for display in a Windows Forms PictureBox control, you can implement a SelectPhoto method like this:

public static Image SelectPhoto(int photoId, out string desc)
{
  const string SelectCmd =
    "SELECT Description, Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  Image photo;

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      string filePath;
      byte[] txnToken;

      using(SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

        using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
        {
          rdr.Read();
          desc = rdr.GetSqlString(0).Value;
          filePath = rdr.GetSqlString(1).Value;
          txnToken = rdr.GetSqlBinary(2).Value;
          rdr.Close();
        }
      }

      photo = LoadPhotoImage(filePath, txnToken);

      txn.Commit();
    }

    conn.Close();
  }

  return photo;
}

private static Image LoadPhotoImage(string filePath, byte[] txnToken)
{
  Image photo;

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
     (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

  using(FileStream fs = new FileStream(handle, FileAccess.Read))
  {
    photo = Image.FromStream(fs);
    fs.Close();
  }

  return photo;
}

Because the Image class has a static FromStream method that consumes the stream passed as a parameter to the method, this code essentially firehoses the stream at the fastest possible speed directly from the file system on SQL Server into the Image object.

Or, to stream a photo over HTTP from an ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryHttpService
{
 public partial class PhotoService : System.Web.UI.Page
 {
  private const string ConnStr =
    "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

  protected void Page_Load(object sender, EventArgs e)
  {
   int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
   if (photoId == 0)
   {
    return;
   }

   const string SelectCmd =
    "SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

   using (SqlConnection conn = new SqlConnection(ConnStr))
   {
    conn.Open();

    using (SqlTransaction txn = conn.BeginTransaction())
    {
     string filePath;
     byte[] txnToken;

     using (SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
     {
      cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

      using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
      {
       rdr.Read();
       filePath = rdr.GetSqlString(0).Value;
       txnToken = rdr.GetSqlBinary(1).Value;
       rdr.Close();
      }
     }

     this.StreamPhotoImage(filePath, txnToken);

     txn.Commit();
    }

    conn.Close();
   }
  }

  private void StreamPhotoImage(string filePath, byte[] txnToken)
  {
   const int BlockSize = 1024 * 512;
   const string JpegContentType = "image/jpeg";

   SafeFileHandle handle =
     NativeSqlClient.GetSqlFilestreamHandle
      (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

   using (FileStream source = new FileStream(handle, FileAccess.Read))
   {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    Response.BufferOutput = false;
    Response.ContentType = JpegContentType;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
     Response.OutputStream.Write(buffer, 0, bytesRead);
     Response.Flush();
    }
    source.Close();
   }
  }
  }
}

Have Fun with FILESTREAM!

This concludes my 3-part series on using FILESTREAM in SQL Server 2008, which explains everything you need to know to get the most out of this truly awesome new feature. I hope you enjoyed it, and look forward to hearing your FILESTREAM success stories!

Data/View Separation in Word 2007 using Word XML Data Binding and OpenXML

The OpenXML file format used by Microsoft Office 2007 enables programmatic document manipulation without automating Word, so that Word does not need to be installed alongside your code, which is great for server applications. In this post, I’ll show you how to use OpenXML to implement bi-directional data binding between a Word document and XML data. This new feature in Word 2007 delivers the best data/view separation we’ve seen to date in Microsoft Office. And (for a LINQ junkie such as myself, anyway) the best part is that it works so nicely with LINQ to XML.

We’ll begin by creating a document with content controls. Then we’ll write a small amount of code to embed an XML file into the document and bind the content controls to various elements in the XML.

Before starting, you need to download the OpenXML SDK 2.0. Once the SDK is installed, you’ll have the assemblies you need to programmatically manipulate Office documents using the OpenXML API. You can download the SDK from here: http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en

Create the Document

We’ll create a simple document with three content controls that will be bound to the following XML data:

<Customer>
  <Name>John Doe</Name>
  <Expiration>2/1/2010</Expiration>
  <AmountDue>$129.50</AmountDue>
</Customer>

Start a new document in Word 2007, and type the word Dear followed by a space. We now want to inject our first content control for the customer name. You can add content controls to your document from the Developer tab on the ribbon, but since that tab is hidden by default, you’ll need to show it first. Click the Office button and choose Word Options. Then check the “Show Developer tab in the Ribbon” checkbox and click OK.

With the cursor still positioned after the text “Dear ”, click the Text content control button in the Developer tab to insert a new Text content control:

We’ll be assigning tag names to identify content controls for data binding in our code. To view the tag names in Word as we set up the content controls, click the Design Mode button in the Developer tab:

With the content control still selected, click Properties:

 

Set the Tag of the content control to Customer/Name, which is an XPath expression that refers to the <Name> element nested inside the <Customer> element of our XML data.

Click OK to close the dialog box. Notice how Word displays the tag name in the content control in Design Mode:

 

Continue adding text to the document, and insert two more content controls in a similar fashion for the expiration date and amount due fields. Use the same Text content control for the amount due field as we used for the customer name field, but use a Date Picker content control for the expiration date field:

After setting the tag names of the additional content controls to the appropriate XPath expressions (Customer/Expiration and Customer/AmountDue), your document should look something like this:

Turn off Design Mode to hide the tag names from the content controls:

Save the document as CustomerReminder.docx. We have already established a separation of variable data and fixed document text, and could easily set up document protection at this point to allow manual data entry only for content controls. That’s cool all by itself, but we’re going to take this to the next level and implement automatic data binding between these content controls and a separate XML file. Although Word doesn’t care what values you assign to content control tags, we will now write code to find all tagged content controls, and treat those tags as XPath expressions that point to various XML elements in our XML data.

If you’re not already aware, all Office 2007 documents (including Word docx files) are actually compressed zip files (“packages”) that hold a collection of related XML “parts.” You can see this easily by simply appending .zip to the document filename and then opening it from Windows Explorer:

After examining the contents of the document package, rename the filename extension back to “.docx.”
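
Incidentally, you can do the same inspection from code with the System.IO.Packaging API in the WindowsBase assembly (one of the two assemblies we’ll reference in a moment). Here’s a minimal sketch, assuming using directives for System, System.IO, and System.IO.Packaging, that simply lists every part in the package:

// Open the .docx as a package and list the parts it contains
using (Package package =
  Package.Open(@"..\..\CustomerReminder.docx", FileMode.Open, FileAccess.Read))
{
  foreach (PackagePart part in package.GetParts())
  {
    Console.WriteLine("{0} ({1})", part.Uri, part.ContentType);
  }
}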

Write the Code

Start Visual Studio and create a new Windows Forms project. Now set references to the two assemblies you’ll need for OpenXML programming: DocumentFormat.OpenXml and WindowsBase (these can be found on the .NET tab of the Add Reference dialog box). The first assembly exposes the OpenXML object model for document manipulation, and the second assembly provides the support for embedding and extracting parts to and from the document package:

Drop a button onto the form named btnInit and supply code for its Click event handler. You’ll also need a few additional using statements for importing some namespaces. The complete code listing below shows the required using statements and the btnInit_Click event handler.

using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using System.Xml.Linq;

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OpenXmlWordDataBinding
{
  public partial class Form1 : Form
  {
    public Form1()
    {
      InitializeComponent();
    }

    private void btnInit_Click(object sender, EventArgs e)
    {
      var xml =
        new XElement("Customer",
          new XElement("Name", "John Doe"),
          new XElement("Expiration", "2/1/2010"),
          new XElement("AmountDue", "$129.50"));

      var docxFile = @"..\..\CustomerReminder.docx";

      using (var wpd = WordprocessingDocument.Open(docxFile, true))
      {
        var mainPart = wpd.MainDocumentPart;
        var xmlPart = mainPart.AddNewPart<CustomXmlPart>();
        using (Stream partStream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
        {
          using (StreamWriter outputStream = new StreamWriter(partStream))
          {
            outputStream.Write(xml);
          }
        }

        var taggedContentControls =
          from sdt in mainPart.Document.Descendants<SdtRun>()
          let sdtPr = sdt.GetFirstChild<SdtProperties>()
          let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
          where tag != null
          select new
          {
            SdtProps = sdtPr,
            TagName = tag.GetAttribute("val", "http://schemas.openxmlformats.org/wordprocessingml/2006/main").Value
          };

        foreach (var taggedContentControl in taggedContentControls)
        {
          var binding = new DataBinding();
          binding.XPath = taggedContentControl.TagName;
          taggedContentControl.SdtProps.Append(binding);
        }

        mainPart.Document.Save();
      }
    }
  }
}

Let’s analyze this code piece by piece. First, we construct the XML data as a graph of XML objects using LINQ to XML functional construction:

var xml =
  new XElement("Customer",
    new XElement("Name", "John Doe"),
    new XElement("Expiration", "2/1/2010"),
    new XElement("AmountDue", "$129.50"));

Next, we invoke WordprocessingDocument.Open to open the Word document using the OpenXML SDK. We do this inside of a using block to ensure proper disposal of unmanaged resources in the event that an unhandled exception occurs inside of the block. The boolean true value specified for the second parameter means that we’re opening the document for write access:

var docxFile = @"..\..\CustomerReminder.docx";
using (var wpd = WordprocessingDocument.Open(docxFile, true))

Our code needs to do two things to the document at this point: add a new XML part containing the data to be bound, and assign the appropriate bindings to the content controls. Realize that this is code that will only execute once; subsequently, we’ll only be extracting/embedding the XML part from/to the document package using normal zip compression methods.

Add a New XML Part

With the document open and accessible to our code via the wpd variable, we invoke the AddNewPart<CustomXmlPart> method on wpd.MainDocumentPart to create a new XML part, which we capture in xmlPart. The new XML part exposes a GetStream method that enables us to stream content into and out of it. We take the generic Stream returned by GetStream and wrap a StreamWriter around it. The XML content can then be sent into the new XML part by writing to the StreamWriter. Again, we use nested using blocks on the stream objects to ensure proper cleanup if an exception occurs:

var mainPart = wpd.MainDocumentPart;
var xmlPart = mainPart.AddNewPart<CustomXmlPart>();
using (Stream partStream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
{
  using (StreamWriter outputStream = new StreamWriter(partStream))
  {
    outputStream.Write(xml);
  }
}

Now that the XML data is embedded as an XML part inside of the document’s .docx package, it’s available for data binding to content controls. The next and last thing we need to do is to programmatically find the tagged content controls inside the document and assign XML data bindings using the XPath expressions we defined in the content control tags.

Add the Data Bindings

Learning how to program Word with OpenXML is all about exploring the special XML markup language used by Word (a dialect known as WordprocessingML) and discovering the element and attribute names used to represent the document. You can view the document’s markup by examining the word/document.xml part inside of the .docx package. It can take quite a bit of detective work as you undergo the exploration and discovery process, but once you’re armed with the information you need, it’s very simple and straightforward to write a LINQ to XML query to get at what you need in the document.
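
If you’d rather explore from code than by unzipping the package, here’s a quick sketch (assuming the same using directives as the listing above) that dumps the raw WordprocessingML of the main document part to the console:

// Open the document read-only and print the markup stored in word/document.xml
using (var wpd = WordprocessingDocument.Open(@"..\..\CustomerReminder.docx", false))
using (var reader = new StreamReader(wpd.MainDocumentPart.GetStream()))
{
  Console.WriteLine(reader.ReadToEnd());
}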

For example, here is how the customer name content control is expressed in WordprocessingML:

<w:sdt>
     <w:sdtPr>
           <w:tag w:val="Customer/Name"/>
           <w:id w:val="852281"/>
           <w:placeholder>
                <w:docPart w:val="DefaultPlaceholder_22675703"/>
           </w:placeholder>
           <w:showingPlcHdr/>
           <w:text/>
     </w:sdtPr>
     <w:sdtContent>
           <w:r w:rsidRPr="004E1B99">
                <w:rPr>
                      <w:rStyle w:val="PlaceholderText"/>
                </w:rPr>
                <w:t>Click here to enter text.</w:t>
           </w:r>
     </w:sdtContent>
</w:sdt>

An analysis of this XML shows how Word stores content controls inside a document. Specifically, Word creates a <w:sdt> element (structured document tag) and nests a <w:sdtPr> element (structured document tag properties) inside of it. And nested beneath that, Word stores the content control’s tag name in the w:val attribute of a <w:tag> element. These element names can be referred to programmatically using the OpenXML API type names SdtRun, SdtProperties, and Tag, respectively. Thus, the following LINQ to XML query retrieves all tagged content controls in our document:

var taggedContentControls =
  from sdt in mainPart.Document.Descendants<SdtRun>()
  let sdtPr = sdt.GetFirstChild<SdtProperties>()
  let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
  where tag != null
  select new
  {
     SdtProps = sdtPr,
     TagName = tag.GetAttribute("val", "http://schemas.openxmlformats.org/wordprocessingml/2006/main").Value
  };

In this query, the from clause uses the Descendants<SdtRun> method to retrieve all the <w:sdt> elements in the document. For each one, the first let clause uses the GetFirstChild<SdtProperties> method to retrieve the nested <w:sdtPr> element beneath it. The second let clause then performs a similar operation to retrieve the <w:tag> element nested beneath that. Since you cannot assume that every <w:sdt> element will always have a nested <w:sdtPr> element, the second let clause tests sdtPr for null before trying to invoke GetFirstChild<Tag> on it. The where clause ultimately filters out all content controls that have not been tagged.

The select clause constructs an anonymous type that returns two properties for each tagged content control. The first is the <w:sdtPr> element, which we’ll need in order to reference the location at which we’ll inject the XML binding element. The second is the tag name extracted from the w:val attribute of the <w:tag> element (the full WordprocessingML namespace URI must be specified for the w prefix in order to retrieve the attribute value), which supplies the XPath expression for the binding. The LINQ query returns the results in a sequence that we then iterate to set the bindings:

foreach (var taggedContentControl in taggedContentControls)
{
  var binding = new DataBinding();
  binding.XPath = taggedContentControl.TagName;
  taggedContentControl.SdtProps.Append(binding);
}

mainPart.Document.Save();

For each tagged content control, we create a DataBinding object, set its XPath property to the tag name, which we know holds an XPath expression referencing the desired XML data, and add the binding as a new element inside of the <w:sdtPr> element. Finally, invoking the Save method saves the modified document back to disk. If you examine the WordprocessingML now, you’ll see that our code added a <w:dataBinding> element with a w:xpath attribute to the <w:sdtPr> element in each content control. For example, the customer name content control now has the following element in its <w:sdtPr> section:

<w:dataBinding w:xpath="Customer/Name" />

Now let’s examine the docx package again. Add the .zip extension to the document filename and crack it open. Notice the new customXML folder that wasn’t there before. This is where all custom XML parts added to the document get stored.

Double-click the customXML folder to reveal the item.xml file contained inside of it:

Now double-click item.xml to view it in IE:

Look familiar? That’s the XML our code injected into the document as a custom XML part!

Now close the zip, rename the file back as .docx, and open it in Word:

All the content placeholders have now been filled in with data from our XML file. Does it get better than this? You bet! This thing works bi-directionally. So go ahead and change John Doe to Sue Taylor:

Now save the document, and rename it as .zip once again. This time, extract the item.xml file instead of opening it in IE, so that you can edit it in Notepad:

Beautiful. Word pushed the data from the modified content controls in the document back out to the XML file. Let’s now edit the XML; for example, we’ll change the expiration date:

Save the XML file, push it back into the zip package, rename the zip package back to .docx, and re-open it in Word one last time:

Now that’s what I call data/view separation! With XML data binding in Word, your application can easily view and change data elements without ever digging into the document text.
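
For completeness, here’s a rough sketch of how your code could perform the same read/modify cycle against the embedded XML part directly, without the manual zip gymnastics. It assumes the same using directives as the earlier listing and that the document contains only the one custom XML part we added:

using (var wpd = WordprocessingDocument.Open(@"..\..\CustomerReminder.docx", true))
{
  // Grab the custom XML part we embedded earlier
  var xmlPart = wpd.MainDocumentPart.CustomXmlParts.First();

  // Read the current data out of the part
  XElement customer;
  using (var stream = xmlPart.GetStream(FileMode.Open, FileAccess.Read))
  {
    customer = XElement.Load(stream);
  }

  // Change the data, not the document text
  customer.Element("Expiration").Value = "3/1/2010";

  // Write the modified XML back into the part
  using (var stream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
  using (var writer = new StreamWriter(stream))
  {
    writer.Write(customer);
  }
}

The next time the document is opened in Word, the bound content controls pick up the new value, just as they did when we edited item.xml by hand.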

Of course, this only scratches the surface of what’s possible with OpenXML. It’s also worth mentioning that the same thing can be achieved using VSTO (Visual Studio Tools for Office) and the Word object model, but we’ll leave that for a future post. Until then, happy coding!

Streaming Into LINQ to XML Using C# Custom Iterators and XmlReader

In this post, I’ll show you how to create LINQ to XML queries that don’t require you to first load and cache XML content into the in-memory LINQ to XML DOM (that is, without first populating an XDocument or XElement query source), but instead operate against an input stream implemented with a C# custom iterator method and an old-fashioned XmlReader object.

LINQ to XML

LINQ to XML, introduced with the .NET Framework 3.5, is a huge win for developers working with XML in any shape or form. Whether XML is being queried, parsed, or transformed, LINQ to XML can almost always be used as an easier alternative to previous technologies that are based on XML-specific languages (XPath, XQuery, XSLT).

At the center of the LINQ to XML stage lies a new DOM for caching XML data in memory. This object model, based on either a root XDocument object or independent XElement objects, represents a major improvement over the older XmlDocument-based DOM in numerous ways (details of which will serve as the topic for a future post). In terms of querying, the XDocument and XElement objects provide methods (such as Descendants) that expose collections of nodes which can be iterated by a LINQ to XML query.

Consuming Sequences with LINQ

A common misconception by many developers learning LINQ is that LINQ only consumes collections. That is not surprising, since one of the first benefits developers come to understand is that LINQ can be used to easily query an input collection without coding a foreach loop. While this is certainly true, it’s a very limited view of what can really be accomplished with LINQ.

The broader view is that LINQ works against sequences, which are often — but certainly not always — collections. A sequence can be generated by any object that implements IEnumerable<T> or any method that returns an IEnumerable<T>. Such objects and methods provide iterators that LINQ calls to retrieve one element after another from the source sequence being queried. For a collection, the enumerator simply walks the elements of the collection and returns them one at a time. But you can create your own class that implements IEnumerable<T> or your own custom iterator method which serves as the enumerator that returns a sequence based on something other than a collection.

The point? LINQ is not limited to querying in-memory collections. It can be used to query any sequence, which is fed to the LINQ query by enumerator methods exposed by all classes that implement IEnumerable<T>. What this means is that you don’t necessarily need to load an entire XML document into an in-memory cache before you can query it using LINQ to XML. If you are querying very large documents in order to extract just a few elements of interest, you can achieve better performance by having your LINQ to XML query stream through the XML—without ever caching the XML in memory.
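
To make that idea concrete before we apply it to XML, here’s a trivial sketch (assuming the usual System.Collections.Generic and System.Linq using directives) of a custom iterator that feeds a LINQ query a computed sequence that never exists as a collection in memory:

// Yields square numbers on demand; no collection is ever allocated
private static IEnumerable<int> Squares(int count)
{
  for (int i = 1; i <= count; i++)
  {
    yield return i * i;
  }
}

// ...and in some consuming method, LINQ pulls the values one at a time
var evenSquares =
  from n in Squares(1000)
  where n % 2 == 0
  select n;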

Querying Cached XML vs. Streamed XML

How do you write LINQ to XML queries that consume a read-only, forward-only input stream instead of a pre-populated in-memory cache? Easy. Refer to a custom iterator method instead of a pre-populated XDocument or XElement in the from clause of your query.

For example, consider the following query:

var xd = XDocument.Load("Customers.xml");
var domQuery =
  from c in xd.Descendants("Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Now compare that query with this version:

var streamQuery =
  from c in StreamElements("Customers.xml", "Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Both versions query over the same XML and produce the same output result (a sequence of customer nodes from the UK), but they consume their input sequences in completely different ways. The first version loads the XML content into an XDocument (an in-memory cache), and then queries the collection of nodes returned by the Descendants method (all <Customer> nodes) for those in the UK. The larger the XML document being queried, the more memory is consumed by this approach. The second version queries directly against an input stream that feeds the XML content as a sequence using a custom iterator method named StreamElements. This version will consume no more memory for querying a huge XML file than it will for a tiny one. Here’s the implementation of the custom iterator method:

private static IEnumerable<XElement> StreamElements(
  string fileName,
  string elementName)
{
  using (var rdr = XmlReader.Create(fileName))
  {
    rdr.MoveToContent();
    while (rdr.Read())
    {
      if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
      {
        var e = XElement.ReadFrom(rdr) as XElement;
        yield return e;
      }
    }
    rdr.Close();
  }
}

Understanding C# Custom Iterators

By definition, this method is a custom iterator since it returns an IEnumerable<T> and has a yield return statement in it. By returning IEnumerable<XElement> specifically (just like the Descendants method in the cached DOM version of the query does), this custom iterator is suitable as the source for a LINQ to XML query. The implementation of this method is a beautiful demonstration of how seamlessly new technology (LINQ to XML) integrates with old technology (the XmlReader object has been around since .NET 1.0). Let’s examine the method piece by piece to understand exactly how it works.

When the query first begins to execute, and LINQ needs to start scanning the input sequence, the StreamElements custom iterator method is called with two parameters. The first parameter is the name of the input XML file (“Customers.xml”) and the second parameter is the element of interest to be queried (“Customer”, for each <Customer> node in the XML file). The method then opens an “old-fashioned” XmlReader against the XML file and advances the stream to the beginning of its content by invoking MoveToContent. It then enters a loop that reads from the stream one element at a time. Each element is tested against the second parameter (“Customer”, in this example). Each matching element (which would be all <Customer> elements) is converted into an XElement object by invoking the static XElement.ReadFrom method against the reader. The XElement object representing the matching node is then yield returned to the LINQ query.

As the query continues to execute, and LINQ needs to continue scanning the input sequence for additional elements, the StreamElements custom iterator method continues execution right after the point at which it yield returned the previous element, rather than entering at the top of the method like an ordinary method would. In this manner, the input stream advances and returns one matching XElement after another while the LINQ query consumes that sequence and filters for UK to produce a new output sequence with the results. When the end of the input stream is reached, the reader’s Read method will return false, which will end the loop, close the reader, and finally exit the method. When the custom iterator method exits, that signals the LINQ query that there are no more input elements in the sequence and the query completes execution at that point.
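
You can see this behavior by simply enumerating the query; nothing is read from Customers.xml until the loop starts pulling elements, and each iteration draws the next matching <Customer> element straight off the reader:

// Deferred execution: the stream is read only as this loop iterates
foreach (var customer in streamQuery)
{
  Console.WriteLine(customer);
}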

One important point that may seem obvious but is worth calling out anyway is that running streaming queries more than once results in reading through the entire stream each time. So to best apply this technique, you should try to extract everything you need from the XML content in one pass (that is, with one LINQ query). If it turns out that you need to query over the stream multiple times, you’ll need to reconsider whether you’re better off caching the XML content once and then querying the in-memory cache multiple times.
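
For example, if you really do need several passes, a sketch like this caches the streamed elements once and then queries the in-memory list as often as needed (at the cost of holding all matching elements in memory):

// One pass over the stream; matching elements are materialized into a list
var customers = StreamElements("Customers.xml", "Customer").ToList();

// Subsequent queries run against the cached list, not the file
var ukCustomers = customers.Where(c => (string)c.Attribute("Country") == "UK");
var customerCount = customers.Count;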

Custom iterators are a very powerful C# language feature, not limited to use with LINQ. Streaming input into LINQ to XML queries as an alternative to using a cached input source is just one example, but there are numerous other ways to leverage custom iterators. To learn how they can be used with SQL Server 2008 Table-Valued Parameters to marshal an entire business object collection to a stored procedure in a single round-trip, view my earlier blog post that explains the details: SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!

Optimizing Factory Methods with Static Delegate Arrays

In my last post, I explained the benefits of using factory classes to achieve polymorphism in your business applications, and demonstrated how implementing such an architecture greatly improves the design and maintainability of your code. In this post (part 2 if you will) I’ll first quickly review the benefits of the factory pattern, and then demonstrate how to refactor the typical switch/case factory implementation to achieve ultra-high performance using a static array of delegates.

Here is the TenderType enum and TenderFactory code from the last post:

public enum TenderType
{
  Cash,
  DebitCard,
  CreditCard,
   //
   // more tender types
   //
}

public static class TenderFactory
{
  public static ITender CreateTender(TenderType tenderType)
  {
    switch (tenderType)
    {
      case TenderType.Cash:
        return new CashTender();

      case TenderType.DebitCard:
        return new DebitCardTender();

      case TenderType.CreditCard:
        return new CreditCardTender();
       //
       // more case statements here
       //
      default:
        throw new Exception("Unsupported tender: " + (int)tenderType);
    }
  }
}

In this example, we have a factory method that accepts a TenderType enum and uses it to create a concrete tender object corresponding to that enum. All the different tender objects implement the ITender interface, so that’s the type returned by the factory. Because the different tender behaviors are encapsulated in the set of concrete tender classes, client code can simply call this factory method to retrieve an ITender object for any tender type and work with that object without knowing the actual type. That is, you can write polymorphic code that doesn’t need to be maintained as concrete type behaviors change or new concrete types are added in the future.
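
For reference, the supporting types from the last post look something like this. This is just a minimal sketch; the ProcessPayment member is illustrative, not the actual interface from that post:

public interface ITender
{
  void ProcessPayment(decimal amount);   // illustrative member
}

public class CashTender : ITender
{
  public void ProcessPayment(decimal amount) { /* cash-specific behavior */ }
}

public class DebitCardTender : ITender
{
  public void ProcessPayment(decimal amount) { /* debit-card-specific behavior */ }
}

public class CreditCardTender : ITender
{
  public void ProcessPayment(decimal amount) { /* credit-card-specific behavior */ }
}

// ...and inside some client method, polymorphic code with no switch/case ladder:
ITender tender = TenderFactory.CreateTender(TenderType.DebitCard);
tender.ProcessPayment(100m);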

It’s easy to recognize when you should be using this pattern in your own applications. When you find yourself coding the same switch/case ladder repeatedly, that’s a sure indication that your architecture can be improved significantly by using factories and polymorphism. It’s important to sensitize yourself so that you detect this condition early in your development efforts and apply the factory pattern then. It will be a much greater burden to refactor your code to use the factory pattern later on, once you have allowed a proliferation of switch/case statements to spread throughout your code.

So now we’ve got a single switch/case ladder in a centralized factory method, which eliminates the multitudes of duplicate switch/case ladders that would have otherwise been scattered throughout our application’s code. A huge improvement, to be certain, but can we improve it even more? You bet!

The Need For Speed

Because object instantiation is a very common occurrence, factory methods in general need to perform well. A performance bottleneck in the factory will quickly translate to a serious application-wide problem. If you’re dealing with a small set of concrete types (like our tenders example), and/or your factory code is running on a desktop or other single-user environment, the switch/case implementation of the factory method shown above is probably perfectly suitable and won’t require optimization.

In the real world, however, a factory method frequently supports dozens or even hundreds of different concrete types, which means coding a very long switch/case ladder in that method. Next consider the fact that the switch/case ladder is evaluated sequentially at runtime, top-to-bottom. This means that a matching case statement further down the ladder takes longer to reach than one further up the ladder. Again, for a handful of types in a single-user environment, that differential is imperceptible. But if your factory code supports many types through a long switch/case ladder, and runs on a multi-user application server that is servicing many concurrent client requests for new object instances in high demand, then it becomes critical that your factory code executes as quickly as possible.

A Golden Rule of Optimization: Let The Compiler Do The Work

The following alternative implementation is vastly superior to the switch/case version:

public static class TenderFactory
{
  private delegate ITender CreateTenderMethod();

  // Important: The order of delegates in this static array must match the
  //  order that the TenderType enums are defined.
  private static CreateTenderMethod[] CreateTenderMethods = new CreateTenderMethod[]
  {
    delegate() { return new CashTender(); },
    delegate() { return new DebitCardTender(); },
    delegate() { return new CreditCardTender(); },
     //
     // more delegates here
     //
  };

  public static ITender CreateTender(TenderType tenderType)
  {
    var tender = CreateTenderMethods[(int)tenderType].Invoke();
    return tender;
  }
}

This factory code takes a novel approach. Let’s dissect it.

At the top of the class we define a delegate named CreateTenderMethod, which takes no parameters and returns an ITender object. We then declare a static array of CreateTenderMethod delegates and populate it with anonymous methods, one returning a new instance of each concrete tender type. Stop and think about what this means. The compiler emits all of these tiny factory methods and the array initializer for us, and the runtime executes that initializer exactly once, when the type is first loaded. From that point on, every element of the static array is already mapped to the method that returns its concrete type, so there is no per-call cost for allocating storage, building delegate instances, or walking a ladder of case statements. Doing the work once, up front, instead of repeating it on every call is one of the golden rules of optimization.

The CreateTender method itself is now ridiculously simple. It takes the incoming enum and converts it to an integer, which it uses as an index into the static array. That instantly retrieves the correct delegate, which points to the method that will return the concrete tender type specified by the enum. In an array of 250 elements, it won’t take any longer to retrieve the delegate for the 250th element than it will for the first. The Invoke method on the delegate actually runs the method and returns the correct ITender derivative to the requesting client. The only important thing to remember when using this technique, which you may have already realized on your own, is that the order of delegates in the array must match the order in which the enum members are defined (as mentioned in the code comments). A mismatch will obviously manifest itself as a serious runtime error.

What’s really nice here is that anonymous methods, added in C# 2.0, greatly reduce the amount of code this solution requires compared to what was needed before. Back then, you’d have to explicitly create a one-line method for each concrete tender type, and then reference each of those methods from an explicit delegate constructor in the static array (see the sketch below). So this implementation is now not only significantly faster than the typical switch/case approach, it’s also just as easy to implement and maintain. Don’t you just love it when there are no downsides?
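
For comparison, here’s roughly what that pre-C# 2.0 version would have looked like (a sketch, not code from the original post), with explicit one-line factory methods referenced through explicit delegate constructors:

public static class TenderFactory
{
  private delegate ITender CreateTenderMethod();

  // One named method per concrete type...
  private static ITender CreateCashTender() { return new CashTender(); }
  private static ITender CreateDebitCardTender() { return new DebitCardTender(); }
  private static ITender CreateCreditCardTender() { return new CreditCardTender(); }

  // ...each referenced through an explicit delegate constructor,
  // still in the same order as the TenderType enum members.
  private static CreateTenderMethod[] CreateTenderMethods = new CreateTenderMethod[]
  {
    new CreateTenderMethod(CreateCashTender),
    new CreateTenderMethod(CreateDebitCardTender),
    new CreateTenderMethod(CreateCreditCardTender),
  };

  public static ITender CreateTender(TenderType tenderType)
  {
    return CreateTenderMethods[(int)tenderType].Invoke();
  }
}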