Using T4 Templates to Generate C# Enums from SQL Server Database Tables

When building applications with C# and SQL Server, it is often necessary to define codes in the database that correspond with enums in the application. However, it can be burdensome to maintain the enums manually, as codes are added, modified, and deleted in the database.

In this blog post, I’ll share a T4 template that I wrote which does this automatically. I looked online first, and did find a few different solutions for this, but none that worked for me as-is. So, I built this generic T4 template to do the job, and you can use it too.

Let’s say you’ve got a Color table and ErrorType table in the database, populated as follows:
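
For illustration, assume rows like the following (the specific IDs, names, and descriptions here are made-up sample values, shown as INSERT statements):

INSERT INTO dbo.Color (ColorId, ColorName) VALUES
 (1, 'Red'),
 (2, 'Green'),
 (3, 'Blue');

INSERT INTO dbo.ErrorType (ErrorTypeId, ErrorTypeName, Description) VALUES
 (1, 'InvalidOperation', 'An invalid operation was attempted by the system'),
 (2, 'Timeout', 'A system operation timed out'),
 (1000, 'InvalidInput', 'The customer supplied invalid input'),
 (1001, 'AccountClosed', 'The customer account is closed');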

Now you’d like enums in your C# application that correspond to these rows. The T4 template will generate them as follows:
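
Given that sample data, the generated code would look roughly like this (namespace wrapper, header comments, and database-information comments omitted for brevity, and int ID columns assumed):

  /// <summary>
  /// Supported colors
  /// </summary>
  public enum Color : int
  {
    /// <summary>
    /// Undefined (not mapped in database)
    /// </summary>
    Undefined = 0,
    Red = 1,
    Green = 2,
    Blue = 3,
  }

  /// <summary>
  /// System error types
  /// </summary>
  public enum SystemErrorType : int
  {
    /// <summary>
    /// An invalid operation was attempted by the system
    /// </summary>
    InvalidOperation = 1,
    /// <summary>
    /// A system operation timed out
    /// </summary>
    Timeout = 2,
  }

  /// <summary>
  /// Customer error types
  /// </summary>
  public enum CustomerErrorType : int
  {
    /// <summary>
    /// The customer supplied invalid input
    /// </summary>
    InvalidInput = 1000,
    /// <summary>
    /// The customer account is closed
    /// </summary>
    AccountClosed = 1001,
  }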

Before showing the code that generates this, let’s point out a few things.

First, notice that the Color table has only an ID and a name column, while the ErrorType table has ID, name, and description columns. Thus, the ErrorType enums include XML comments based on the descriptions in the database, which appear in IntelliSense when referencing the enums in Visual Studio.

Also notice that one enum was generated for the Color table, while two enums were generated for the ErrorType table. This is because we instructed the T4 template to create different enums for different value ranges (0 – 999 for SystemErrorType, and 1000 – 1999 for CustomerErrorType) in the same table.

Finally, the Color enum includes an additional member Undefined with a value of 0 that does not appear in the database, while the other enums don’t include the Undefined member. This is because, in the case of Color, we’d like an enum to represent Undefined, but we don’t want it in the database because we don’t want to allow 0 as a foreign key into the Color table.

This gives you an idea of the flexibility this T4 template offers for generating enums from the database. The enums above were generated with the following T4 template:

<#@include file="EnumsDb.ttinclude" #>
<#
  var configFilePath = "app.config";

  var enums = new []
  {
    new EnumEntry
      ("Supported colors", "DemoDatabase", "dbo", "Color", "ColorId", "ColorName", "ColorName")
      { GenerateUndefinedMember = true },

    new EnumEntry
      ("System error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "SystemErrorType", ValueRange = new Tuple<long, long>(0, 999) },

    new EnumEntry
      ("Customer error types", "DemoDatabase", "dbo", "ErrorType", "ErrorTypeId", "ErrorTypeName", "Description")
      { EnumName = "CustomerErrorType", ValueRange = new Tuple<long, long>(1000, 1999) },
  };

  var code = this.GenerateEnums(configFilePath, enums);

  return code;
#>

You can see that there is very little code to write in the template itself. That’s because all the real work is done inside the EnumsDb.ttinclude file (referenced at the top with @include), which contains the code that actually generates the enums and can be shared throughout your application. A complete listing of the EnumsDb.ttinclude file appears at the end of this blog post.

All you need to do is call GenerateEnums with two parameters. The first is the path to your application’s configuration file, which holds the database connection string(s). The second is an array of EnumEntry objects, which drives the creation of enums in C#. Specifically, in this case, there are three EnumEntry objects, which results in the three enums that were generated.
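
For example, the app.config referenced above would need to define a connection string named DemoDatabase, along these lines (the data source shown here is just a placeholder):

<configuration>
  <connectionStrings>
    <add name="DemoDatabase"
         connectionString="Data Source=.;Initial Catalog=DemoDatabase;Integrated Security=True" />
  </connectionStrings>
</configuration>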

Each enum entry contains the following properties:

  1. Description (e.g., “Supported colors”). This description appears as XML comments for the enum.
  2. Database connection string key (e.g., “DemoDatabase”). It is expected that a connection string named by this key is defined in the application’s configuration file. Since this property exists for each enum entry, it is entirely possible to generate different enums from different databases.
  3. Schema name (e.g., “dbo”). This is the schema that the table is defined in.
  4. Table name (e.g., “Color”). This is the table that contains rows of data for each enum member.
  5. ID column name (e.g., “ColorId”). This is the column that contains the numeric value for each enum member. It must be an integer type, but it can be an integer of any size: the generator automatically maps tinyint to byte, smallint to short, int to int, and bigint to long.
  6. Member column name (e.g., “ColorName”). This is the column that contains the actual name of the enum member.
  7. Description column name. This should be the same as the Member column name (e.g., “ColorName”) if there is no description, or it can reference another column containing the description if one exists. If there is a description, then it is generated as XML comments above each enum member.
  8. Optional:
    1. EnumName. If not specified, the enum is named after the table; set this when you want a different name (as with SystemErrorType and CustomerErrorType above).
    2. GenerateUndefinedMember. If specified, then the template will generate an Undefined member with a value of 0.
    3. ValueRange. If specified, then enum members are generated only for rows in the database that fall within the specified range of values. The ErrorType table uses this technique to generate two enums from the table; one named SystemErrorType with values 0 – 999, and another named CustomerErrorType with values 1000 – 1999.

This design offers a great deal of flexibility, while ensuring that the enums defined in your C# application stay in sync with the values defined in the database. And of course, you can modify the template as desired to suit any additional needs that are unique to your application.

The full template code is listed below. I hope you enjoy using it as much as I enjoyed writing it. Happy coding! 😊

<#@ template hostspecific="true" language="C#" debug="false" #>
<#@ assembly name="System.Configuration" #>
<#@ assembly name="System.Data" #>
<#@ import namespace="System.Configuration" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ import namespace="System.Text" #>
<#+
  private string GenerateEnums(string configFilePath, EnumEntry[] entries)
  {
    if (entries == null)
    {
      return string.Empty;
    }

    var ns = System.Runtime.Remoting.Messaging.CallContext.LogicalGetData("NamespaceHint");
    var sb = new StringBuilder();
    sb.AppendLine(this.GenerateHeaderComments());
    sb.AppendLine($"namespace {ns}");
    sb.AppendLine("{");
    foreach(var entry in entries)
    {
      try
      {
        sb.Append(this.GenerateEnumMembers(configFilePath, entry));
      }
      catch (Exception ex)
      {
        sb.AppendLine($"#warning Error generating enums for {entry.EnumDescription}");
        sb.AppendLine($"  // Message: {ex.Message}");
      }
    }
    sb.AppendLine();
    sb.AppendLine("}");
    return sb.ToString();
  }

  private string GenerateHeaderComments()
  {
    var comments = $@"
// ------------------------------------------------------------------------------------------------
// <auto-generated>
//  This code was generated by a C# code generator
//  Generated at {DateTime.Now}
// 
//  Warning: Do not make changes directly to this file; they will get overwritten on the next
//  code generation.
// </auto-generated>
// ------------------------------------------------------------------------------------------------
    ";

    return comments;
  }

  private string GenerateEnumMembers(string configFilePath, EnumEntry entry)
  {
    var code = new StringBuilder();
    var connStr = this.GetConnectionString(configFilePath, entry.ConnectionStringKey);
    var enumDataType = default(string);
    using (var conn = new SqlConnection(connStr))
    {
      conn.Open();
      using (var cmd = conn.CreateCommand())
      {
        cmd.CommandText = @"
          SELECT DATA_TYPE
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName AND COLUMN_NAME = @IdColumnName
        ";

        cmd.Parameters.AddWithValue("@SchemaName", entry.SchemaName);
        cmd.Parameters.AddWithValue("@TableName", entry.TableName);
        cmd.Parameters.AddWithValue("@IdColumnName", entry.IdColumnName);

        var sqlDataType = cmd.ExecuteScalar();
        if(sqlDataType == null)
        {
          throw new Exception($"Could not discover ID column [{entry.IdColumnName}] data type for enum table [{entry.SchemaName}].[{entry.TableName}]");
        }
        enumDataType = this.GetEnumDataType(sqlDataType.ToString());

        var whereClause = string.Empty;
        if (entry.ValueRange != null)
        {
          whereClause = $"WHERE [{entry.IdColumnName}] BETWEEN {entry.ValueRange.Item1} AND {entry.ValueRange.Item2}";
        }

        cmd.CommandText = $@"
          SELECT
            Id = [{entry.IdColumnName}],
            Name = [{entry.NameColumnName}],
            Description = [{entry.DescriptionColumnName}]
          FROM
            [{entry.SchemaName}].[{entry.TableName}]
          {whereClause}
          ORDER BY
            [{entry.IdColumnName}]
        ";
        cmd.Parameters.Clear();

        var innerCode = new StringBuilder();
        var hasUndefinedMember = false;
        using (var rdr = cmd.ExecuteReader())
        {
          while (rdr.Read())
          {
            if (rdr["Name"].ToString() == "Undefined" || rdr["Id"].ToString() == "0")
            {
              hasUndefinedMember = true;
            }
            if (entry.NameColumnName != entry.DescriptionColumnName)
            {
              innerCode.AppendLine("\t\t/// <summary>");
              innerCode.AppendLine($"\t\t/// {rdr["Description"]}");
              innerCode.AppendLine("\t\t/// </summary>");
            }
            innerCode.AppendLine($"\t\t{rdr["Name"].ToString().Replace("-", "_")} = {rdr["Id"]},");
          }
          rdr.Close();
        }
        if ((entry.GenerateUndefinedMember) && (!hasUndefinedMember))
        {
          var undefined = new StringBuilder();

          undefined.AppendLine("\t\t/// <summary>");
          undefined.AppendLine("\t\t/// Undefined (not mapped in database)");
          undefined.AppendLine("\t\t/// </summary>");
          undefined.AppendLine("\t\tUndefined = 0,");

          innerCode.Insert(0, undefined);
        }
        code.Append(innerCode.ToString());
      }
      conn.Close();
    }

    var final = $@"
  /// <summary>
  /// {entry.EnumDescription}
  /// </summary>
  public enum {entry.EnumName} : {enumDataType}
  {{
    // Database information:
    //  {entry.DbInfo}
    //  ConnectionString: {connStr}

{code}
  }}
    ";

    return final;
  }

  private string GetConnectionString(string configFilePath, string key)
  {
    var map = new ExeConfigurationFileMap();
    map.ExeConfigFilename = this.Host.ResolvePath(configFilePath);
    var config = ConfigurationManager.OpenMappedExeConfiguration(map, ConfigurationUserLevel.None);
    var connectionString = config.ConnectionStrings.ConnectionStrings[key].ConnectionString;

    return connectionString;
  }

  private string GetEnumDataType(string sqlDataType)
  {
    switch (sqlDataType)
    {
      case "tinyint"  : return "byte";
      case "smallint" : return "short";
      case "int"      : return "int";
      case "bigint"   : return "long";
      default :
        throw new Exception($"SQL data type {sqlDataType} is not valid as an enum ID column");
    }
  }

  private class EnumEntry
  {
    public string EnumDescription { get; }
    public string ConnectionStringKey { get; }
    public string SchemaName { get; }
    public string TableName { get; }
    public string IdColumnName { get; }
    public string NameColumnName { get; }
    public string DescriptionColumnName { get; }

    public string EnumName { get; set; }
    public Tuple<long, long> ValueRange { get; set; }
    public bool GenerateUndefinedMember { get; set; }

    public string DbInfo
    {
      get
      {
        var info = $"ConnectionStringKey: {this.ConnectionStringKey}; Table: [{this.SchemaName}].[{this.TableName}]; IdColumn: [{this.IdColumnName}]; NameColumn: [{this.NameColumnName}]; DescriptionColumn: [{this.DescriptionColumnName}]";
        if (this.ValueRange != null)
        {
          info = $"{info}; Range: {this.ValueRange.Item1} to {this.ValueRange.Item2}";
        }
        return info;
      }
    }

    public EnumEntry(
      string enumDescription,
      string connectionStringKey,
      string schemaName,
      string tableName,
      string idColumnName = null,
      string nameColumnName = null,
      string descriptionColumnName = null
    )
    {
      this.EnumDescription = enumDescription;
      this.ConnectionStringKey = connectionStringKey;
      this.SchemaName = schemaName;
      this.TableName = tableName;
      this.IdColumnName = idColumnName ?? tableName + "Id";
      this.NameColumnName = nameColumnName ?? "Name";
      this.DescriptionColumnName = descriptionColumnName ?? "Description";
      this.EnumName = tableName;
    }

  }
#>

Creating a Helpful Extension Method for XAML Visibility

Used recklessly, extension methods are potentially evil. But judiciously applied, they can be extremely helpful. In this short post (well, short for me), I’ll show you how to create an extension method for the bool type that will simplify your .NET code in XAML-based apps (either WPF or Silverlight). In particular, this extension method addresses the fact that the Visibility property of UI controls in XAML is not a Boolean (true/false) value (as has traditionally been the case for both Windows Forms and ASP.NET). Instead, it’s one of three possible enumerated constant values: Hidden, Collapsed, and Visible.

Visible obviously means “visible,” while both Hidden and Collapsed mean “invisible.” The difference between Hidden and Collapsed is merely whether or not the invisible control occupies blank space on the screen that it would occupy if it were visible. Collapsed consumes no space, while Hidden does.
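
For example, in WPF (a minimal sketch; note that Silverlight’s Visibility enum offers only Visible and Collapsed, not Hidden):

<StackPanel>
  <Button Content="First" />
  <Button Content="Second" Visibility="Hidden" />    <!-- invisible, but still reserves its space -->
  <Button Content="Third" Visibility="Collapsed" />  <!-- invisible and takes up no space at all -->
  <Button Content="Fourth" />
</StackPanel>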

It’s nice to have the three options, but in most cases you’ll find that you just need Visible and Collapsed. If you’re setting visibility on XAML controls using these two values, you’ve probably noticed that it’s just not as clean as it is with Windows Forms or ASP.NET controls. You typically already have a Boolean value, either as a simple variable or an expression, that determines whether the control should be visible. You could use that value to enable or disable the control, since the IsEnabled property is also a Boolean, but you can’t assign it to the Visibility property, because that property expects one of the three Visibility values, not a Boolean. That’s frustrating, because from a UX (user experience) perspective, where these concepts convey “availability” to the user, hiding/showing controls versus enabling/disabling them is a subtlety of UI design. Developers should be able to think (and code) freely in terms of Booleans for both IsEnabled and Visibility.

Consider the C# code to manipulate the Visibility and IsEnabled properties of a button control:

OpenCustomerButton.IsEnabled = false;
OpenCustomerButton.IsEnabled = true;
OpenCustomerButton.Visibility = Visibility.Collapsed;
OpenCustomerButton.Visibility = Visibility.Visible;

bool isAvailable = true;
OpenCustomerButton.IsEnabled = isAvailable;
OpenCustomerButton.Visibility = (isAvailable ? Visibility.Visible : Visibility.Collapsed);

// Set the button's visibility according to another button's enabled/disabled state
OpenCustomerButton.Visibility = (ManageCustomersButton.IsEnabled ? Visibility.Visible : Visibility.Collapsed);

You can see that because Visibility takes the enum rather than a bool, it’s harder to work with than IsEnabled, and just isn’t as neat. But we can do something about this. How? By writing a simple (one-line!) extension method that adds a ToVisibility method to the bool type:

public static class BoolExtensions
{
  public static Visibility ToVisibility(this bool isVisible)
  {
    return (isVisible ? Visibility.Visible : Visibility.Collapsed);
  }
}

Now the same UI code is much easier to read and write:

OpenCustomerButton.IsEnabled = false;
OpenCustomerButton.IsEnabled = true;
OpenCustomerButton.Visibility = false.ToVisibility();
OpenCustomerButton.Visibility = true.ToVisibility();

bool isAvailable = true;
OpenCustomerButton.IsEnabled = isAvailable;
OpenCustomerButton.Visibility = isAvailable.ToVisibility();

// Set the button's visibility according to another button's enabled/disabled state
OpenCustomerButton.Visibility = ManageCustomersButton.IsEnabled.ToVisibility();

Like? Me too. Neatness counts!

It would be even better if we could instead extend every UIElement with a Boolean property called IsVisible, and embed the conversion logic bi-directionally in the property’s getter and setter. But, unfortunately, you cannot create extension properties in .NET, only extension methods. So extending bool with a ToVisibility method is the next best thing.
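
And if you ever need the reverse conversion (turning a Visibility value back into a bool), a matching one-line extension on the Visibility enum does the trick; here’s a sketch along the same lines:

public static class VisibilityExtensions
{
  public static bool ToBool(this Visibility visibility)
  {
    // Treat Hidden and Collapsed alike: anything other than Visible means "not visible"
    return (visibility == Visibility.Visible);
  }
}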

One more tip: Put the BoolExtensions class inside a common code namespace that all your XAML classes import, so ToVisibility is always available without needing to add an extra using statement.

Remember, abusing extension methods quickly leads to messy, buggy code. Instead, find appropriate uses for extension methods that yield cleaner, tighter code. That is all. 🙂

Using SqlFileStream in C# to Access SQL Server FILESTREAM Data

FILESTREAM is a powerful feature in SQL Server that stores varbinary(max) column data (BLOBs) in the file system (where BLOBs belong) rather than in the database’s structured file groups (where BLOBs kill performance). This feature was first introduced in SQL Server 2008, and is now being expanded with the new FileTable feature coming in SQL Server 2012 (code-named “Denali”), which I’ll cover in a near-future post (as well as in my Programming Microsoft SQL Server 2012 book, which I’m writing now).

If you’re not already familiar with FILESTREAM, you can get the necessary background by reading my two earlier blog posts: Introducing FILESTREAM and Enabling and Using FILESTREAM. In this post, I’ll show you how to use the SqlFileStream class to achieve high-performance streaming of SQL Server FILESTREAM data in your C# applications. This is a simplified version of the OpenSqlFilestream code I presented in my Using the OpenSqlFilestream API blog post.

What Is SqlFileStream?

SqlFileStream is a class in the .NET Framework (.NET 3.5 SP1 and higher) that wraps the OpenSqlFilestream function exposed by the SQL Server Native Client API. This lets you stream BLOBs directly between SQL Server and your .NET application written in C#. SqlFileStream is always used within a transaction, just at the point that you want to store and retrieve BLOBs to and from varbinary(max) FILESTREAM columns. At that point in time, SQL Server will “step aside” and let you stream directly against the server’s NTFS file system—a native environment optimized for streaming. This provides you with a streaming “tunnel” between your application and SQL Server’s internally-managed file system. Using SqlFileStream will give your application lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in Enabling and Using FILESTREAM). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. (Refer to Enabling and Using FILESTREAM for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns.) Notice the default value we’ve established for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use SqlFileStream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Writing SqlFileStream Code

Start Visual Studio, create a new Class Library project, and add a PhotoData class as follows:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Drawing;
using System.IO;
using System.Transactions;

namespace PhotoLibraryApp
{
  public class PhotoData
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

    public static void InsertPhoto
      (int photoId, string desc, string filename)
    {
      const string InsertTSql = @"
        INSERT INTO PhotoAlbum(PhotoId, Description)
          VALUES(@PhotoId, @Description);
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(InsertTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
            cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }
          SavePhotoFile(filename, serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private static void SavePhotoFile
      (string clientPath, string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;

      using (FileStream source =
        new FileStream(clientPath, FileMode.Open, FileAccess.Read))
      {
        using (SqlFileStream dest =
          new SqlFileStream(serverPath, serverTxn, FileAccess.Write))
        {
          byte[] buffer = new byte[BlockSize];
          int bytesRead;
          while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
          {
            dest.Write(buffer, 0, bytesRead);
            dest.Flush();
          }
          dest.Close();
        }
        source.Close();
      }
    }

    public static Image SelectPhoto(int photoId, out string desc)
    {
      const string SelectTSql = @"
        SELECT
            Description,
            Photo.PathName(),
            GET_FILESTREAM_TRANSACTION_CONTEXT()
          FROM PhotoAlbum
          WHERE PhotoId = @PhotoId";

      Image photo;
      string serverPath;
      byte[] serverTxn;

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              desc = rdr.GetSqlString(0).Value;
              serverPath = rdr.GetSqlString(1).Value;
              serverTxn = rdr.GetSqlBinary(2).Value;
              rdr.Close();
            }
          }
          photo = LoadPhotoImage(serverPath, serverTxn);
        }

        ts.Complete();
      }

      return photo;
    }

    private static Image LoadPhotoImage(string filePath, byte[] txnToken)
    {
      Image photo;

      using (SqlFileStream sfs =
        new SqlFileStream(filePath, txnToken, FileAccess.Read))
      {
        photo = Image.FromStream(sfs);
        sfs.Close();
      }

      return photo;
    }

  }
}

Let’s explain the code in detail. We’ll start at the top with some required namespace inclusions. The two using statements to take notice of are System.Data.SqlTypes and System.Transactions. The System.Data.SqlTypes namespace defines the SqlFileStream class that we’ll be using to stream BLOBs. No special assembly reference is required to use this class, because it is provided by the System.Data.dll assembly that our project is already referencing (Visual Studio set this reference automatically when it created our project). The System.Transactions namespace defines the TransactionScope class that lets us code implicit transactions against the database. This class is provided by the System.Transactions.dll assembly, which is not referenced automatically. You’ll need to add a reference to it now, or the code will not compile. Right-click the project in Solution Explorer and choose Add Reference. In the Add Reference dialog, click the .NET tab, and scroll to find the System.Transactions component. Then double-click it to add the reference.

At the top of the class, we define a connection string as a hard-coded constant named ConnStr. This is just for demonstration purposes; a real-world application would store the connection string elsewhere (such as in a configuration file, possibly encrypted), but we’re keeping our example simple.
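
If you do move the connection string into a configuration file, reading it back is a one-liner; here’s a sketch (the PhotoLibrary entry name and the app.config snippet in the comment are assumptions, and you’d need a reference to System.Configuration):

// app.config:
//   <connectionStrings>
//     <add name="PhotoLibrary"
//          connectionString="Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary" />
//   </connectionStrings>
private static readonly string ConnStr =
  System.Configuration.ConfigurationManager.ConnectionStrings["PhotoLibrary"].ConnectionString;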

Streaming Into SQL Server

The first method defined in the class is InsertPhoto, which accepts a new photo integer ID, string description, and full path to an image file to be saved to the database. Notice that the InsertTSql string constant defined at the top of the method specifies an INSERT statement that includes the PhotoId and Description columns, but not the actual Photo BLOB column itself. Instead, the INSERT statement is followed immediately by a SELECT statement that retrieves two pieces of information we’ll use to stream the BLOB into the Photo column much more efficiently than ordinary T-SQL: a logical UNC path name to the file and the transactional context token. These are the two values needed to use SqlFileStream, and you’re about to see exactly how. But all we’ve done so far is define a constant holding two T-SQL statements. The constant is followed by two variable declarations, serverPath and serverTxn, that will receive the two special values when we later execute those T-SQL statements.

The method then creates and enters a new TransactionScope block. This does not actually begin the database transaction (we’ve not even connected to the database yet), but rather declares that all data access within the block (and in any code called from within the block) must participate in a database transaction. Inside the TransactionScope block, the code creates and opens a new SqlConnection. Being the first data access code inside the TransactionScope block, this also implicitly begins the database transaction. Next, it creates a SqlCommand object associated with the open connection and prepares its command text to contain our T-SQL statements (the INSERT followed by the SELECT).

Invoking the ExecuteReader method executes the T-SQL statements and returns a reader from which we can retrieve the values returned by the SELECT statement. The transaction is still pending at this time. Our INSERT statement does not provide a value for RowId and instead allows SQL Server to automatically generate and assign a new uniqueidentifier ROWGUID value by default. I’ve also pointed out that no value is provided for the Photo column—and this is exactly how the default 0x value that we defined earlier for the Photo column comes into play (I said we’d come back to it, and here we are).

Although the row has been added by the INSERT statement, it will roll back (disappear) if a problem occurs before the transaction is committed. Because we didn’t provide a BLOB value for the Photo column in the new row, SQL Server honors the default value 0x that we established for it in the CREATE TABLE statement for PhotoAlbum. This represents a zero-length binary stream, which is completely different than NULL. Because Photo is a varbinary(max) column decorated with the FILESTREAM attribute, an empty file gets created in the file system that SQL Server associates with the new row. At the same time, SQL Server initiates an NTFS file system transaction over this new empty file and synchronizes it with the database transaction. So just like the new row, the new file will disappear if the database transaction does not commit successfully.

Immediately following the INSERT statement, the SELECT statement returns Photo.PathName and GET_FILESTREAM_TRANSACTION_CONTEXT. What we’re essentially doing with the WHERE clause in this SELECT statement is reading back the same row we have just added (but not yet committed) to the PhotoAlbum table in order to reference the BLOB stored in the new file that was just created (also not yet committed) in the file system.

The value returned by Photo.PathName is a fabricated path to the BLOB for the selected PhotoId. The path is expressed in UNC format, and points to the network share name established for the server instance when you first enabled FILESTREAM (by default, this is MSSQLSERVER). It is not a path to the file’s physical location on the server, but rather contains information SQL Server can use to derive the file’s physical location. For example, you’ll notice that it always contains the GUID value in the uniqueidentifier ROWGUIDCOL column of the BLOB’s corresponding row. We retrieve the path value from the reader’s first column and store it in the serverPath string variable.

We just explained how SQL Server initiated an NTFS file system transaction over the FILESTREAM data in the new row’s Photo column when we started our database transaction. The GET_FILESTREAM_TRANSACTION_CONTEXT function returns a handle to that NTFS transaction (if you’re not inside a transaction, this function will return NULL and your code won’t work). We obtain the transaction context, which is returned by the reader’s second column as a SqlBinary value, and store it in the byte array named serverTxn.

Armed with the BLOB path reference in serverPath and the transaction context in serverTxn, we have what we need to create a SqlFileStream object and perform direct file access to stream our image into the Photo column. We close the reader, terminate its using block, then terminate the enclosing using block for the SqlConnection as well. This would normally close the database connection implicitly, but that gets deferred in this case because the code is still nested inside the outer using block for the TransactionScope object. So the connection is still open at this time, and the transaction is still pending. It is precisely at this point that we call the SavePhotoFile method to stream the specified image file into the Photo column of the newly inserted PhotoAlbum row, overwriting the empty file just created by default. When control returns from SavePhotoFile, the TransactionScope object’s Complete method is invoked and its using block is terminated, signaling the transaction management API that everything worked as expected. This implicitly commits the database transaction (which in turn commits the NTFS file system transaction) and closes the database connection.

The SavePhotoFile method reads from the source file and writes to the database FILESTREAM storage in 512 KB chunks at a time using ordinary .NET streaming techniques. The method begins by defining a BlockSize integer constant that is set to a reasonable value of 512 KB. Picture files larger than this will be streamed to the server in 512 KB blocks at a time. The local source image file (in clientPath) is then opened on an ordinary read-only FileStream object.

Then the destination file is opened by passing the two special values (serverPath and serverTxn), along with a FileAccess.Write enumeration requesting write access, into the SqlFileStream constructor. Like the source FileStream object, SqlFileStream inherits from System.IO.Stream, so it can be treated just like any ordinary stream. Thus, you attain write access to the destination BLOB on the database server’s NTFS file system. Remember that this output file is enlisted in an NTFS transaction and nothing you stream to it will be permanently saved until the database transaction is committed by the terminating TransactionScope block, after SavePhotoFile completes. The rest of the SavePhotoFile method implements a simple loop that reads from the source FileStream and writes to the destination SqlFileStream, one 512 KB block at a time until the entire source file is processed, and then it closes both streams.

Streaming Out From SQL Server

The rest of the code contains methods to retrieve existing photos and stream their content from the file system into an Image object for display. You’ll find that this code follows the same pattern as the last, only now we’re performing read access.

The SelectPhoto method accepts a photo ID and returns the string description from the database in an output parameter. The actual BLOB itself is returned as the method’s return value in a System.Drawing.Image object. We populate the Image object with the BLOB by streaming into it from the database server’s NTFS file system using SqlFileStream. Once again, we start things off by entering a TransactionScope block and opening a connection. We then execute a simple SELECT statement that queries the PhotoAlbum table for the record specified by the photo ID and returns the description and full path to the image BLOB, as well as the FILESTREAM transactional context token. And once again we use the path name and transactional context with SqlFileStream to tie into the server’s file system in the LoadPhotoImage method.

Just as when we were inserting new photos (only this time using FileAccess.Read instead of FileAccess.Write), we create a new SqlFileStream object from the logical path name and transaction context. We then pull the BLOB content directly from the NTFS file system on the server into a new System.Drawing.Image object using the static Image.FromStream method against the SqlFileStream object. The populated image can then be passed back up to a Windows Forms application, where it can be displayed using the Image property of a PictureBox control.
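
For example, a Windows Forms client might do something like this (the photo ID and control names are arbitrary):

string description;
photoPictureBox.Image = PhotoData.SelectPhoto(1, out description);
descriptionLabel.Text = description;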

Or, to stream a photo over HTTP from a simple ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Transactions;
using System.Web.UI;

namespace PhotoLibraryHttpService
{
  public partial class PhotoService : Page
  {
    private const string ConnStr =
      "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary";

    protected void Page_Load(object sender, EventArgs e)
    {
      int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
      if (photoId == 0)
      {
        return;
      }

      const string SelectTSql = @"
        SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
         FROM PhotoAlbum
         WHERE PhotoId = @PhotoId";

      using (TransactionScope ts = new TransactionScope())
      {
        using (SqlConnection conn = new SqlConnection(ConnStr))
        {
          conn.Open();

          string serverPath;
          byte[] serverTxn;

          using (SqlCommand cmd = new SqlCommand(SelectTSql, conn))
          {
            cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
              rdr.Read();
              serverPath = rdr.GetSqlString(0).Value;
              serverTxn = rdr.GetSqlBinary(1).Value;
              rdr.Close();
            }
          }

          this.StreamPhotoImage(serverPath, serverTxn);
        }
        ts.Complete();
      }
    }

    private void StreamPhotoImage(string serverPath, byte[] serverTxn)
    {
      const int BlockSize = 1024 * 512;
      const string JpegContentType = "image/jpeg";

      using (SqlFileStream sfs =
        new SqlFileStream(serverPath, serverTxn, FileAccess.Read))
      {
        byte[] buffer = new byte[BlockSize];
        int bytesRead;
        Response.BufferOutput = false;
        Response.ContentType = JpegContentType;
        while ((bytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
        {
          Response.OutputStream.Write(buffer, 0, bytesRead);
          Response.Flush();
        }
        sfs.Close();
      }
    }
  }
}

Conclusion

The OpenSqlFilestream function provides native file streaming capabilities between FILESTREAM storage in the file system managed by SQL Server and any native-code (e.g., C++) application. SqlFileStream provides a managed code wrapper around OpenSqlFilestream that simplifies direct FILESTREAM access from .NET applications (e.g., C# and VB .NET). This post explained how this API works in detail, and showed the complete data access code that uses SqlFileStream for both reading and writing BLOBs to and from SQL Server. That’s everything you need to know to get the most out of FILESTREAM. I hope you enjoyed it!

SQL Server 2008 FILESTREAM Part 3 of 3: Using the OpenSqlFilestream API

This is the final installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and part 2 showed you the step-by-step procedure for enabling and using this awesome new feature. In this post, I’ll show you how to use the OpenSqlFilestream function exposed by the SQL Server Native Client API to achieve the maximum FILESTREAM performance possible in your .NET applications.

IMPORTANT UPDATE: I’ve just blogged on SqlFileStream, a managed code wrapper around OpenSqlFilestream that offers a simpler implementation of the mechanism described in this post. Read it here: https://lennilobel.wordpress.com/2011/08/22/using-sqlfilestream-in-c-to-access-sql-server-filestream-data/

What Is OpenSqlFilestream?

OpenSqlFilestream is a function provided by the SQL Server Native Client API, sqlncli10.dll, and gets installed on your development machine when you install the SQL Server 2008 client tools. You call this function at the point in time that you want to store and retrieve BLOBs in varbinary(max) FILESTREAM columns; SQL Server will “step aside” and allow you to obtain a file handle. With the file handle returned by OpenSqlFilestream, you can stream directly against the file system, a native environment optimized for streaming. Contrast such direct file system access with storing and retrieving BLOBs against varbinary(max) FILESTREAM columns the “old-fashioned” way (either by embedding/extracting byte arrays, or in-lining binary streams as hexadecimal values in T-SQL as I demonstrated in part 2), which carries the additional overhead of the FILESTREAM abstraction layer. Using OpenSqlFilestream instead will give you lightning-fast BLOB performance. Let’s dive in!

Creating the Database

Before getting started, be sure that FILESTREAM is enabled for remote file system access at both the Windows Service and SQL Server instance levels (as explained in part 2). Then create a FILESTREAM-enabled database as follows (be sure to create the directory, C:\DB in this example, before creating the database):

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Next, use the database and create a table for BLOB storage as follows:

USE PhotoLibrary
GO

CREATE TABLE PhotoAlbum(
 PhotoId int PRIMARY KEY,
 RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
 Description varchar(max),
 Photo varbinary(max) FILESTREAM DEFAULT(0x))

In this table, the Photo column is declared as varbinary(max) FILESTREAM, and will hold pictures that will be stored in the file system behind the scenes. This is virtually the same CREATE TABLE statement as shown in my last post (refer to part 2 for a complete explanation of the varbinary(max) FILESTREAM and ROWGUIDCOL columns). The only thing different here is the default value for the BLOB column. The value 0x represents a zero-length binary stream, which is different than NULL. Think of it as the difference between a zero-length string and a null string in .NET; the two are not the same. Similarly, you won’t be able to use OpenSqlFilestream against NULL instances of varbinary(max) FILESTREAM columns, and you’ll soon see why.

Storing BLOBs Using OpenSqlFilestream

I’ll walk you through the complete steps for building a C# data access class in Visual Studio that demonstrates how to use OpenSqlFilestream. In order to keep things simple for demonstration purposes, we won’t be following best practices; namely, we’ll be executing direct T-SQL statements instead of using stored procedures (as you should be using in production code).

Start Visual Studio and create a new Class Library project named PhotoLibraryDAL with a single class file named PhotoData.cs, and add the following public static method named InsertPhoto to the class:

public static void InsertPhoto(int photoId, string desc, string filename)
{
  const string InsertCmd =
   "INSERT INTO PhotoAlbum(PhotoId, Description)" +
   " VALUES(@PhotoId, @Description)";

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      using(SqlCommand cmd = new SqlCommand(InsertCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;
        cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = desc;
        cmd.ExecuteNonQuery();
      }

      SavePhotoFile(photoId, filename, txn);
      txn.Commit();
    }

    conn.Close();
  }
}

Client applications call the InsertPhoto method to insert a new photo into the PhotoAlbum table by passing in a photo ID, description, and filename pointing to a local file containing the actual photo image. It then opens a connection, begins a transaction, and executes an INSERT statement. Notice that the INSERT statement supplies values only for the PhotoId and Description columns. What about the RowId and Photo columns? Because we’ve omitted them, the defaults we established in the table definition are applied. The RowId column gets assigned the next sequential GUID by the NEWSEQUENTIALID function. And for the Photo BLOB column, the default 0x (zero-length binary stream) applies.

The result of supplying a zero-length binary stream value for the Photo column is that SQL Server creates an empty file in the file system that is linked to the Photo column of the row just inserted. However, because we’ve begun a transaction, SQL Server has automatically initiated an NTFS file system transaction over the empty file added to the file system. So now we have a new row inserted in the table—but not committed, and we have a new empty file created in the file system—but not committed. This only works because we specified (albeit by default) a zero-length binary stream (0x) rather than NULL when inserting the varbinary(max) FILESTREAM value. If an exception occurs, or if the database transaction rolls back, or if any other condition occurs in which the database transaction doesn’t commit successfully, the NTFS file system transaction will get rolled back automatically as well. In that case, both the inserted row in the table and the empty file in the file system go away.

Now is that “point in time” that we’d like SQL Server to “step aside” so we can call OpenSqlFilestream to store the BLOB for the photo image. To do that, we call the SavePhotoFile method, which is coded as follows:

private static void SavePhotoFile(int photoId, string filename, SqlTransaction txn)
{
  const int BlockSize = 1024 * 512;
  FileStream source = new FileStream(filename, FileMode.Open, FileAccess.Read);
  SafeFileHandle handle = GetOutputFileHandle(photoId, txn);
  using(FileStream dest = new FileStream(handle, FileAccess.Write))
  {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
      dest.Write(buffer, 0, bytesRead);
    }
    dest.Close();
  }
  source.Close();
}

There’s actually nothing magical about this method. You can see that it simply streams, in 512K chunks at a time, from a source stream to a destination stream. This is just the way you’d implement a simple, conventional, file-copy routine. Getting a handle on the source stream is also a no-brainer; since the source file is local, you simply invoke the System.IO.FileStream constructor that accepts a filename and enumerations specifying that the file should be opened for read access. So the question then becomes, how do we get a handle on the destination stream, which is a direct channel to the empty and not-yet-committed file on the server’s NTFS file system that SQL Server associates with the Photo column? The answer lies in the GetOutputFileHandle method which we call to obtain a SafeFileHandle object to the destination file:

private static SafeFileHandle GetOutputFileHandle(int photoId, SqlTransaction txn)
{
  const string GetOutputFileInfoCmd =
    "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT(), Photo.PathName()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  SqlCommand cmd = new SqlCommand(GetOutputFileInfoCmd, txn.Connection, txn);
  cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

  string filePath;
  byte[] txnToken;

  using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
  {
    rdr.Read();
    txnToken = rdr.GetSqlBinary(0).Value;
    filePath = rdr.GetSqlString(1).Value;
    rdr.Close();
  }

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
    (filePath, NativeSqlClient.DesiredAccess.ReadWrite, txnToken);

  return handle;
}

This code is at the heart of the matter. In order to call OpenSqlFilestream, we need to obtain two key pieces of information. First, we need a logical path name to the destination file. We obtain that by selecting back the not-yet-committed row we just inserted, and invoking the PathName method on the Photo column. Nobody else can access this row because it hasn’t been committed yet (that would be a dirty read); it will never actually come into existence if the transaction rolls back. But we can read it because we’re inside the transaction, and so we can easily get the path name. No, the path name returned by PathName does not expose the file’s physical location on the server, so just lay that security concern to rest. It’s just a logical path that has meaning only in the context of this transaction, which SQL Server uses to map to the real file in the file system. Second, we need a token that identifies the NTFS file system transaction that SQL Server initiated behind the scenes, which we obtain with the GET_FILESTREAM_TRANSACTION_CONTEXT function.

Armed with these two key pieces of information, we can call OpenSqlFilestream. Because OpenSqlFilestream is a native code function, I’ve placed it in a GetSqlFilestreamHandle method in a separate NativeSqlClient class to keep the DllImport and other native code details isolated from our .NET data access class (yes, I’m a neat freak when it comes to code, and you should be too):

using System;
using System.Runtime.InteropServices;

using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryFilestreamDemo
{
  public class NativeSqlClient
  {
    public enum DesiredAccess : uint
    {
      Read,
      Write,
      ReadWrite,
    }

    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle OpenSqlFilestream(
      string path,
      uint access,
      uint options,
      byte[] txnToken,
      uint txnTokenLength,
      Sql64 allocationSize);

    [StructLayout(LayoutKind.Sequential)]
    private struct Sql64
    {
      public Int64 QuadPart;
      public Sql64(Int64 quadPart)
      {
        this.QuadPart = quadPart;
      }
    }

    public static SafeFileHandle GetSqlFilestreamHandle
     (string filePath, DesiredAccess access, byte[] txnToken)
    {
      SafeFileHandle handle = OpenSqlFilestream(
        filePath,
        (uint)access,
        0,
        txnToken,
        (uint)txnToken.Length,
        new Sql64(0));

      return handle;
    }
  }
}

As you can see, the GetSqlFilestreamHandle method accepts the transaction context token and the path obtained by the GET_FILESTREAM_TRANSACTION_CONTEXT function and PathName method respectively. It also accepts an enumeration that specifies the desired access mode, which can be Read, Write, or ReadWrite. The OpenSqlFilestream function requires other parameters that are not generally applicable for standard FILESTREAM usage, such as the unsigned 32-bit options and 64-bit allocation size arguments. These simply get passed in as 0. The SafeFileHandle returned by OpenSqlFilestream is defined by the .NET framework in the core library assembly mscorlib.dll, so no special reference needs to be set to access this class. Control then gets passed back up the call stack to the SavePhotoFile method, which calls an overloaded version of the System.IO.FileStream constructor that accepts a SafeFileHandle object. After the source stream is copied entirely to the destination stream (overwriting the empty file on the server), control returns to the InsertPhoto method and the database transaction is finally committed. At that point, both the inserted row and the file in the file system are permanently saved, and the connection is closed. And that’s the way to stream BLOBs into varbinary(max) FILESTREAM columns in SQL Server 2008!

Retrieving BLOBs Using OpenSqlFilestream

Reading FILESTREAM data back out from the database to your application follows a very similar pattern. You open a connection and start a transaction. Note that this is normally not considered best practice, as you should always try to have your read operations execute outside the context of a transaction. But to implement OpenSqlFilestream for read operations, this is exactly what you do. Then you use OpenSqlFilestream in the very same way we did for writing the BLOB to the database. The exact implementation depends on where you want to stream the content to.

For example, to stream a photo into an Image object for display in a Windows Forms PictureBox control, you can implement a SelectPhoto method like this:

public static Image SelectPhoto(int photoId, out string desc)
{
  const string SelectCmd =
    "SELECT Description, Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

  Image photo;

  using(SqlConnection conn = new SqlConnection(ConnStr))
  {
    conn.Open();

    using(SqlTransaction txn = conn.BeginTransaction())
    {
      string filePath;
      byte[] txnToken;

      using(SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
      {
        cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

        using(SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
        {
          rdr.Read();
          desc = rdr.GetSqlString(0).Value;
          filePath = rdr.GetSqlString(1).Value;
          txnToken = rdr.GetSqlBinary(2).Value;
          rdr.Close();
        }
      }

      photo = LoadPhotoImage(filePath, txnToken);

      txn.Commit();
    }

    conn.Close();
  }

  return photo;
}

private static Image LoadPhotoImage(string filePath, byte[] txnToken)
{
  Image photo;

  SafeFileHandle handle =
    NativeSqlClient.GetSqlFilestreamHandle
     (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

  using(FileStream fs = new FileStream(handle, FileAccess.Read))
  {
    photo = Image.FromStream(fs);
    fs.Close();
  }

  return photo;
}

Because the Image class has a static FromStream method that consumes the stream passed as a parameter to the method, this code essentially firehoses the stream at the fastest possible speed directly from the file system on SQL Server into the Image object.

Or, to stream a photo over HTTP from an ASP.NET service:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.Win32.SafeHandles;

namespace PhotoLibraryHttpService
{
 public partial class PhotoService : System.Web.UI.Page
 {
  private const string ConnStr =
    "Data Source=.;Integrated Security=True;Initial Catalog=PhotoLibrary;";

  protected void Page_Load(object sender, EventArgs e)
  {
   int photoId = Convert.ToInt32(Request.QueryString["photoId"]);
   if (photoId == 0)
   {
    return;
   }

   const string SelectCmd =
    "SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()" +
    " FROM PhotoAlbum" +
    " WHERE PhotoId = @PhotoId";

   using (SqlConnection conn = new SqlConnection(ConnStr))
   {
    conn.Open();

    using (SqlTransaction txn = conn.BeginTransaction())
    {
     string filePath;
     byte[] txnToken;

     using (SqlCommand cmd = new SqlCommand(SelectCmd, conn, txn))
     {
      cmd.Parameters.Add("@PhotoId", SqlDbType.Int).Value = photoId;

      using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SingleRow))
      {
       rdr.Read();
       filePath = rdr.GetSqlString(0).Value;
       txnToken = rdr.GetSqlBinary(1).Value;
       rdr.Close();
      }
     }

     this.StreamPhotoImage(filePath, txnToken);

     txn.Commit();
    }

    conn.Close();
   }
  }

  private void StreamPhotoImage(string filePath, byte[] txnToken)
  {
   const int BlockSize = 1024 * 512;
   const string JpegContentType = "image/jpeg";

   SafeFileHandle handle =
     NativeSqlClient.GetSqlFilestreamHandle
      (filePath, NativeSqlClient.DesiredAccess.Read, txnToken);

   using (FileStream source = new FileStream(handle, FileAccess.Read))
   {
    byte[] buffer = new byte[BlockSize];
    int bytesRead;
    Response.BufferOutput = false;
    Response.ContentType = JpegContentType;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
     Response.OutputStream.Write(buffer, 0, bytesRead);
     Response.Flush();
    }
    source.Close();
   }
  }
  }
}

Have Fun with FILESTREAM!

This concludes my 3-part series on using FILESTREAM in SQL Server 2008, which explains everything you need to know to get the most out of this truly awesome new feature. I hope you enjoyed it, and look forward to hearing your FILESTREAM success stories!

Data/View Separation in Word 2007 using Word XML Data Binding and OpenXML

The OpenXML file format used by Microsoft Office 2007 enables programmatic document manipulation without automating Word, so that Word does not need to be installed alongside your code, which is great for server applications. In this post, I’ll show you how to use OpenXML to implement bi-directional data binding between a Word document and XML data. This new feature in Word 2007 delivers the best data/view separation we’ve seen to date in Microsoft Office. And (for a LINQ junkie such as myself, anyway) the best part is that it works so nicely with LINQ to XML.

We’ll begin by creating a document with content controls. Then we’ll write a small amount of code to embed an XML file into the document and bind the content controls to various elements in the XML.

Before starting, you need to download the OpenXML SDK 2.0. Once the SDK is installed, you’ll have the assemblies you need to programmatically manipulate Office documents using the OpenXML API. You can download the SDK from here: http://www.microsoft.com/downloads/details.aspx?FamilyId=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en

Create the Document

We’ll create a simple document with three content controls that will be bound to the following XML data:

<Customer>
  <Name>John Doe</Name>
  <Expiration>2/1/2010</Expiration>
  <AmountDue>$129.50</AmountDue>
</Customer>

Start a new document in Word 2007, and type the word Dear followed by a space. We now want to inject our first content control for the customer name. You can add content controls to your document from the Developer tab on the ribbon, but since that tab is hidden by default, you’ll need to show it first. Click the Office button and choose Word Options. Then check the “Show Developer tab in the Ribbon” checkbox and click OK.

With the cursor still positioned after the text “Dear “, use the Developer tab to insert a new Text content control:

We’ll be assigning tag names to identify content controls for data binding in our code. To view the tag names in Word as we set up the content controls, click the Design Mode button on the Developer tab:

With the content control still selected, click Properties:

 

Set the Tag of the content control to Customer/Name, which is an XPath expression that refers to the <Name> element nested inside the <Customer> element of our XML data.

Click OK to close the dialog box. Notice how Word displays the tag name in the content control in Design Mode:

 

Continue adding text to the document, and insert two more content controls in a similar fashion for the expiration date and amount due fields. Use the same Text content control for the amount due field as we used for the customer name field, but use a Date Picker content control for the expiration date field:

After setting the tag names of the additional content controls to the appropriate XPath expressions (Customer/Expiration and Customer/AmountDue), your document should look something like this:

Turn off Design Mode to hide the tag names from the content controls:

Save the document as CustomerReminder.docx. We have already established a separation of variable data and fixed document text, and could easily set up document protection at this point to allow manual data entry only for content controls. That’s cool all by itself, but we’re going to take this to the next level and implement automatic data binding between these content controls and a separate XML file. Although Word doesn’t care what values you assign to content control tags, we will now write code to find all tagged content controls, and treat those tags as XPath expressions that point to various XML elements in our XML data.

If you’re not already aware, all Office 2007 documents (including Word docx files) are actually compressed zip files (“packages”) that hold a collection of related XML “parts.” You can see this easily by simply appending .zip to the document filename and then opening it from Windows Explorer:

After examining the contents of the document package, rename the filename extension back to “.docx.”

Write the Code

Start Visual Studio and create a new Windows Forms project. Now set references to the two assemblies you’ll need for OpenXML programming: DocumentFormat.OpenXml and WindowsBase (these can be found on the .NET tab of the Add Reference dialog box). The first assembly exposes the OpenXML object model for document manipulation, and the second assembly provides the support for embedding and extracting parts to and from the document package:

Drop a button named btnInit onto the form and supply code for its Click event handler. You’ll also need a few additional using statements for importing some namespaces. The complete code listing below shows the required using statements and the btnInit_Click event handler.

using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using System.Xml.Linq;

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OpenXmlWordDataBinding
{
  public partial class Form1 : Form
  {
    public Form1()
    {
      InitializeComponent();
    }

    private void btnInit_Click(object sender, EventArgs e)
    {
      var xml =
        new XElement("Customer",
          new XElement("Name", "John Doe"),
          new XElement("Expiration", "2/1/2010"),
          new XElement("AmountDue", "$129.50"));

      var docxFile = @"..\..\CustomerReminder.docx";

      using (var wpd = WordprocessingDocument.Open(docxFile, true))
      {
        var mainPart = wpd.MainDocumentPart;
        var xmlPart = mainPart.AddNewPart<CustomXmlPart>();
        using (Stream partStream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
        {
          using (StreamWriter outputStream = new StreamWriter(partStream))
          {
            outputStream.Write(xml);
          }
        }

        var taggedContentControls =
          from sdt in mainPart.Document.Descendants<SdtRun>()
          let sdtPr = sdt.GetFirstChild<SdtProperties>()
          let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
          where tag != null
          select new
          {
            SdtProps = sdtPr,
            TagName = tag.GetAttribute("val", "http://schemas.openxmlformats.org/wordprocessingml/2006/main").Value
          };

        foreach (var taggedContentControl in taggedContentControls)
        {
          var binding = new DataBinding();
          binding.XPath = taggedContentControl.TagName;
          taggedContentControl.SdtProps.Append(binding);
        }

        mainPart.Document.Save();
      }
    }
  }
}

Let’s analyze this code piece by piece. First, we construct the XML data as a graph of XML objects using LINQ to XML functional construction:

var xml =
  new XElement("Customer",
    new XElement("Name", "John Doe"),
    new XElement("Expiration", "2/1/2010"),
    new XElement("AmountDue", "$129.50"));

Next, we invoke WordprocessingDocument.Open to open the Word document for write access using the OpenXML SDK. We do this inside of a using block to ensure proper disposal of unmanaged resources in the event that an unhandled exception occurs inside the block. The boolean true value specified for the second parameter means that we’re opening the document for write access:

var docxFile = @"..\..\CustomerReminder.docx";
using (var wpd = WordprocessingDocument.Open(docxFile, true))

Our code needs to do two things to the document at this point: add a new XML part containing the data to be bound, and assign the appropriate bindings to the content controls. Realize that this is code that will only execute once; subsequently, we’ll only be extracting/embedding the XML part from/to the document package using normal zip compression methods.
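For illustration, here’s a minimal sketch of what that subsequent step might look like if you prefer to stay in the OpenXML API rather than manipulate the zip package directly. It assumes the document contains exactly one custom XML part (the one we’re about to add below), and it reuses the same using directives as the listing above; the updated name value is just an example:

using (var wpd = WordprocessingDocument.Open(@"..\..\CustomerReminder.docx", true))
{
  // Find the custom XML part we embedded earlier (assuming there's only one).
  CustomXmlPart xmlPart = wpd.MainDocumentPart.CustomXmlParts.First();

  // Overwrite its content with updated data; the bound content controls should
  // pick up the new values the next time the document is opened in Word.
  var updatedXml =
    new XElement("Customer",
      new XElement("Name", "Sue Taylor"),
      new XElement("Expiration", "2/1/2010"),
      new XElement("AmountDue", "$129.50"));

  using (var writer = new StreamWriter(xmlPart.GetStream(FileMode.Create, FileAccess.Write)))
  {
    writer.Write(updatedXml);
  }
}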

Add a New XML Part

With the document open and accessible to our code via the wpd variable, we invoke the AddNewPart<CustomXmlPart> method on wpd.MainDocumentPart to create a new XML part, which we capture in xmlPart. The new XML part exposes a GetStream method that enables us to stream content into and out of it. We take the generic Stream returned by GetStream and wrap a StreamWriter around it. The XML content can then be sent into the new XML part by writing to the StreamWriter. Again, we use nested using blocks on the stream objects to ensure proper cleanup if an exception occurs:

var mainPart = wpd.MainDocumentPart;
var xmlPart = mainPart.AddNewPart<CustomXmlPart>();
using (Stream partStream = xmlPart.GetStream(FileMode.Create, FileAccess.Write))
{
  using (StreamWriter outputStream = new StreamWriter(partStream))
  {
    outputStream.Write(xml);
  }
}

Now that the XML data is embedded as an XML part inside of the document’s .docx package, it’s available for data binding to content controls. The next and last thing we need to do is to programmatically find the tagged content controls inside the document and assign XML data bindings using the XPath expressions we defined in the content control tags.

Add the Data Bindings

Learning how to program Word with OpenXML is all about exploring the special XML markup language used by Word (a dialect known as WordprocessingML) and discovering the element and attribute names used to represent the document. You can view the document’s markup by examining the word/document.xml part inside of the .docx package. It can take quite a bit of detective work as you undergo the exploration and discovery process, but once you’re armed with the information you need, it’s very simple and straightforward to write a LINQ to XML query to get at what you need in the document.

For example, here is how the customer name content control is expressed in WordprocessingML:

<w:sdt>
     <w:sdtPr>
           <w:tag w:val="Customer/Name"/>
           <w:id w:val="852281"/>
           <w:placeholder>
                <w:docPart w:val="DefaultPlaceholder_22675703"/>
           </w:placeholder>
           <w:showingPlcHdr/>
           <w:text/>
     </w:sdtPr>
     <w:sdtContent>
           <w:r w:rsidRPr="004E1B99">
                <w:rPr>
                      <w:rStyle w:val="PlaceholderText"/>
                </w:rPr>
                <w:t>Click here to enter text.</w:t>
           </w:r>
     </w:sdtContent>
</w:sdt>

An analysis of this XML shows how Word stores content controls inside a document. Specifically, Word creates a <w:sdt> element (structured document tag) and nests a <w:sdtPr> element (structured document tag properties) inside of it. And nested beneath that, Word stores the content control’s tag name in the w:val attribute of a <w:tag> element. These element names can be referred to programmatically using the OpenXML API type names SdtRun, SdtProperties, and Tag, respectively. Thus, the following LINQ to XML query retrieves all tagged content controls in our document:

var taggedContentControls =
  from sdt in mainPart.Document.Descendants<SdtRun>()
  let sdtPr = sdt.GetFirstChild<SdtProperties>()
  let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
  where tag != null
  select new
  {
     SdtProps = sdtPr,
     TagName = tag.GetAttribute("val", "http://schemas.openxmlformats.org/wordprocessingml/2006/main").Value
  };

In this query, the from clause uses the Descendants<SdtRun> method to retrieve all the <w:sdt> elements in the document. For each one, the first let clause uses the GetFirstChild<SdtProperties> method to retrieve the nested <w:sdtPr> element beneath it. The second let clause then performs a similar operation to retrieve the <w:tag> element nested beneath that. Since you cannot assume that every <w:sdt> element will always have a nested <w:sdtPr> element, the second let clause tests sdtPr for null before trying to invoke GetFirstChild<Tag> on it. The where clause ultimately filters out all content controls that have not been tagged.

The select clause constructs an anonymous type that returns two properties for each tagged content control. The first is the <w:sdtPr> element, which will be needed to reference the location at which we’ll inject the XML binding element. The second is the tag name extracted from the <w:tag> element’s w:val attribute (the full OpenXML namespace for the w prefix needs to be specified to retrieve the attribute value), which supplies the XPath expression for the binding. The LINQ query returns the results in a sequence that we then iterate to set the bindings:

foreach (var taggedContentControl in taggedContentControls)
{
  var binding = new DataBinding();
  binding.XPath = taggedContentControl.TagName;
  taggedContentControl.SdtProps.Append(binding);
}

mainPart.Document.Save();

For each tagged content control, we create a DataBinding object, set its XPath property to the tag name (which we know provides an XPath expression referencing the desired XML data), and add the binding as a new element inside of the <w:sdtPr> element. Finally, invoking the Save method saves the modified document back to disk. If you examine the WordprocessingML now, you’ll see that our code added a <w:dataBinding> element with a w:xpath attribute to the <w:sdtPr> element in each content control. For example, the customer name content control now has the following element in its <w:sdtPr> section:

<w:dataBinding w:xpath="Customer/Name" />

Now let’s examine the docx package again. Add the .zip extension to the document filename and crack it open. Notice the new customXML folder that wasn’t there before. This is where all custom XML parts added to the document get stored.

Double-click the customXML folder to reveal the item.xml file contained inside of it:

Now double-click item.xml to view it in IE:

Look familiar? That’s the XML our code injected into the document as a custom XML part!

Now close the zip, rename the file back as .docx, and open it in Word:

All the content placeholders have now been filled in with data from our XML file. Does it get better than this? You bet! This thing works bi-directionally. So go ahead and change John Doe to Sue Taylor:

Now save the document, and rename it as .zip once again. This time, extract the item.xml file instead of opening it in IE, so that you can edit it in Notepad:

Beautiful. Word pushed the data from the modified content controls in the document back out to the XML file. Let’s now edit the XML; for example, we’ll change the expiration date:

Save the XML file, push it back into the zip package, rename the zip package back to .docx, and re-open it in Word one last time:

Now that’s what I call data/view separation! With XML data binding in Word, your application can easily view and change data elements without ever digging into the document text.

Of course, this only scratches the surface of what’s possible with OpenXML. It’s also worth mentioning that the same thing can be achieved using VSTO (Visual Studio Tools for Office) and the Word object model, but we’ll leave that for a future post. Until then, happy coding!

Streaming Into LINQ to XML Using C# Custom Iterators and XmlReader

In this post, I’ll show you how to create LINQ to XML queries that don’t require you to first load and cache XML content into the in-memory LINQ to XML DOM (that is, without first populating an XDocument or XElement query source), but instead operate against an input stream implemented with a C# custom iterator method and an old-fashioned XmlReader object.

LINQ to XML

LINQ to XML, introduced with the .NET Framework 3.5, is a huge win for developers working with XML in any shape or form. Whether XML is being queried, parsed, or transformed, LINQ to XML can almost always be used as an easier alternative to previous technologies that are based on XML-specific languages (XPath, XQuery, XSLT).

At the center of the LINQ to XML stage lies a new DOM for caching XML data in memory. This object model, based on either a root XDocument object or independent XElement objects, represents a major improvement over the older XmlDocument-based DOM in numerous ways (details of which will serve as the topic for a future post). In terms of querying, the XDocument and XElement objects provide methods (such as Descendants) that expose collections of nodes which can be iterated by a LINQ to XML query.

Consuming Sequences with LINQ

A common misconception by many developers learning LINQ is that LINQ only consumes collections. That is not surprising, since one of the first benefits developers come to understand is that LINQ can be used to easily query an input collection without coding a foreach loop. While this is certainly true, it’s a very limited view of what can really be accomplished with LINQ.

The broader view is that LINQ works against sequences, which are often — but certainly not always — collections. A sequence can be generated by any object that implements IEnumerable<T> or any method that returns an IEnumerable<T>. Such objects and methods provide iterators that LINQ calls to retrieve one element after another from the source sequence being queried. For a collection, the enumerator simply walks the elements of the collection and returns them one at a time. But you can create your own class that implements IEnumerable<T> or your own custom iterator method which serves as the enumerator that returns a sequence based on something other than a collection.

The point? LINQ is not limited to querying in-memory collections. It can be used to query any sequence, which is fed to the LINQ query by enumerator methods exposed by all classes that implement IEnumerable<T>. What this means is that you don’t necessarily need to load an entire XML document into an in-memory cache before you can query it using LINQ to XML. If you are querying very large documents in order to extract just a few elements of interest, you can achieve better performance by having your LINQ to XML query stream through the XML—without ever caching the XML in memory.

Querying Cached XML vs. Streamed XML

How do you write LINQ to XML queries that consume a read-only, forward-only input stream instead of a pre-populated in-memory cache? Easy. Refer to a custom iterator method instead of a pre-populated XDocument or XElement in the from clause of your query.

For example, consider the following query:

var xd = XDocument.Load("Customers.xml");
var domQuery =
  from c in xd.Descendants("Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Now compare that query with this version:

var streamQuery =
  from c in StreamElements("Customers.xml", "Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Both versions query over the same XML and produce the same output result (a sequence of customer nodes from the UK), but they consume their input sequences in completely different ways. The first version loads the XML content into an XDocument (in-memory cache), and then queries the collection of nodes returned by the Descendants method (all <Customer> nodes) for those in the UK. The larger the XML document being queried, the more memory is consumed by this approach. The second version queries directly against an input stream that feeds the XML content as a sequence using a custom iterator method named StreamElements. This version will consume no more memory for querying a huge XML file than it will for a tiny one. Here’s the implementation of the custom iterator method:

private static IEnumerable<XElement> StreamElements(
  string fileName,
  string elementName)
{
  using (var rdr = XmlReader.Create(fileName))
  {
    rdr.MoveToContent();
    while (rdr.Read())
    {
      if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
      {
        var e = XElement.ReadFrom(rdr) as XElement;
        yield return e;
      }
    }
    rdr.Close();
  }
}

Understanding C# Custom Iterators

By definition, this method is a custom iterator since it returns an IEnumerable<T> and has a yield return statement in it. By returning IEnumerable<XElement> specifically (just like the Descendants method in the cached DOM version of the query does), this custom iterator is suitable as the source for a LINQ to XML query. The implementation of this method is a beautiful demonstration of how seamlessly new technology (LINQ to XML) integrates with old technology (the XmlReader object has been around since .NET 1.0). Let’s examine the method piece by piece to understand exactly how it works.

When the query first begins to execute, and LINQ needs to start scanning the input sequence, the StreamElements custom iterator method is called with two parameters. The first parameter is the name of the input XML file (“Customers.xml”) and the second parameter is the element of interest to be queried (“Customer”, for each <Customer> node in the XML file). The method then opens an “old-fashioned” XmlReader against the XML file and advances the stream to the beginning of its content by invoking MoveToContent. It then enters a loop that reads from the stream one element at a time. Each element is tested against the second parameter (“Customer”, in this example). Each matching element (which would be all <Customer> elements) is converted into an XElement object by invoking the static XElement.ReadFrom method against the reader. The XElement object representing the matching node is then yield returned to the LINQ query.

As the query continues to execute, and LINQ needs to continue scanning the input sequence for additional elements, the StreamElements custom iterator method continues execution right after the point at which it yield returned the previous element, rather than entering at the top of the method like an ordinary method would. In this manner, the input stream advances and returns one matching XElement after another while the LINQ query consumes that sequence and filters for UK to produce a new output sequence with the results. When the end of the input stream is reached, the reader’s Read method will return false, which will end the loop, close the reader, and finally exit the method. When the custom iterator method exits, that signals the LINQ query that there are no more input elements in the sequence and the query completes execution at that point.

One important point that may seem obvious but is worth calling out anyway, is that running streaming queries more than once results in reading through the entire stream each time. So to best apply this technique, you should try to extract everything you need from the XML content in one pass (that is, with one LINQ query). If it turns out that you need to query over the stream multiple times, you’ll need to reconsider and determine whether you’d be better off caching the XML content once and then querying over the in-memory cache multiple times.
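If you do land in that situation, one reasonable approach (a minimal sketch, reusing the StreamElements method above) is to materialize the stream once and then run your multiple queries against the cached results:

// Read the file once and cache the matching elements in memory.
var customers = StreamElements("Customers.xml", "Customer").ToList();

// Subsequent queries hit the cached list, not the file.
var ukCustomers = customers.Where(c => (string)c.Attribute("Country") == "UK");
var usaCustomers = customers.Where(c => (string)c.Attribute("Country") == "USA");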

Custom iterators are a very powerful C# language feature, not necessarily limited for use with LINQ. Streaming input into LINQ to XML queries as an alternative to using a cached input source is just one example, but there are numerous others ways to leverage custom iterators. To learn how they can be used with SQL Server 2008 Table-Valued Parameters to marshal an entire business object collection to a stored procedure in a single round-trip, view my earlier blog post that explains the details: SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!

Optimizing Factory Methods with Static Delegate Arrays

In my last post, I explained the benefits of using factory classes to achieve polymorphism in your business applications, and demonstrated how implementing such an architecture greatly improves the design and maintainability of your code. In this post (part 2 if you will) I’ll first quickly review the benefits of the factory pattern, and then demonstrate how to refactor the typical switch/case factory implementation to achieve ultra-high performance using a static array of delegates.

Here is the TenderType enum and TenderFactory code from the last post:

public enum TenderType
{
  Cash,
  DebitCard,
  CreditCard,
   //
   // more tender types
   //
}

public static class TenderFactory
{
  public static ITender CreateTender(TenderType tenderType)
  {
    switch (tenderType)
    {
      case TenderType.Cash:
        return new CashTender();

      case TenderType.DebitCard:
        return new DebitCardTender();

      case TenderType.CreditCard:
        return new CreditCardTender();
       //
       // more case statements here
       //
      default:
        throw new Exception("Unsupported tender: " + (int)tenderType);
    }
  }
}

In this example, we have a factory method that accepts a TenderType enum and uses it to create a concrete tender object corresponding to that enum. All the different tender objects implement the ITender interface, so that’s the type returned by the factory. Because the different tender behaviors are encapsulated in the set of concrete tender classes, client code can simply call this factory method to retrieve an ITender object for any tender type and work with that object without knowing the actual type. That is, you can write polymorphic code that doesn’t need to be maintained as concrete type behaviors change or new concrete types are added in the future.

It’s easy to recognize when you should be using this pattern in your own applications. When you find yourself coding the same switch/case ladder repeatedly, that’s a sure indication that your architecture can be improved significantly by using factories and polymorphism. It’s important to sensitize yourself so that you detect this condition early in your development efforts and apply the factory pattern then. It will be a much greater burden to refactor your code to use the factory pattern later on, once you have allowed a proliferation of switch/case statements to spread throughout your code.

So now we’ve got a single switch/case ladder in a centralized factory method, which eliminates the multitudes of duplicate switch/case ladders that would have otherwise been scattered throughout our application’s code. A huge improvement, to be certain, but can we improve it even more? You bet!

The Need For Speed

Because object instantiation is a very common occurrence, factory methods in general need to perform well. A performance bottleneck in the factory will quickly translate to a serious application-wide problem. If you’re dealing with a small set of concrete types (like our tenders example), and/or your factory code is running on a desktop or other single-user environment, the switch/case implementation of the factory method shown above is probably perfectly suitable and won’t require optimization.

In the real world, however, a factory method frequently supports dozens or even hundreds of different concrete types, which means coding a very long switch/case ladder in that method. Next consider the fact that the switch/case ladder is evaluated sequentially at runtime, top-to-bottom. This means that a matching case statement further down the ladder takes longer to reach than one further up the ladder. Again, for a handful of types in a single-user environment, that differential is imperceptible. But if your factory code supports many types through a long switch/case ladder, and runs on a multi-user application server that is servicing many concurrent client requests for new object instances in high demand, then it becomes critical that your factory code executes as quickly as possible.

A Golden Rule of Optimization: Let The Compiler Do The Work

The following alternative implementation is vastly superior to the switch/case version:

public static class TenderFactory
{
  private delegate ITender CreateTenderMethod();

  // Important: The order of delegates in this static array must match the
  //  order that the TenderType enums are defined.
  private static CreateTenderMethod[] CreateTenderMethods = new CreateTenderMethod[]
  {
    delegate() { return new CashTender(); },
    delegate() { return new DebitCardTender(); },
    delegate() { return new CreditCardTender(); },
     //
     // more delegates here
     //
  };

  public static ITender CreateTender(TenderType tenderType)
  {
    var tender = CreateTenderMethods[(int)tenderType].Invoke();
    return tender;
  }
}

This factory code takes a novel approach. Let’s dissect it.

At the top of the class we define a delegate named CreateTenderMethod, which takes no parameters and returns an ITender object. We then declare a static array of CreateTenderMethod delegates, and populate it with anonymous methods that each return a new instance of one concrete tender type. Stop and think about what this means. The compiler emits all the wiring for this static array, and the runtime builds it exactly once, when the class is first loaded. From then on, the array simply sits in memory with every element already mapped to the method that returns the corresponding concrete type, so there is no per-call cost for allocating the array, populating it with delegate instances, or walking a ladder of comparisons. Paying the setup cost once up front, rather than repeating the work on every call, is one of the golden rules of optimization.

The CreateTender method itself is now ridiculously simple. It takes the incoming enum and converts it to an integer which it uses as an index into the static array. That instantaneously retrieves the correct delegate which points to the method that will return the concrete tender type specified by the enum. In an array of 250 elements, it won’t take any longer to retrieve the delegate for the 250th element than it will for the first. The Invoke method on the delegate actually runs the method and returns the correct ITender derivative to the requesting client. The only important thing to remember when using this technique, which you may have already realized on your own, is that the order of delegates in the array must match the order that the enums are defined (as mentioned in the code comments). A mismatch will obviously manifest itself as a serious runtime error.

What’s really nice here is that anonymous methods added in C# 2.0 greatly reduce the amount of code this solution requires, compared to what was required before anonymous methods. Back then, you’d need to explicitly create one-line methods for each concrete tender type, and then reference each of those one-line methods from an explicit delegate constructor in the static array. So this implementation is now not only significantly faster than the typical switch/case approach, it’s also just as easy to implement and maintain. Don’t you just love it when there are no downsides?
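And if you’re compiling against C# 3.0 or later, lambda expressions trim the syntax down even further. Here’s a sketch of the same static array expressed with lambdas; it’s functionally identical to the anonymous-method version above:

  private static CreateTenderMethod[] CreateTenderMethods = new CreateTenderMethod[]
  {
    () => new CashTender(),
    () => new DebitCardTender(),
    () => new CreditCardTender(),
     //
     // more delegates here
     //
  };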

Leveraging Polymorphism with Factory Classes

In this post, I’ll explain the benefits of using the factory pattern, and show you how to code a typical factory implementation to achieve polymorphism in your .NET applications. In my next post, I’ll show you how to optimize the factory method for ultra-high performance by using a static array of delegates.

Working with Different Types

Let’s use a typical POS (Point Of Sale) scenario to illustrate the factory pattern. POS applications run on checkout counters in retail stores, where the customer can pay with a variety of tender types, such as cash, credit, debit, voucher, etc. Each of these types exhibit different behaviors; for example, paying with cash pops open the cash drawer, paying by debit card requires a PIN, paying by credit card requires a signature, etc.

Without using a factory pattern to manage the different tender types, code to determine if a customer signature is required on checkout for a given tender type might look like this:

bool isSignatureRequired;
switch (tenderType)
{
  case TenderType.Cash:
    isSignatureRequired = false;
    break;


  case TenderType.DebitCard:
    isSignatureRequired = false;
    break;


  case TenderType.CreditCard:
    isSignatureRequired = true;   // credit cards require signature
    break;

   //
   // more case statements here
   //

  default:
    throw new Exception("Unsupported tender: " + (int)tenderType);
}

Similarly, to determine if a PIN is required for any given tender type:

bool isPinRequired;
switch (tenderType)
{
  case TenderType.Cash:
    isPinRequired = false;
    break;


  case TenderType.DebitCard:
    isPinRequired = true;   // debit cards require PIN
    break;


  case TenderType.CreditCard:
    isPinRequired = false;
    break;

   //
   // more case statements here
   //

  default:
    throw new Exception("Unsupported tender: " + (int)tenderType);
}

With this approach, the same switch/case “ladder” will appear numerous times scattered throughout your application, each one dealing with a different aspect of each possible tender type. In short order, maintenance will become a nightmare. Changing the behavior of any type means hunting down all the pertinent switch/case blocks and modifying them. Creating a new type means adding a new case to many existing switch/case blocks—wherever they are (good luck finding them all). It’s nearly impossible to gain a clear picture of how the tender types compare to one another, because bits of information about each tender are fragmented in switch/case blocks across the application’s codebase.

Creating a Factory

The factory pattern solves this problem by eliminating all these switch/case blocks from your application, and consolidating the logic for each type in its own class instead. All the classes for each type implement the same interface, so that you can work with an instance of any type without knowing or caring what the concrete type is, and get the information you need.

With a factory pattern in place, you won’t have to search for or modify one line of code in your core application logic when the behavior of an existing tender changes or a new tender type is added in the future. This represents a huge improvement over the scattered switch/case approach.

Here are the steps to implement the pattern:

1) Create the ITender interface:

With this interface, we have defined certain things that we know about every tender type, such as IsSignatureRequired, IsPinRequired, AllowCashBack, etc.

public interface ITender
{
  bool IsSignatureRequired { get; }
  bool IsPinRequired { get; }
  bool AllowCashBack { get; }
  bool PopOpenCashDrawer { get; }
   //
   // more members
   //
}

2) Define enums for each ITender derivative: 

public enum TenderType
{
  Cash,
  DebitCard,
  CreditCard,
   //
   // more tender types
   //
}

3) Create concrete ITender classes. Here are three concrete classes that implement ITender:

public class CashTender : ITender
{
  bool ITender.IsSignatureRequired { get { return false; } }
  bool ITender.IsPinRequired       { get { return false; } }
  bool ITender.AllowCashBack       { get { return false; } }
  bool ITender.PopOpenCashDrawer   { get { return true; } }
}

public class DebitCardTender : ITender
{
  bool ITender.IsSignatureRequired { get { return false; } }
  bool ITender.IsPinRequired       { get { return true; } }
  bool ITender.AllowCashBack       { get { return true; } }
  bool ITender.PopOpenCashDrawer   { get { return false; } }
}

public class CreditCardTender : ITender
{
  bool ITender.IsSignatureRequired { get { return true; } }
  bool ITender.IsPinRequired       { get { return false; } }
  bool ITender.AllowCashBack       { get { return false; } }
  bool ITender.PopOpenCashDrawer   { get { return false; } }
}

4) Create the factory class:

public static class TenderFactory
{
  public static ITender CreateTender(TenderType tenderType)
  {
    switch (tenderType)
    {
      case TenderType.Cash:
        return new CashTender();


      case TenderType.DebitCard:
        return new DebitCardTender();


      case TenderType.CreditCard:
        return new CreditCardTender();


       //
       // more case statements here
       //

      default:
        throw new Exception("Unsupported tender: " + (int)tenderType);
    }
  }
}

With these elements in place, you can handle any ITender derivative throughout your application easily and elegantly. Given an enum for any particular tender type, a single call to the CreateTender factory method will return a new instance of the correct concrete ITender object. You can then work with the properties and methods of the returned instance and get the results expected for the specific tender type, without needing to know the specific tender type or testing for different tender types. This is the essence of polymorphism.

For example, to determine if a signature is required, it’s as simple as:

ITender tender = TenderFactory.CreateTender(tenderType);
bool isSignatureRequired = tender.IsSignatureRequired;

Unlike the code at the top of the post that retrieved this information without using the factory pattern, this code will never change, even as new tenders are added in the future. Adding support for a new tender (for example, food stamps) now involves only adding a new enum, creating a new concrete class that implements all the ITender members for the new tender type, and adding a single case statement to the factory method’s switch/case block. Doing so allows you to instantly plug in new tender types without touching one line of code throughout the body of your core application logic. The code above requires no modifications to determine if a signature is required for a newly added tender.
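To make that concrete, here’s a rough sketch of what adding food stamps might involve (the specific property values below are illustrative assumptions, not actual POS rules):

// 1) Add the enum member:  Cash, DebitCard, CreditCard, FoodStamp, ...

// 2) Create the concrete class:
public class FoodStampTender : ITender
{
  bool ITender.IsSignatureRequired { get { return false; } }
  bool ITender.IsPinRequired       { get { return false; } }
  bool ITender.AllowCashBack       { get { return false; } }
  bool ITender.PopOpenCashDrawer   { get { return true; } }
}

// 3) Add one case to the factory method:
//      case TenderType.FoodStamp:
//        return new FoodStampTender();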

Enhancing and Optimizing the Factory Method

So now your application calls a method in the factory class to “new up” an ITender instance rather than explicitly instantiating a new ITender object through one of its constructors. You can enhance this pattern by encapsulating all the concrete classes in a separate assembly and scoping their constructors as internal (Friend, in VB) so that clients cannot circumvent the factory method and are instead forced to create new ITender instances by calling the factory method.

Another logical next step would be to create an abstract base class named TenderBase which would house common functionality that all tender types require. TenderBase would implement ITender, and all the concrete tender classes would inherit TenderBase instead of directly implementing ITender (though they will still implement ITender implicitly of course, since TenderBase implements ITender).
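Here’s a minimal sketch of that idea (the exact shape of TenderBase is up to you; this just shows the inheritance arrangement):

public abstract class TenderBase : ITender
{
  // The ITender members become abstract, forcing each concrete tender to supply them.
  public abstract bool IsSignatureRequired { get; }
  public abstract bool IsPinRequired { get; }
  public abstract bool AllowCashBack { get; }
  public abstract bool PopOpenCashDrawer { get; }

  // Common functionality required by all tender types would live here.
}

// Concrete tenders then inherit TenderBase rather than implementing ITender directly:
public class CashTender : TenderBase
{
  public override bool IsSignatureRequired { get { return false; } }
  public override bool IsPinRequired       { get { return false; } }
  public override bool AllowCashBack       { get { return false; } }
  public override bool PopOpenCashDrawer   { get { return true; } }
}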

It’s important to ensure that your factory methods execute as quickly as possible, especially in scenarios where there is a high-volume demand to create new objects. In my next post, I’ll show you an improved version of the factory method that significantly out-performs the switch/case approach shown here (especially when dealing with a great many entity types), by replacing the switch/case logic with a static array of delegates.

SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!

The new Table-Valued Parameter (TVP) feature is perhaps the most significant T-SQL enhancement in SQL Server 2008. Many people have already discovered how to use this feature for passing entire sets of rows with a typed schema definition along from one stored procedure to another on the server. This is something that has not been previously possible using either table variables or temp tables, although temp tables can of course be shared between multiple stored procedures executing on the same connection. Conveniently, TVPs can be passed freely between stored procedures and user-defined functions (UDFs) on the server.

A less widely-known capability of TVPs—yet arguably their most compelling facet—is the ability to marshal an entire set of rows across the network, from your ADO.NET client to your SQL Server 2008 database, with a single stored procedure call (only one round-trip) that accepts a single table-valued parameter. Developers have resorted to a variety of clever hacks over the years to reduce multiple round-trips for processing multiple rows—including XML, delimited text, or even (gasp) accepting hundreds (up to 2100!) of parameters. But special logic then needs to be implemented for packaging and unpackaging the parameter values on both sides of the wire. Worse, the code to implement that logic is often gnarly, and tends to impair developer productivity. None of those techniques even come close to the elegance and simplicity of using TVPs as a native solution to this problem.

I won’t cover the basics of TVPs, since many folks know them already and there are numerous blog posts on the Web that explain the basics (not to mention Chapter 2 in my book). What I will show is how to implement TVPs in ADO.NET client code for passing an entire DataTable object (all its rows and columns) to SQL Server with one stored procedure call.

If you’re a business-object person and not a DataSet/DataTable person, I’ll also show you how to do the same thing with an entire business object collection using C# custom iterators. There are few places (if any) other than this blog post that show the necessary steps to implement TVPs against business object collections (it’s not even covered in my book!). So let’s get started.

Preparing the SQL Server Database

Before diving into the C# code, we need to set up our database for a simple Order Entry scenario. That means creating an Order table and a related OrderDetail table. We’ll also need an OrderUdt and OrderDetailUdt user-defined table type (UDTT) to base our TVPs on, and a single InsertOrders stored procedure that accepts header and detail TVPs containing typed rows to be bulk inserted into the underlying tables:

CREATE TABLE [Order](
 OrderId int NOT NULL,
 CustomerId int NOT NULL,
 OrderedAt date NOT NULL,
 CreatedAt datetime2(0) NOT NULL,
  CONSTRAINT PK_Order PRIMARY KEY CLUSTERED (OrderId ASC))
GO

CREATE TABLE [OrderDetail](
 OrderId int NOT NULL,
 LineNumber int NOT NULL,
 ProductId int NOT NULL,
 Quantity int NOT NULL,
 Price money NOT NULL,
 CreatedAt datetime2(0) NOT NULL,
  CONSTRAINT PK_OrderDetail PRIMARY KEY CLUSTERED (OrderId ASC, LineNumber ASC))
GO

CREATE TYPE OrderUdt AS TABLE(
 OrderId int,
 CustomerId int,
 OrderedAt date)
GO

CREATE TYPE OrderDetailUdt AS TABLE(
 OrderId int,
 LineNumber int,
 ProductId int,
 Quantity int,
 Price money)
GO

CREATE PROCEDURE InsertOrders(
 @OrderHeaders AS OrderUdt READONLY,
 @OrderDetails AS OrderDetailUdt READONLY)
AS
 BEGIN

    -- Bulk insert order header rows from TVP
    INSERT INTO [Order]
     SELECT *, SYSDATETIME() FROM @OrderHeaders

    -- Bulk insert order detail rows from TVP
    INSERT INTO [OrderDetail]
     SELECT *, SYSDATETIME() FROM @OrderDetails

 END
GO

Notice how the schemas of the UDTTs and the actual tables themselves are almost, but not quite, identical. The CreatedAt column in the two tables is not present in the UDTTs, since we won’t be accepting the client’s date and time—we shouldn’t trust it, not knowing the time zone, not being in sync with the server and other clients, etc. So we’ll accept all required column values for orders and order details from the client, except for CreatedAt, for which our stored procedure will call the SYSDATETIME function to provide the current date and time based on the server clock (just like the GETDATE function, but returning a datetime2 data type rather than a datetime data type). Independent of CreatedAt, the header table has an OrderedAt column which will be accepted from the client as a date value (the “time-less” date and enhanced datetime2 are new data types in SQL Server 2008 that I’ll cover in a future post).

Passing a DataTable to SQL Server

The easiest way to call this stored procedure from a .NET client is to use a DataTable. You simply pass the entire populated DataTable object as the parameter value and set the parameter’s SqlDbType property to SqlDbType.Structured. Everything else is plain old vanilla ADO.NET:

var headers = new DataTable();
headers.Columns.Add("OrderId", typeof(int));
headers.Columns.Add("CustomerId", typeof(int));
headers.Columns.Add("OrderedAt", typeof(DateTime));

var details = new DataTable();
details.Columns.Add("OrderId", typeof(int));
details.Columns.Add("LineNumber", typeof(int));
details.Columns.Add("ProductId", typeof(int));
details.Columns.Add("Quantity", typeof(int));
details.Columns.Add("Price", typeof(decimal));

headers.Rows.Add(new object[] { 6, 51, DateTime.Today });
details.Rows.Add(new object[] { 6, 1, 12, 2, 15.95m });
details.Rows.Add(new object[] { 6, 2, 57, 1, 59.99m });
details.Rows.Add(new object[] { 6, 3, 36, 10, 8.50m });

headers.Rows.Add(new object[] { 7, 51, DateTime.Today });
details.Rows.Add(new object[] { 7, 1, 23, 2, 79.50m });
details.Rows.Add(new object[] { 7, 2, 78, 1, 3.25m });

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True;"))
{
  conn.Open();
  using (var cmd = new SqlCommand("InsertOrders", conn))
  {
    cmd.CommandType = CommandType.StoredProcedure;

    var headersParam = cmd.Parameters.AddWithValue("@OrderHeaders", headers);
    var detailsParam = cmd.Parameters.AddWithValue("@OrderDetails", details);

    headersParam.SqlDbType = SqlDbType.Structured;
    detailsParam.SqlDbType = SqlDbType.Structured;

    cmd.ExecuteNonQuery();
  }
  conn.Close();
}

The magic here is that the one line of code that invokes ExecuteNonQuery sends all the data for two orders with five details from the client over to the InsertOrders stored procedure on the server. The data is structured as a strongly-typed set of rows on the client, and winds up as a strongly-typed set of rows on the server in a single round-trip, without any extra work on our part.

This example passes a generic DataTable object to a TVP, but you can also pass a strongly typed DataTable or even a DbDataReader object (that is connected to another data source) to a TVP in exactly the same way. The only requirement is that the columns of the DataTable or DbDataReader correspond to the columns defined for the UDTT that the TVP is declared as.
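As a quick sketch of the DbDataReader variation (assuming a hypothetical InsertOrderHeaders stored procedure that accepts a single OrderUdt TVP, and a StagedOrders source table; both are illustrative, not part of the schema above), you can stream rows from one connection straight into a TVP on another:

using (var srcConn = new SqlConnection("Data Source=.;Initial Catalog=StagingDb;Integrated Security=True;"))
using (var destConn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True;"))
{
  srcConn.Open();
  destConn.Open();

  using (var srcCmd = new SqlCommand("SELECT OrderId, CustomerId, OrderedAt FROM StagedOrders", srcConn))
  using (SqlDataReader reader = srcCmd.ExecuteReader())
  using (var cmd = new SqlCommand("InsertOrderHeaders", destConn))
  {
    cmd.CommandType = CommandType.StoredProcedure;

    // The reader's columns must line up with the columns defined by OrderUdt.
    var headersParam = cmd.Parameters.AddWithValue("@OrderHeaders", reader);
    headersParam.SqlDbType = SqlDbType.Structured;

    cmd.ExecuteNonQuery();
  }
}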

So TVPs totally rock, eh? But hold on, the TVP story gets even better…

Passing a Collection of Objects to SQL Server

What if you’re working with collections populated with business objects rather than DataTable objects populated with DataRow objects? You might not think at first that business objects and TVPs could work together, but the fact is that they can… and quite gracefully too. All you need to do is implement the IEnumerable<SqlDataRecord> interface in your collection class. This interface requires your collection class to supply a C# custom iterator method named GetEnumerator (sorry, VB .NET doesn’t support custom iterators), which ADO.NET will call to fetch each object contained in the collection when you invoke ExecuteNonQuery. Here are the detailed steps:

First we’ll define the OrderHeader and OrderDetail classes and properties, similar to the way we created the DataTable objects and columns before:

public class OrderHeader
{
  public int OrderId { get; set; }
  public int CustomerId { get; set; }
  public DateTime OrderedAt { get; set; }
}

public class OrderDetail
{
  public int OrderId { get; set; }
  public int LineNumber { get; set; }
  public int ProductId { get; set; }
  public int Quantity { get; set; }
  public decimal Price { get; set; }
}

Ordinarily, List<OrderHeader> and List<OrderDetail> objects might be suitable for serving as collections of OrderHeader and OrderDetail objects in our application. But these collections won’t suffice on their own as input values for TVPs, because List<T> doesn’t implement IEnumerable<SqlDataRecord>. We need to add that ourselves. So we’ll define OrderHeaderCollection and OrderDetailCollection classes that inherit List<OrderHeader> and List<OrderDetail> respectively, and also implement IEnumerable<SqlDataRecord> to “TVP-enable” them:

public class OrderHeaderCollection : List<OrderHeader>, IEnumerable<SqlDataRecord>
{
  IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
  {
    var sdr = new SqlDataRecord(
     new SqlMetaData("OrderId", SqlDbType.Int),
     new SqlMetaData("CustomerId", SqlDbType.Int),
     new SqlMetaData("OrderedAt", SqlDbType.Date));

    foreach (OrderHeader oh in this)
    {
      sdr.SetInt32(0, oh.OrderId);
      sdr.SetInt32(1, oh.CustomerId);
      sdr.SetDateTime(2, oh.OrderedAt);

      yield return sdr;
    }
  }
}

public class OrderDetailCollection : List<OrderDetail>, IEnumerable<SqlDataRecord>
{
  IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
  {
    var sdr = new SqlDataRecord(
     new SqlMetaData("OrderId", SqlDbType.Int),
     new SqlMetaData("LineNumber", SqlDbType.Int),
     new SqlMetaData("ProductId", SqlDbType.Int),
     new SqlMetaData("Quantity", SqlDbType.Int),
     new SqlMetaData("Price", SqlDbType.Money));

    foreach (OrderDetail od in this)
    {
      sdr.SetInt32(0, od.OrderId);
      sdr.SetInt32(1, od.LineNumber);
      sdr.SetInt32(2, od.ProductId);
      sdr.SetInt32(3, od.Quantity);
      sdr.SetDecimal(4, od.Price);

      yield return sdr;
    }
  }
}

I’ll only explain the OrderHeaderCollection class; you can then infer how the OrderDetailCollection class–or any of your own collection classes–implements the custom iterator needed to support TVPs.

First, again, it inherits List<OrderHeader>, so an OrderHeaderCollection object is everything that a List<OrderHeader> object is. This means implicitly, by the way, that it also implements IEnumerable<OrderHeader>, which is what makes any sequence “foreach-able” or “LINQ-able”. But in addition, it explicitly implements IEnumerable<SqlDataRecord>, which means it also has a custom iterator method for ADO.NET to consume when an instance of this collection class is assigned to a SqlDbType.Structured parameter for piping over to SQL Server with a TVP.

Every enumerable class requires a matching enumerator method, so not surprisingly implementing IEnumerable<SqlDataRecord> requires providing a GetEnumerator method that returns an IEnumerator<SqlDataRecord>. This method first initializes a new SqlDataRecord object with a schema that matches the UDTTs that the TVPs are declared as. It then enters a loop that iterates all the elements in the collection (possible because List<OrderHeader> implicitly implements IEnumerable<OrderHeader>). On the first iteration, it sets the column property values of the SqlDataRecord object to the property values of the first OrderHeader element, and then issues the magic yield return statement. By definition, any method (like this one) which returns IEnumerator<T> and has a yield return statement in it, is a custom iterator method that is expected to return a sequence of T objects until the method execution path completes (in this case, when the foreach loop finishes).

The crux of this is that we are never calling this method directly. Instead, when we invoke ExecuteNonQuery to run a stored procedure with a SqlDbType.Structured parameter (that is, a TVP), ADO.NET expects the collection passed for the parameter value to implement IEnumerable<SqlDataRecord> so that IEnumerable<SqlDataRecord>.GetEnumerator can be called internally to fetch each new record for piping over to the server. When the first element is fetched from the collection, GetEnumerator is entered, the SqlDataRecord is initialized, and is then populated with values using the SetInt32 and SetDateTime methods (there’s a SetXXX method for each data type). That SqlDataRecord “row” is then pushed into the pipeline to the server by yield return. When the next element is fetched from the collection, the GetEnumerator method resumes from the point that it yield returned the previous element, rather than entering GetEnumerator again from the top. This means the SqlDataRecord gets initialized with schema information only once, while its population with one element after another is orchestrated by the controlling ADO.NET code for ExecuteNonQuery, which actually ships one SqlDataRecord after another to the server.

The actual ADO.NET code is 100% identical to the code earlier in the post that works with DataTable objects. Substituting a collection for a DataTable object requires no code changes and works flawlessly, provided the collection implements IEnumerable<SqlDataRecord> and provides a GetEnumerator method that maps each object instance to a SqlDataRecord and each object property to a column defined by the UDTT that the TVP is declared as.
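For completeness, here’s a minimal sketch of that client call using the collections, with the same InsertOrders procedure and sample values as the DataTable example above:

var headers = new OrderHeaderCollection
{
  new OrderHeader { OrderId = 6, CustomerId = 51, OrderedAt = DateTime.Today },
};

var details = new OrderDetailCollection
{
  new OrderDetail { OrderId = 6, LineNumber = 1, ProductId = 12, Quantity = 2, Price = 15.95m },
  new OrderDetail { OrderId = 6, LineNumber = 2, ProductId = 57, Quantity = 1, Price = 59.99m },
};

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True;"))
{
  conn.Open();
  using (var cmd = new SqlCommand("InsertOrders", conn))
  {
    cmd.CommandType = CommandType.StoredProcedure;

    // Identical to the DataTable version; the collections are simply passed as the TVP values.
    var headersParam = cmd.Parameters.AddWithValue("@OrderHeaders", headers);
    var detailsParam = cmd.Parameters.AddWithValue("@OrderDetails", details);

    headersParam.SqlDbType = SqlDbType.Structured;
    detailsParam.SqlDbType = SqlDbType.Structured;

    cmd.ExecuteNonQuery();
  }
  conn.Close();
}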

How freaking c-o-o-l is that?

Understanding Generic Delegates and Lambda Expressions with LINQ Extension Methods

Generic delegates and lambda expressions underpin LINQ. If you’ve been struggling to get your arms around these new concepts, this post should help demystify matters for you.

Of course, there are a host of other new .NET language features that enable LINQ as well. These include extension methods, type inference, anonymous types, object initializers, and custom iterators, just to name a few! In this post, I’ll address generic delegates and lambda expressions as they pertain to LINQ (that is, as they pertain to the extension methods provided in .NET 3.5 to support LINQ), and I’ll cover some of the other new language features in future posts.

LINQ Query Syntax

There is simply no LINQ without generic delegates and lambda expressions, yet it would seem at first that you can write simple LINQ queries without seeing or touching either a generic delegate or a lambda expression. Here’s a simple example that queries a collection of customers held in a variable named custs. In the where clause, we use a boolean allCustomers variable to control whether all customers or only USA customers will be returned.

var allCustomers = false;

var q =
  from c in custs
  where c.Country == "USA" || allCustomers == true
  orderby c.FirstName
  select c;

Where’s the generic delegate and the lambda expression in this LINQ query? Oh they’re there all right, but the LINQ query syntax hides those details from the code.

LINQ Method Call Syntax

In reality, the compiler (C# in this case, but VB .NET as well) supports a sugar-coated syntax that is strikingly similar to an ordinary SQL query in the RDBMS world, but which really amounts to a chained set of method calls.

What this means is that there is really no such thing as a where clause in LINQ–it’s a Where extension method call.

And guess what? There’s no such thing as an orderby clause either–it’s an OrderBy extension method call, chained to the end of the Where method call.

Meaning that the previous query is treated by the compiler exactly as if you had coded it using LINQ method call syntax as follows:

var q = custs
  .Where(c => c.Country == "USA" || allCustomers == true)
  .OrderBy(c => c.FirstName);

So this code looks more “conventional,” eh? We’re invoking the Where method on the collection of customers, and then invoking the OrderBy method on the result returned by Where, and obtaining our final filtered and sorted results in q.

LINQ query syntax is obviously cleaner than LINQ method call syntax, but it’s important to understand that only LINQ method call syntax fully implements the complete set of LINQ Standard Query Operators (SQOs), so that you will sometimes be forced to use method call syntax. Also realize that it’s common and normal to use combinations of the two in a single query (I’ll show examples of this using .Concat and other methods in future posts).
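As a small taste of that mixing (a minimal sketch; Count is one of the SQOs that has no query-syntax keyword), you can wrap a query-syntax expression in parentheses and chain a method call onto it:

var ukCustomerCount =
  (from c in custs
   where c.Country == "UK"
   select c).Count();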

Two questions arise at this point:

1) How did the customer collection, which is a List<Customer> in our example, get a Where and OrderBy method?

2) What is the => operator in the method parameters, and what does it do (and how the heck do you pronounce it)?

The answer to the first question is simple: Extension methods. I’ll cover those in more detail in a future post, but for now just understand that the .NET framework 3.5 “extends” any class that implements IEnumerable<T> (virtually any collection, such as our List<Customer>) with the Where and OrderBy methods, as well as many other methods that enable LINQ.

Generic Delegates

The second question is much juicier. First, the => is the lambda operator, and many people verbalize it as ‘goes to’. So ‘c => …’ is read aloud as ‘c goes to…’

To understand lambdas, you need to understand generic delegates and anonymous methods. And to best understand those is to review delegates and generics themselves. If you’ve been programming .NET for any length of time, you’re certain to have encountered delegates. And as of .NET 2.0 (VS 2005), generics have been an important language feature for OOP as well.

A delegate is a data type that defines the “shape” (signature) of a method call. Once you have defined a delegate, you can declare a variable of that delegate type. At runtime, the variable can then be pointed at (and used to invoke) any available method that matches the signature defined by the delegate. EventHandler is a typical example of a delegate defined by the .NET framework. Here is how .NET defines the EventHandler delegate:

public delegate void EventHandler(object sender, EventArgs e);

This delegate is suitable for pointing to any method that returns no value (void), and accepts two parameters (object sender and EventArgs e); in other words, a typical event handler that has no special event arguments to be passed.

A generic delegate is nothing more than an ordinary delegate, but supports generic type declarations for achieving strongly-typed definitions of the delegate at compile time. This is the same way generics are used for classes. For example, just as the framework provides a List<T> class so that you can have a List of anything, it now also provides a small set of generic delegates named Func that can be used to strongly type many different method signatures; essentially, any method that takes from zero to four parameters of any type (T1 through T4) and returns an object of any type (TResult):

public delegate TResult Func<TResult>();
public delegate TResult Func<T, TResult>(T arg);
public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
public delegate TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3);
public delegate TResult Func<T1, T2, T3, T4, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);
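As a quick illustration of what these shapes mean, Func<int, int, int> describes any method that takes two int parameters and returns an int, so a variable of that type can point at (and invoke) such a method:

Func<int, int, int> pickLarger = Math.Max;   // Math.Max(int, int) matches the shape
int result = pickLarger(3, 7);               // invokes Math.Max through the delegate; returns 7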

Delegates as Parameters

With these generic Func delegates baked into the .NET framework, LINQ queries can use them as parameters to extension methods such as Where and OrderBy. Let’s examine one overload of the Where extension method:

public static IEnumerable<TSource> Where<TSource>(
  this IEnumerable<TSource> source,
  Func<TSource, bool> predicate);

What does this mean? Let’s break it down. And to better suit our particular example and simplify the explanation, let’s substitute Customer for TSource:

1) It’s a method called Where<Customer>.

2) It returns an IEnumerable<Customer> sequence.

3) It’s an extension method available to operate on any IEnumerable<Customer> instance (as denoted by the special this keyword in the signature), such as the List<Customer> in our example. The this keyword used in this context means that an extension method for the type following the this keyword is being defined, rather than defining what looks like the first parameter to the method.

4) It takes a single predicate parameter of type Func<Customer, bool>.

Number 4 is key. Func<Customer, bool> is the second Func generic delegate shown above, Func<T, TResult>, where T is Customer and TResult is bool (Boolean). The Func<Customer, bool> delegate refers to any method that accepts a Customer object and returns a bool result.  This means that the Where<Customer> method takes a delegate (function pointer) that points to another method which actually implements the filtering logic. This other method will receive each Customer object as the LINQ query iterates the source sequence List<Customer>, apply the desired filtering criteria on it, and return a true or false result in the bool return value that controls whether this particular Customer object should be selected by the query.
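Conceptually, you can picture Where working something like this (a simplified sketch to show the idea, not the actual framework source):

// Simplified sketch of Where; the real version lives in a static class in System.Linq.
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, bool> predicate)
{
    // Iterate the source sequence and invoke the predicate delegate for each
    // element; only elements for which it returns true are selected.
    foreach (var item in source)
    {
        if (predicate(item))
            yield return item;
    }
}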

How does all that get expressed simply with: .Where(c => c.Country == "USA" || allCustomers == true)?

The best way to answer that question is to first demonstrate two other ways of invoking the Where method. The first is to explicitly provide a delegate instance of Func<Customer, bool> that points to another method called IsCustomerCountryUSA:

private void MainMethod()
{
    var filterMethod = new Func<Customer, bool>(IsCustomerCountryUSA);
    var q = custs.Where(filterMethod);
}

private bool IsCustomerCountryUSA(Customer c)
{
    return (c.Country == "USA");
}

The Where method takes a single parameter, which is a new instance of the Func<Customer, bool> generic delegate that points to a method named IsCustomerCountryUSA. This works because IsCustomerCountryUSA accepts a Customer object and returns a bool result, which is the signature defined by Func<Customer, bool> expected as a parameter by Where<Customer>. Inside IsCustomerCountryUSA, the filtering logic is applied. Note that because we have created IsCustomerCountryUSA as a completely separate method, it cannot access the allCustomers variable defined locally in the calling method (the variable that controls whether all or only USA customers should be selected, as shown at the beginning of this post). Furthermore, because the signature of the IsCustomerCountryUSA method is fixed as determined by Func<Customer, bool>, we can’t just pass the allCustomers variable along as another parameter either. Thus, this approach would require you to define the allCustomers variable as a private member if you wanted to include it in the filtering logic of the IsCustomerCountryUSA method.
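In other words, to involve allCustomers in the filtering logic with this approach, you’d have to promote it from a local variable to a private field, along these lines (a sketch of the workaround, not code from the original sample):

private bool allCustomers;  // promoted from a local variable to a private field

private void MainMethod()
{
    allCustomers = false;
    var filterMethod = new Func<Customer, bool>(IsCustomerCountryUSA);
    var q = custs.Where(filterMethod);
}

private bool IsCustomerCountryUSA(Customer c)
{
    // The field is visible here, where a local in MainMethod would not be.
    return (c.Country == "USA" || allCustomers == true);
}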

Anonymous Methods

Anonymous methods were introduced at the same time as generics in .NET 2.0 (VS 2005). They are useful in cases such as this, where the method we’re creating for the delegate only needs to be called from one place, so the entire method body is simply coded in-line. Since it’s coded in-line at the one and only place it’s invoked, the method doesn’t need to be given a name, hence the term anonymous method.

To refactor the delegate instance version of the code to use an anonymous method, replace the Func parameter passed to the Where method with the actual method signature and body as follows:

var q = custs.Where(delegate(Customer c) { return c.Country == "USA" || allCustomers == true; });

The keyword delegate indicates you’re pointing the delegate parameter Func<Customer, bool> expected by the Where method to an anonymous method. The anonymous method isn’t named, as mentioned, but it must match the Func<Customer, bool> signature, meaning it must accept a Customer object parameter and return a bool result. The explicit signature following the delegate keyword defines the expected single Customer object parameter, and the expected bool result is inferred from the fact that the method returns an expression that resolves to a bool value. If the anonymous method used any other signature or returned anything other than a bool result, the compiler would fail to build your code, because the Where<Customer> method is defined to accept only a Func<Customer, bool> delegate.

So now the method body containing the filtering logic that was formerly coded in the separate IsCustomerCountryUSA method appears in braces right after the anonymous method signature. But there’s another huge advantage here besides saving the overhead of declaring a separate method for the filtering logic. Because anonymous methods appear inside the method that calls them, they magically gain access to the local variables of the calling method. Thus, this implementation can also test the allCustomers variable, even though it’s defined in the calling method, which would normally be inaccessible to the method being called. That means it’s no longer necessary to promote local variables to private fields just to share them with the method implementing the filtering logic, as the previous approach required. The result is less code, and cleaner code (though you should be aware that the compiler pulls some fancy tricks to make this possible, and there is a degree of additional runtime overhead involved when accessing local variables of the calling method from the anonymous method being called).
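If you’re curious what those tricks look like, the compiler conceptually hoists the captured local variable into a hidden class so that the calling method and the anonymous method share the same storage; very roughly along these lines (an approximation with invented names, not the actual generated code):

// Rough approximation of what the compiler generates behind the scenes.
private sealed class CapturedLocals
{
    public bool allCustomers;  // the captured local now lives in this object

    public bool Predicate(Customer c)
    {
        return c.Country == "USA" || allCustomers == true;
    }
}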

Lambda Expressions

Lambdas were introduced in .NET 3.5 (VS 2008) as a more elegant way of implementing anonymous methods, particularly with LINQ. When a lambda is used to represent an anonymous method containing a single expression which returns a single result, that anonymous method is called a lambda expression. Let’s first refactor our query by converting it from an anonymous method to a lambda:

var q = custs.Where((Customer c) => { return c.Country == "USA" || allCustomers == true; });

All we did was drop the delegate keyword and add the => (“goes to”) operator. The rest looks and works the same as before.

This lambda consists of a single expression that returns a single result, which qualifies it to be refactored as a lambda expression like this:

var q = custs.Where(c => c.Country == "USA" || allCustomers == true);

This final version again works exactly the same way, but has become very terse. First, the Customer data type in the signature isn’t needed, since it’s inferred as the TSource in Func<TSource, bool>. The parentheses surrounding the signature aren’t needed either, so they’re dropped too. Second, since lambda expressions by definition consist of only a single statement, the opening and closing braces for the method body aren’t needed and also get dropped. Finally, since lambda expressions return a single result, the return keyword isn’t needed and is also dropped.

Besides offering an even terser syntax, lambda expressions provide two important additional benefits over anonymous methods. First, as just demonstrated, the data type of the parameter in the signature needn’t be specified since it’s inferred automatically by the compiler. That means that anonymous types (such as those often created by projections defined in the select clause of a LINQ query) can also be passed to lambda expressions. Anonymous types cannot be passed to any other type of method, so this capability was absolutely required to support anonymous types with LINQ. Second, lambda expressions can be used to build expression trees at runtime, which essentially means that your query code gets converted into query data which can then be processed by a particular LINQ implementation at runtime. For example, LINQ to SQL builds an expression tree from the lambda expressions in your query, and generates the appropriate T-SQL for querying SQL Server from the expression tree. Future posts will cover anonymous types and expression trees in greater detail.
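To see the difference, note that the very same lambda expression can be assigned either to a delegate type (compiled into executable code) or to an Expression<Func<...>> from the System.Linq.Expressions namespace (compiled into a data structure describing the code). A minimal sketch, with someCustomer standing in for any Customer instance:

// Compiled to IL you can invoke directly:
Func<Customer, bool> asDelegate = c => c.Country == "USA";
bool match = asDelegate(someCustomer);

// Compiled to an expression tree (query code as data) that a LINQ provider
// such as LINQ to SQL can inspect and translate into T-SQL:
Expression<Func<Customer, bool>> asExpression = c => c.Country == "USA";

// An expression tree can also be compiled into a delegate at runtime:
Func<Customer, bool> compiled = asExpression.Compile();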

To Sum It All Up

So this final lambda expression version invokes the Where method on the List<Customer> collection in custs. Because List<Customer> implements IEnumerable<T>, the Where extension method which is defined for any IEnumerable<T> is available to be invoked on it. The Where<Customer> method expects a single parameter of the generic delegate type Func<Customer, bool>. Such a parameter can be satisfied by the in-line lambda expression shown above, which accepts a Customer object as c =>, and returns a bool result by applying a filtering condition on the customer’s Country property and the calling method’s allCustomers variable.

And as explained at the beginning of this post, the method syntax query using our final lambda expression is exactly what the compiler would generate for us if we expressed it using this query syntax:

var q =
    from c in custs
    where c.Country == "USA" || allCustomers == true
    select c;

I hope this helps clear up any confusion there is surrounding lambda expressions in .NET 3.5.

Happy coding!