This is the second installment in a 3-post series covering the new FILESTREAM feature in SQL Server 2008. In part 1, I explained how FILESTREAM works at a high level, and in this post I’ll show you the step-by-step procedure for enabling and using this awesome new feature.
You need to enable FILESTREAM before you can use it, and that’s true of most “extra” features—which are disabled until you need them, as part of the overall “secure by default” strategy in SQL Server. Enabling FILESTREAM is slightly more involved than enabling other features because it involves two steps:
Enabling FILESTREAM for the Windows Service
First you need to enable FILESTREAM at the Windows service level. You can perform this step during the SQL Server 2008 setup process, or any time after SQL Server 2008 is installed by using the SQL Server Configuration Manager (click the Start menu and choose All Programs > Microsoft SQL Server 2008 > Configuration Tools > SQL Server Configuration Manager). The Properties dialog box for the MSSQLSERVER process has a FILESTREAM tab that lets you choose between four FILESTREAM access levels.
1) Disabled: With all three checkboxes cleared, FILESTREAM is completely disabled. This is the default setting.
2) Enabled for T-SQL only: Checking the first checkbox enables FILESTREAM in “total transparency” mode. This means that you’ll be able to treat varbinary(max) columns decorated with the FILESTREAM attribute as conventional varbinary(max) columns that appear to be “in-line” columns; either by in-lining binary streams using T-SQL, or by embedding and extracting BLOBs as byte arrays to and from your tables.
3) Enabled for T-SQL and file streaming: Checking the second checkbox also enables direct streaming between the file system and your application running on the same machine as SQL Server. With this option selected, a share name is also specified to enable direct streaming using the OpenSqlFilestream function exposed by the SQL Server 2008 Native Client API, which I’ll cover in my third and final FILESTREAM post.
4) Enabled for T-SQL and file streaming to remote clients: This is the same as the previous option, but allows applications running on remote machines to use OpenSqlFilestream. Since applications rarely run on the same box as SQL Server, you’ll typically always choose this option if you’ll be using OpenSqlFilestream (which, again, I’ll cover in my next post).
Enabling FILESTREAM for the SQL Server instance
Next you need to enable FILESTREAM at the SQL Server instance level. Basically, there is a separation of security concerns here between the Windows and database administrators, and the same access level set for the Windows service needs to be set for the SQL Server instance. That’s easy if one person wears both hats, but otherwise, these guys need to get along and they need to agree.
You can use SQL Server Management Studio (SSMS) perform this step from the Advanced page in the Server Properties dialog:
Although there are only three options in this dropdown, they are essentially the same options I described for the Windows service configuration; where “Full access enabled” means direct streamed access to the file system from local clients only, or both local and remote clients, depending on the corresponding setting made at the Windows service level.
As an alternative to using SSMS, you can programmatically set this access level by using the following T-SQL statements (specify n as 0, 1, or 2 for disabled, T-SQL access enabled, full access enabled, respectively):
EXEC sp_configure filestream_access_level, n RECONFIGURE
Creating a FILESTREAM-Enabled Database
Once you’ve enabled FILESTREAM, you can create a FILESTREAM-enabled database by using the new FILEGROUP…CONTAINS FILESTREAM clause, such as the one in this otherwise ordinary CREATE DATABASE statement:
CREATE DATABASE PhotoLibrary ON PRIMARY (NAME = PhotoLibrary_data, FILENAME = 'C:\DB\PhotoLibrary_data.mdf'), FILEGROUP FileStreamGroup CONTAINS FILESTREAM (NAME = PhotoLibrary_blobs, FILENAME = 'C:\DB\Photos') LOG ON (NAME = PhotoLibrary_log, FILENAME = 'C:\DB\PhotoLibrary_log.ldf')
When you execute this statement, SQL Server will create the PhotoLibrary_data.mdf and PhotoLibrary_log.ldf files as usual, and will also create a Photos folder inside of which it will transparently store individual files for each BLOB stored in varbinary(max) FILESTREAM columns in the PhotoLibrary database.
Creating and Using FILESTREAM Columns
As I already mentioned, you only need to append the FILESTREAM attribute to the varbinary(max) data type on the column(s) you want SQL Server to store in the file system. But as it turns out, there’s an additional requirement to include a single column of type uniqueidentifier (GUID) decorated with the ROWGUIDCOL attribute. This column must not allow nulls, and needs to either be the primary key or have a unique constraint enforced on it. SQL Server absolutely requires this ROWGUIDCOL column in order to transparently link BLOBs in the file system with their corresponding rows in the database tables.
Only one ROWGUIDCOL-attributed column is permitted in any table, but once that column is defined, you can then define any number of varbinary(max) FILESTREAM columns for that table that you’d like. Furthermore, by establishing a default value for that column to automatically generate GUID values using either the GETID or GETSEQUENTIALID functions, you can satisfy the requirement to include this column without incurring any additional burden of maintaining it.
The following statement creates a table in the PhotoLibrary database that uses FILESTREAM to store BLOBs of images in a varbinary(max) FILESTREAM column named Photo:
CREATE TABLE PhotoAlbum( PhotoId int PRIMARY KEY, RowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(), Description varchar(max), Photo varbinary(max) FILESTREAM)
At this point, you can treat the Photo column as an ordinary varbinary(max) column for BLOB storage. This means you can embed or extract byte arrays directly to and from this column from your client application exactly as you did before. Or, for BLOBs that aren’t too large, you can even express the binary stream in T-SQL as follows:
INSERT INTO PhotoAlbum(PhotoId, Description, Photo) VALUES(2, 'Document icon', 0x4749463839610C000E00B30000FFFFFFC6DEC6C0C0C000008 0000000D3121200000000000000000000000000000000000000000000000000000000000021F9 0401000002002C000000000C000E0000042C90C8398525206B202F1820C80584806D1975A29AF 48530870D2CEDC2B1CBB6332EDE35D9CB27DCA554484204003B)
This statement appears to insert a BLOB directly into the table’s Photo column, in-line with the other columns (PhotoId and Description). And it certainly appears the same way when retrieving the row back out of the table:
SELECT PhotoId, Description, Photo FROM PhotoAlbum WHERE PhotoId = 2
Peeking Behind the Scenes
As is often the case with powerful abstraction layers, the result is somewhat anti-climactic, since everything appears to work the same as it did in earlier versions of SQL Server using the varbinary(max) data type. However, if we start exploring under the covers, we can see the effect that FILESTREAM has on the file system behind the scenes.
But first, understand that normally “thou shalt not” do what we’re about to do; that is, interact directly with the file system. The manner in which SQL Server decides to create and organize subdirectories and files in the folder designated as the FILESTREAM file group is none of our business, just as the internal structure and content of .mdf and .ldf files are none of our business—it’s all managed exclusively by SQL Server. Accepting that, we can still dig into the file system to prove that things are working behind the scenes as expected.
Using Windows Explorer, navigate to the C:\DB\Photos folder created by SQL Server for FILESTREAM storage. The directory names and structure are certainly not intuitive, nor will they necessarily match the ones on my system. But if you start peeking around, you’ll encounter a file whose date/time coincides with the date/time that we inserted the row:
Knowing that the BLOB we inserted is an image, right-click on the file, choose Open, and then select Windows Picture and Fax Viewer to display the image (which I’ve magnified in the following screenshot):
If you really want to prove the point (and let me warn you again, this is a real no-no), you can even replace this file with some other image and verify that the SELECT statement returns a different BLOB. It should also be fairly obvious that you’d want to lock down the folder designated for FILESTREAM storage just as you lock down the .mdf and .ldf files, so that only SQL Server itself has direct access to the file system.
As explained in part 1, FILESTREAM delivers full transactional integrity over BLOB data in the file system. If you wrap the above INSERT statement inside a transaction, then SQL Server will automatically initiate an NTFS file system transaction over the file added to the file system behind the scenes. In this case, rolling back the database transaction (or any other condition in which the database transaction does not commit successfully) automatically rolls back the NTFS file system transaction so that both the row and the file disappear.
Direct File System Access
Hopefully, you’re reasonably impressed with FILESTREAM thus far. But there’s a much better way to stream content into and out of varbinary(max) FILESTREAM columns. The fact remains that the varbinary(max) data type is not optimized for streaming, even though FILESTREAM uses the file system instead of your relational tables to persist BLOBs. How are you going to achieve the maximum BLOB access performance? By streaming directly into and out of the file system, which is a native environment optimized for streaming. You can do that by using the OpenSqlFilestream function exposed by the SQL Server Native Client API, which I’ll cover in the third and last post. So stay tuned!