SQL Server 2008 introduced FILESTREAM, an innovative feature that integrates the relational database engine with the NTFS file system to provide highly efficient storage and management of BLOBs. Now FileTable, a new feature coming in the next version of SQL Server (code-named “Denali”), builds on FILESTREAM and takes database BLOB management to the next level. In this post, I’ll explain FileTable and show you how it works. I also encourage you to experiment with FileTable yourself by downloading and installing SQL Server Denali CTP3 from http://bit.ly/DenaliCTP3 (but be sure to install it on a virtual machine; beta software can be difficult or impossible to uninstall).
Note: FileTable relies on FILESTREAM, so you need to understand FILESTREAM to fully understand FileTable. If you’re new to FILESTREAM, you can get the necessary background by reading my three earlier blog posts Introducing FILESTREAM, Enabling and Using FILESTREAM, and Using SqlFileStream.
FileTable combines FILESTREAM with hierarchyid (another SQL Server 2008 feature) and the Windows file system API to deliver new and exciting BLOB capabilities in SQL Server. (I just love seeing reusability leveraged like this.) Like the two words joined in its name, one FileTable functions as two distinct things simultaneously:
- A FileTable is an Ordinary Table
- A FileTable is an Ordinary File System
First and foremost, a FileTable is a regular SQL Server database table in every respect, with one exception: The schema of a FileTable is fixed. The columns of a FileTable and their data types are pre-determined by SQL Server. Specifically, every FileTable contains these columns:
||Unique row identifier
||BLOB content (NULL if directory)
||Name of file or directory
||Location of file or directory within the logical file system
Every FileTable implements a logical folder structure using the path_locator column. This is a hierarchyid value that denotes the location of each file and folder (row) within the logical file system (table). The hierarchyid data type was introduced in SQL Server 2008 as a binary value that, relative to other hierarchyid values in the same tree structure, identifies a unique node position (reminder to self: blog on hierarchyid!). It is implemented as a system CLR type, which means that it’s a .NET framework class, and has a set of methods you can use (e.g., GetAncestor, GetDescendant, GetReparentedValue, IsDescendantOf, etc.) to traverse and manipulate the hierarchy. Thus, it’s perfect for casting the hierarchical structure of a file system over a relational table, as FileTable does.
The path_locator column is defined as the table’s primary key with a non-clustered index. A separate key value is also stored in the stream_id column with a non-clustered unique index. This is the uniqueidentifier ROWGUIDCOL value required by any table with varbinary(max) FILESTREAM columns, so FileTable is no exception. And unlike path_locator, this unique value will never change once it is assigned to a new FileTable row, even if the row is later “reparented” (i.e., moved to another location in the hierarchy). Thus, you should treat stream_id as each row’s “ID,” even though it isn’t the table’s primary key.
Every row in a FileTable corresponds precisely to either a single file or folder, as determined by the bit value in the is_directory column. The file or folder name is stored in the name column as an nvarchar(255) string. All of the other column names are self-describing, and are used to store typical file system information such as various timestamps and storage attributes. The actual file content (the BLOB itself) is stored in the file_stream column, which is a varbinary(max) data type decorated with the FILESTREAM attribute. This means that the binary content in the file_stream column is stored in the NTFS file system that SQL Server is managing behind the scenes, rather than the structured file groups where all the other table data is stored (standard FILESTREAM behavior in SQL Server 2008).
In addition to these fourteen columns, each FileTable includes the following three computed (read-only) columns:
||Parent node derived from path_locator
||Extension derived from name
||BLOB byte length derived from file_stream
The parent_path_locator column returns the result of calling GetAncestor(1) on path_locator to obtain the path_locator to the parent folder of any file or folder in the table. The file_type column returns the extension of the filename parsed from the string value in the name column. And the cached_file_size column returns the number of bytes stored in the file_stream column (these are BLOBs stored in SQL Server’s internally-managed NTFS file system behind the scenes).
With this fixed schema in place, every FileTable has what it needs to represent a logical file system. Thus, SQL Server is able to fabricate a Windows file share over any FileTable. This magically exposes the FileTable to any user or application who can then view and update the table using standard file I/O semantics (e.g., drag-and-drop with Windows Explorer, or read/write with System.IO.FileStream). So:
- Creating a file or directory in the logical file system adds a row to the table
- Adding a row to the table creates a file or directory in the logical file system
Here’s the total FileTable picture:
Take a moment to digest what’s happening here. In addition to the programmatic FILESTREAM access using either T-SQL or SqlFileStream, Denali now offers a third interface to FILESTREAM: A logical file system. In a sense, this fills the SQL Server 2008 FILESTREAM gap in which the file system itself is completely inaccessible. This is not to say that FileTable lets you directly access SQL Server’s internally managed NTFS file system; certainly not. That remains obfuscated and private, as it continues to function in standard FILESTREAM fashion against BLOB data that just happens to be in a FileTable instead of a regular table. What we do get is an abstraction layer over the FileTable that functions as a standard file system. In this logical file system, everything about each file and folder—except the BLOB content of the files themselves—is stored in the FileTable’s structured file group, while the BLOBs themselves are physically stored in the NTFS file system.
So you can see that there’s really nothing new beneath the FileTable layer; as before, SQL Server synchronizes transactional access between the row in the FileTable and its corresponding BLOB content in the NTFS file system to ensure that integrity is properly maintained between them. As with T-SQL access, this synchronization occurs implicitly when manipulating the FileTable via the exposed Windows file share. And, being an ordinary table in virtually every respect, you also have the option to use SqlFileStream with explicit transaction synchronization for the fastest possible streaming of BLOBs into and out of a FileTable.
I don’t know about you, but I find all of this extremely appealing. We now have total flexibility for BLOB storage in the database. With FileTable, you can easily migrate existing applications that work against physical file systems without writing any custom T-SQL or fancy SqlFileStream code. Just use a FileTable, let the existing applications continue working without modification, and enjoy the benefits of your files becoming an integral part of the SQL Server database.
That’s the whole FileTable story. The rest of this post just walks you through the steps and syntax for getting a FileTable up and running. It’s quite simple and straightforward. Specifically, you’ll need to:
- Enable FILESTREAM for file system access
- Create a FileTable-enabled database
- Create a FileTable
Enable FILESTREAM for File System Access
FILESTREAM must be enabled for file system access at both the service level (either during setup, or later via the SQL Server Configuration Manager) and at the instance level. Complete details can be found in my Enabling and Using FILESTREAM post. Once enabled, SQL Server exposes a file share for all the FileTable-enabled databases you create under the SQL Server instance (by default, the share name is MSSQLSERVER).
The following statement enables FILESTREAM at the instance level for file system access (level 2).
EXEC sp_configure filestream_access_level, 2
Create a FileTable-Enabled Database
Naturally, a FileTable-enabled database must include the FILEGROUP…CONTAINS FILESTREAM clause expected of any FILESTREAM-enabled database. In addition, you must also specify two parameters in the SET FILESTREAM clause of the the CREATE DATABASE (or ALTER DATABASE) statement. The DIRECTORY_NAME specifies the name of the folder that will be created for this database in the root file share associated with the instance. And enabling full non-transacted access with NON_TRANSACTED_ACCESS=FULL exposes every FileTable within the database as a subfolder beneath the database folder of the instance’s file share.
CREATE DATABASE PhotoLibrary
(NAME = PhotoLibrary_data,
FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
FILEGROUP FileStreamGroup CONTAINS FILESTREAM
(NAME = PhotoLibrary_blobs,
FILENAME = 'C:\DB\Photos')
(NAME = PhotoLibrary_log,
FILENAME = 'C:\DB\PhotoLibrary_log.ldf')
Create a FileTable
Not surprisingly, this is the easiest part. Since SQL Server controls the schema of every FileTable, you just use a CREATE TABLE statement with the new AS FileTable clause and don’t include any columns:
CREATE TABLE PhotoFiles AS FileTable
Your FileTable is ready to use. You will find a root PhotoFiles folder for the FileTable beneath the PhotoLibrary folder created for the database in the Windows file share for the instance. You can interact with the FileTable using T-SQL, SqlFileStream, or the logical file system exposed by the Windows file share.
FILESTREAM + hierarchyid + Windows File Share = FileTable