It’s a File System… It’s a Database Table… It’s SQL Server Denali FileTable!

SQL Server 2008 introduced FILESTREAM, an innovative feature that integrates the relational database engine with the NTFS file system to provide highly efficient storage and management of BLOBs. Now FileTable, a new feature coming in the next version of SQL Server (code-named “Denali”), builds on FILESTREAM and takes database BLOB management to the next level. In this post, I’ll explain FileTable and show you how it works. I also encourage you to experiment with FileTable yourself by downloading and installing SQL Server Denali CTP3 from http://bit.ly/DenaliCTP3 (but be sure to install it on a virtual machine; beta software can be difficult or impossible to uninstall).

Note: FileTable relies on FILESTREAM, so you need to understand FILESTREAM to fully understand FileTable. If you’re new to FILESTREAM, you can get the necessary background by reading my three earlier blog posts Introducing FILESTREAM, Enabling and Using FILESTREAM, and Using SqlFileStream.

FileTable combines FILESTREAM with hierarchyid (another SQL Server 2008 feature) and the Windows file system API to deliver new and exciting BLOB capabilities in SQL Server. (I just love seeing reusability leveraged like this.) Like the two words joined in its name, one FileTable functions as two distinct things simultaneously:

  1. A FileTable is an Ordinary Table
  2. A FileTable is an Ordinary File System

First and foremost, a FileTable is a regular SQL Server database table in every respect, with one exception: The schema of a FileTable is fixed. The columns of a FileTable and their data types are pre-determined by SQL Server. Specifically, every FileTable contains these columns:

Column Name Data Type Description
stream_id uniqueidentifier ROWGUIDCOL Unique row identifier
file_stream varbinary(max) FILESTREAM BLOB content (NULL if directory)
name nvarchar(255) Name of file or directory
path_locator hierarchyid Location of file or directory within the logical file system
creation_time datetimeoffset(7) Created
last_write_time datetimeoffset(7) Last Modified
last_access_time datetimeoffset(7) Last Accessed
is_directory bit 0=file, 1=directory
is_offline bit Storage attributes
is_hidden bit
is_readonly bit
is_archive bit
is_system bit
is_temporary bit

Every FileTable implements a logical folder structure using the path_locator column. This is a hierarchyid value that denotes the location of each file and folder (row) within the logical file system (table). The hierarchyid data type was introduced in SQL Server 2008 as a binary value that, relative to other hierarchyid values in the same tree structure, identifies a unique node position (reminder to self: blog on hierarchyid!). It is implemented as a system CLR type, which means that it’s a .NET framework class, and has a set of methods you can use (e.g., GetAncestor, GetDescendant, GetReparentedValue, IsDescendantOf, etc.) to traverse and manipulate the hierarchy. Thus, it’s perfect for casting the hierarchical structure of a file system over a relational table, as FileTable does.

The path_locator column is defined as the table’s primary key with a non-clustered index. A separate key value is also stored in the stream_id column with a non-clustered unique index. This is the uniqueidentifier ROWGUIDCOL value required by any table with varbinary(max) FILESTREAM columns, so FileTable is no exception. And unlike path_locator, this unique value will never change once it is assigned to a new FileTable row, even if the row is later “reparented” (i.e., moved to another location in the hierarchy). Thus, you should treat stream_id as each row’s “ID,” even though it isn’t the table’s primary key.

Every row in a FileTable corresponds precisely to either a single file or folder, as determined by the bit value in the is_directory column. The file or folder name is stored in the name column as an nvarchar(255) string. All of the other column names are self-describing, and are used to store typical file system information such as various timestamps and storage attributes. The actual file content (the BLOB itself) is stored in the file_stream column, which is a varbinary(max) data type decorated with the FILESTREAM attribute. This means that the binary content in the file_stream column is stored in the NTFS file system that SQL Server is managing behind the scenes, rather than the structured file groups where all the other table data is stored (standard FILESTREAM behavior in SQL Server 2008).

In addition to these fourteen columns, each FileTable includes the following three computed (read-only) columns:

Column Name Data Type Description
parent_path_locator hierarchyid Parent node derived from path_locator
file_type nvarchar(255) Extension derived from name
cached_file_size bigint BLOB byte length derived from file_stream

The parent_path_locator column returns the result of calling GetAncestor(1) on path_locator to obtain the path_locator to the parent folder of any file or folder in the table. The file_type column returns the extension of the filename parsed from the string value in the name column. And the cached_file_size column returns the number of bytes stored in the file_stream column (these are BLOBs stored in SQL Server’s internally-managed NTFS file system behind the scenes).

With this fixed schema in place, every FileTable has what it needs to represent a logical file system. Thus, SQL Server is able to fabricate a Windows file share over any FileTable. This magically exposes the FileTable to any user or application who can then view and update the table using standard file I/O semantics (e.g., drag-and-drop with Windows Explorer, or read/write with System.IO.FileStream). So:

  • Creating a file or directory in the logical file system adds a row to the table
  • Adding a row to the table creates a file or directory in the logical file system

Here’s the total FileTable picture:

Take a moment to digest what’s happening here. In addition to the programmatic FILESTREAM access using either T-SQL or SqlFileStream, Denali now offers a third interface to FILESTREAM: A logical file system. In a sense, this fills the SQL Server 2008 FILESTREAM gap in which the file system itself is completely inaccessible. This is not to say that FileTable lets you directly access SQL Server’s internally managed NTFS file system; certainly not. That remains obfuscated and private, as it continues to function in standard FILESTREAM fashion against BLOB data that just happens to be in a FileTable instead of a regular table. What we do get is an abstraction layer over the FileTable that functions as a standard file system. In this logical file system, everything about each file and folder—except the BLOB content of the files themselves—is stored in the FileTable’s structured file group, while the BLOBs themselves are physically stored in the NTFS file system.

So you can see that there’s really nothing new beneath the FileTable layer; as before, SQL Server synchronizes transactional access between the row in the FileTable and its corresponding BLOB content in the NTFS file system to ensure that integrity is properly maintained between them. As with T-SQL access, this synchronization occurs implicitly when manipulating the FileTable via the exposed Windows file share. And, being an ordinary table in virtually every respect, you also have the option to use SqlFileStream with explicit transaction synchronization for the fastest possible streaming of BLOBs into and out of a FileTable.

I don’t know about you, but I find all of this extremely appealing. We now have total flexibility for BLOB storage in the database. With FileTable, you can easily migrate existing applications that work against physical file systems without writing any custom T-SQL or fancy SqlFileStream code. Just use a FileTable, let the existing applications continue working without modification, and enjoy the benefits of your files becoming an integral part of the SQL Server database.

That’s the whole FileTable story. The rest of this post just walks you through the steps and syntax for getting a FileTable up and running. It’s quite simple and straightforward. Specifically, you’ll need to:

  • Enable FILESTREAM for file system access
  • Create a FileTable-enabled database
  • Create a FileTable

Enable FILESTREAM for File System Access

FILESTREAM must be enabled for file system access at both the service level (either during setup, or later via the SQL Server Configuration Manager) and at the instance level. Complete details can be found in my Enabling and Using FILESTREAM post. Once enabled, SQL Server exposes a file share for all the FileTable-enabled databases you create under the SQL Server instance (by default, the share name is MSSQLSERVER).

The following statement enables FILESTREAM at the instance level for file system access (level 2).

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

Create a FileTable-Enabled Database

Naturally, a FileTable-enabled database must include the FILEGROUP…CONTAINS FILESTREAM clause expected of any FILESTREAM-enabled database. In addition, you must also specify two parameters in the SET FILESTREAM clause of the the CREATE DATABASE (or ALTER DATABASE) statement. The DIRECTORY_NAME specifies the name of the folder that will be created for this database in the root file share associated with the instance. And enabling full non-transacted access with NON_TRANSACTED_ACCESS=FULL exposes every FileTable within the database as a subfolder beneath the database folder of the instance’s file share.

CREATE DATABASE PhotoLibrary
 ON PRIMARY
  (NAME = PhotoLibrary_data,
   FILENAME = 'C:\DB\PhotoLibrary_data.mdf'),
 FILEGROUP FileStreamGroup CONTAINS FILESTREAM
  (NAME = PhotoLibrary_blobs,
   FILENAME = 'C:\DB\Photos')
 SET FILESTREAM(
  (DIRECTORY_NAME='PhotoLibrary',
   NON_TRANSACTED_ACCESS=FULL)
 LOG ON
  (NAME = PhotoLibrary_log,
   FILENAME = 'C:\DB\PhotoLibrary_log.ldf')

Create a FileTable

Not surprisingly, this is the easiest part. Since SQL Server controls the schema of every FileTable, you just use a CREATE TABLE statement with the new AS FileTable clause and don’t include any columns:

CREATE TABLE PhotoFiles AS FileTable

Your FileTable is ready to use. You will find a root PhotoFiles folder for the FileTable beneath the PhotoLibrary folder created for the database in the Windows file share for the instance. You can interact with the FileTable using T-SQL, SqlFileStream, or the logical file system exposed by the Windows file share.

Summary

FILESTREAM + hierarchyid + Windows File Share = FileTable

Genius! Enjoy…

Watch Out Installing Denali CTP3 from DVD to Virtual Machines

So Microsoft just publicly released CTP3 of Denali, the code name for the next (2012) version of SQL Server. But beware installing from DVD to Virtual Machines! I downloaded CTP3, burned it to a DVD, and ran the installer .exe from the DVD on a VM. The installer appears to work and then just disappears. No message. No error. No nothing. No way!!!

It turns out that this particular installer attempts to unpackage the downloaded files in place; that is, on the DVD — which it can’t of course. But it won’t complain. Nor will it prompt you for an alternative unpackage location. It’ll just appear to install and disappear. And you’ll be left scratching your head. Gotta love beta software!

The only way to make it work is to run the installer on a hard disk (either inside or outside the VM) to unpackage the download, and then install from there. For more information and a complete walkthrough for installing Denali CTP3, check out Aaron Bertrand’s post at http://sqlblog.com/blogs/aaron_bertrand/archive/2011/07/12/sql-server-v-next-denali-ctp3-installation-walk-through.aspx.

Denali CTP3 Goes Public!

Just released today, the third Community Technology Preview of SQL Server 2012 codenamed Denali. Really looking forward to (finally!) digging into FILETABLE and numerous other enhancements which I’ll blog about soon. CTP2 was an internal release only, and CTP1 was (frankly) quite bare. Download the 32- and 64-bit versions now — and the Feature Pack – from these easy bit.ly links I created:

http://bit.ly/DenaliCTP3
http://bit.ly/DenaliCTP3FP

Enjoy!

Using Sequences in SQL Server Denali

Historically, one notable difference between SQL Server and other database platforms (such as Oracle and DB2) has been the manner in which you implement automatically assigned integer values for primary keys when inserting new rows. SQL Server provides the IDENTITY attribute for the int or bigint data types, while other platforms require you to create an independent “sequence generator” object that feeds incrementing values to new rows in the table. These are two different ways to achieve the same thing, and SQL Server’s IDENTITY attribute is arguably simpler to use, but sequence generators offer their own unique advantages as well.

SQL Server Denali now also supports sequence generators as a powerful alternative to using the IDENTITY attribute. Sequences can be created on integer types (both built-in and user-defined), and you can specify the minimum, maximum, start, and increment (or decrement) values for the sequence. You can also cycle back around to the minimum value when the maximum value is reached (or vice versa).

Sequences are most commonly used to assign new primary key values on INSERT operations as an alternative to using IDENTITY-attributed key columns. But because they exist as objects independent of the data tables that they feed new primary keys to, they offer more flexibility and pose fewer limitations. For example, you can obtain the next value in the sequence before you perform the INSERT, and you can insert any (unique) value into the primary key column without requiring SET IDENTITY INSERT ON/OFF statements. The NEXT VALUE FOR syntax can be used in any SELECT, INSERT, or UPDATE statement to request and increment the next value in the sequence. Sequences are valid in SET, CASE, and DECLARE statements that set integer values, and they can be used in stored procedures, UDFs, and triggers. Let’s see how sequences work:

CREATE TABLE Customer
(Id        int PRIMARY KEY,
 FirstName varchar(max),
 LastName  varchar(max))

-- Create the sequence with a start, increment, and min/max settings
CREATE SEQUENCE CustomerSequence AS int
 START WITH 1
 INCREMENT BY 1
 MINVALUE 0
 NO MAXVALUE

-- Use the sequence for new primary key values
INSERT INTO Customer (Id, FirstName, LastName) VALUES
(NEXT VALUE FOR CustomerSequence, 'Bill', 'Malone'),
(NEXT VALUE FOR CustomerSequence, 'Justin', 'Thorp'),
(NEXT VALUE FOR CustomerSequence, 'Paul', 'Duffy')

Our Customer table uses an integer primary key, but does not specify the IDENTITY attribute. Instead, we use the new CREATE SEQUENCE statement in SQL Server Denali to create a CustomerSequence object that feeds new integers, starting with one and incrementing by one, with no specified upper limit (beyond the maximum size for the integer data type; a 32-bit int data type in this example). We then INSERT three new rows, each of which specifies NEXT VALUE FOR CustomerSequence as the new Id value (note that we’re using the row constructor syntax introduced in SQL Server 2008 to INSERT the three rows with a single statement). The result, as you’d expect, appears like this:

SELECT * FROM Customer

Id    FirstName LastName
----- --------- -------------
1     Bill      Malone
2     Justin    Thorp
3     Paul      Duffy

(3 row(s) affected)

Most likely, you’ll want to emulate the experience of using the IDENTITY attribute; that is, you may wish to omit the primary key values from the INSERT statement and put SQL Server in charge of assigning them to new rows. This is easily done by establishing NEXT VALUE FOR as a DEFAULT constraint on the Id column, like this:

-- Set the default for IDENTITY-behavior
ALTER TABLE Customer
 ADD DEFAULT NEXT VALUE FOR CustomerSequence FOR Id

-- Generates customer ID 4
INSERT INTO Customer (FirstName, LastName) VALUES('Jeff', 'Smith')

With the DEFAULT constraint in place, the INSERT statement looks and works just the same as if we were using an IDENTITY-attributed primary key column. Customer Jeff Smith is automatically assigned an Id value of 4. But by using sequences, we can enjoy a few additional benefits. For example, we can peek at the current value without consuming it by querying the sys.sequences catalog view. Among the many parameters for each sequence object in the database exposed by this view, the current_value column reveals the currently assigned value:

SELECT current_value FROM sys.sequences WHERE name='CustomerSequence'

current_value
-------------
4

(1 row(s) affected)

You can also use the ALTER SEQUENCE statement to change the behavior of an existing sequence object, like so:

ALTER SEQUENCE CustomerSequence
 RESTART WITH 1100
 MINVALUE 1000
 MAXVALUE 9999
 CYCLE

Now the next value returned by CustomerSequence will be 1100, and it will continue to increment by one from there. When it tops 9999, it will start again from 1000 and continue incrementing normally. Naturally, given the uniqueness of primary keys, the sequence can no longer be used to populate the Customer table 100 rows after it recycles to 1000, because we already have customers with primary keys starting at 1100.

Of course, sequences have some restrictions, but these aren’t too onerous overall. Sequences cannot be used in a subquery, CTE, TOP clause, CHECK CONSTRAINT definition, or as an argument to an aggregate function. They also can’t be used in views, computed columns, and user-defined functions, as those object types are not allowed to cause the side effects of sequence number generation. Finally, you cannot issue a DROP SEQUENCE statement to delete a sequence while you have tables with existing DEFAULT constraints that reference the sequence; you’ll need to drop the referencing constraints before you can drop the sequence.

Download the Denali CTP now to start playing with sequences, and have fun!

Server Side Paging with SQL Server Code-Named “Denali”

Well, the captain has just approved the use of electronic devices on my return flight from Las Vegas, where I just spent an exciting week speaking at and attending Visual Studio Live! And I’m eager to share some juicy new SQL Server Denali features with you that I demonstrated in my SQL Server workshop on Friday. Denali, in case you don’t already know, is the code name for the next major version of SQL Server (version 11.0, but whether it will be branded as SQL Server 2011 or SQL Server 2012 is still undetermined).

In the next several posts, I’ll share some of the new T-SQL enhancements offered by Denali. You can start playing with these features right now by downloading the Community Technology Preview (currently, CTP1 is the only publicly available release). Although Microsoft’s greatest advances in Denali center on new Business Intelligence capabilities (which they’re now dubbing “Pervasive Insight”), there are still lots of goodies in store for developers in the relational engine. In this post, I’ll describe a new feature that makes it easier than ever to implement server-side paging.

One Page At A Time, Please

Your query may return hundreds, thousands, or even millions of rows, but users can only cope with so much data at a time. They need paged results; that is, they want to view one chunk at a time in “pages”. The page size can vary of course; 10, 25, 50, or 100 results at a time, but any more than that is just not reasonable. Sure you can allow your query to execute and return a complete resultset from the data access layer to your middle-tier, and then deliver only the subset of that resultset for a single page to display in the user interface. But that’s an incredibly wasteful and inefficient approach. Ideally, of course, if one page is all you want, then that one particular page is all you should need to retrieve from SQL Server when you execute your query.

Returning paged query results was difficult to achieve prior to SQL Server 2005. That release of SQL Server introduced a series of ranking functions, including the ROW_NUMBER function that made it feasible to return one page at a time from your query. I’ll first show how to use ROW_NUMBER in SQL Server 2005 and higher to implement server-side paging, and then show you how much easier it is to achieve the same goal using the new OFFSET/FETCH NEXT syntax introduced in SQL Server Denali.

Using ROW_NUMBER (SQL Server 2005)

The ROW_NUMBER function does just what its name implies; it generates a sequential number for each row in the resultset returned by your query. The value returned by the ROW_NUMBER function can then be used in your query’s WHERE clause to limit the resultset to just the desired page. Here’s an example from the AdventureWorks2008R2 database that demonstrates how to use ROW_NUMBER. It returns “page 3” of the query results, which (assuming a page size of 10) are rows 21 through 30:

-- Get 10 rows starting at row 21 (rows 21-30; i.e., page 3)
DECLARE @PageNum int = 3
DECLARE @PageSize int = 10
DECLARE @FirstRow int = ((@PageNum - 1) * @PageSize) + 1
DECLARE @LastRow int = @FirstRow + @PageSize - 1

SELECT *
 FROM
   (SELECT RowNum = ROW_NUMBER() OVER (ORDER BY LastName, FirstName), *
     FROM Person.Person
   ) AS a
 WHERE RowNum BETWEEN @FirstRow AND @LastRow
 ORDER BY LastName, FirstName

And here are the results:

Now this certainly works, but there are two undesirables here. First, it requires you to manufacture the row number as an additional column in your resultset, whether or not you want/need it. Second, the syntax is somewhat contorted; the required use of a nested SELECT statement and multiple ORDER BY clauses (as well as the required alias “AS a”) is both awkward and unintuitive.

Using OFFSET/FETCH NEXT (SQL Server Code-Named “Denali”)

Now take a look at how the same result can be achieved in Denali:

DECLARE @PageNum int = 3
DECLARE @PageSize int = 10
DECLARE @Offset int = (@PageNum - 1) * @PageSize

SELECT *
 FROM Person.Person
 ORDER BY LastName, FirstName
 OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY

I don’t see how the syntax could be any simpler than this, do you? Just specify your starting row with OFFSET and your page size with FETCH NEXT. Here are the results:

You can see that this query returns the same “page” as the previous version of the query that used the ROW_NUMBER function, yet it wasn’t necessary to manufacture a row number column to do it, nor did we need to code a subquery. It’s highly probable that your presentation layer is well aware of which row numbers are being display; it is after all in charge of managing the UI concerns of which page is being displayed to the user. Nevertheless, if you still want row numbers returned in your resultset, you can combine the new Denali syntax with ROW_NUMBER as follows if desired (you’ll still be able to achieve this with a single SELECT, but you’ll need to duplicate the ORDER BY clause with OVER):

DECLARE @PageNum int = 3
DECLARE @PageSize int = 10
DECLARE @Offset int = (@PageNum - 1) * @PageSize

SELECT RowNum = ROW_NUMBER() OVER (ORDER BY LastName, FirstName), *
 FROM Person.Person
 ORDER BY LastName, FirstName
 OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY

So there you have it. Server side paging made as easy as one could hope for, thanks to the new OFFSET/FETCH NEXT syntax in SQL Server Denali.

Follow

Get every new post delivered to your Inbox.

Join 36 other followers