Rethinking the Dynamic SQL vs. Stored Procedure Debate with LINQ

Dynamic SQL is evil, right? We should only be using stored procedures to access our tables, right? That will protect us from SQL injection attacks and boost performance because stored procedures are parameterized and compiled, right?

Well think again! The landscape of this debate has changed dramatically, especially with the advent of Language-Integrated Query (LINQ) technologies against relational databases, such as LINQ to SQL (L2S) against SQL Server and LINQ to Entities (L2E) against the Entity Framework’s (EF) Entity Data Model (EDM) over any RDBMS with an ADO.NET provider for EF. This is because, despite the fact that both these ORM technologies support the use of stored procedures, their real intended use is to generate dynamic SQL.

For simplicity, the examples below only use LINQ to SQL, but the same principles apply to LINQ to Entities.

Comparing Dynamic SQL and Stored Procedures in LINQ Queries

Let’s compare the behavior of LINQ queries that generate dynamic SQL with LINQ queries that invoke stored procedures. In our example, we’ll use the Sales.Currency table in the AdventureWorks 2008 database that contains a complete list of global currency and exchange rates. The following L2S query generates dynamic SQL to select just those currencies with codes beginning with the letter B:

// LINQ to SQL query using dynamic SQL
var q =
    from currency in ctx.Currencies
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The ctx variable is the L2S DataContext, and its Currencies property is a collection mapped to the Sales.Currency table in the database (it’s common practice to singularize table names and pluralize collection names).

Modifying this query to use a stored procedure is easy. Assuming we create a stored procedure named SelectCurrencies that executes SELECT * FROM Sales.Currency and then import that stored procedure into either our L2S (.dbml) or EF (.edmx) model, the LINQ query requires only a slight modification to have it call the stored procedure instead of generating dynamic SQL against the underlying table:

// LINQ to SQL using a stored procedure
var q =
    from currency in ctx.SelectCurrencies()
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The only change made to the query is the substitution of the SelectCurrencies method (which is a function that maps to the stored procedure we imported into the data model) for the Currencies property (which is a collection that maps directly to the underlying Sales.Currency table). 

Examining the Generated SQL

There is a major performance problem with this new version of the query, however, which may not be immediately apparent. To understand, take a look at the generated SQL for both queries:

Query 1 (dynamic SQL):

SELECT [t0].[CurrencyCode], [t0].[Name], [t0].[ModifiedDate]
FROM [Sales].[Currency] AS [t0]
WHERE ([t0].[CurrencyCode] LIKE @p0)
-- @p0: Input NVarChar (Size = 2; Prec = 0; Scale = 0) [B%]

Query 2 (stored procedure):

EXEC @RETURN_VALUE = [dbo].[SelectCurrencies]
-- @RETURN_VALUE: Output Int (Size = 0; Prec = 0; Scale = 0) [Null]

Now the problem should be blatantly obvious. The first query executes the filter condition for currencies with codes that start with the letter B in the WHERE clause of the SELECT statement on the database server (by setting parameter @p0 to ‘B%’ and testing with LIKE in the WHERE clause). Only results of interest are returned to the client across the network. The second query executes the SelectCurrencies stored procedure which returns the entire table to the client across the network. Only then does it get filtered by the where clause of the LINQ query to reduce that resultset and obtain only currencies with codes that start with B, while all the other (“non-B”) rows that just needlessly traversed the network from SQL Server are immediately discarded. That clearly amounts to wasted processing, and is a serious performance penalty for the use of stored procedures with LINQ.

Of course, one obvious solution to this problem is to modify the SelectCurrencies stored procedure to accept a @CurrencyCodeFilter parameter and change its SELECT statement to test against that parameter as follows: SELECT * FROM Sales.Currency WHERE CurrencyCode LIKE @CurrencyCodeFilter. That will ensure that only the rows of interest are returned from the server, just like the dynamic SQL version behaves. The LINQ query would then look like this:

// LINQ to SQL using a parameterized stored procedure
var q =
    from currency in ctx.SelectCurrencies("B%")
    select currency;

Performance problem solved, but this solution definitely begs the question “where’s the WHERE?” – in the stored procedure, or in the LINQ query? It needs to be in the stored procedure to prevent unneeded rows from being returned across the network, but then it’s not in the LINQ query any more. So LINQ queries that you need optimized for stored procedures won’t have a where clause in them, and in my humble opinion, that seriously undermines the effectiveness and expressiveness of LINQ queries because the query is now far less “language-integrated”.

Revisiting the Big Debate

Clearly LINQ to SQL and Entity Framework want us to embrace hitting the database server with dynamic SQL, but many database professionals live by the creed “dynamic SQL is the devil, and thou shalt only use stored procedures.” So let’s re-visit the heated “dynamic SQL vs. stored procedure” debate.

Proponents of stored procedures cite the following primary reasons against using dynamic SQL:

1) Security: Your application becomes vulnerable to SQL injection attacks.

2) Performance: Dynamic SQL doesn’t get compiled like stored procedures do, so it’s slower.

3) Maintainability: Spaghetti code results, as server-side T-SQL code is then tightly coupled and interwoven with client-side C# or VB .NET.

Let’s address these concerns:

1) Security: Vulnerability to SQL injection attacks results from building T-SQL statements using string concatenation. Even stored procedures are vulnerable in this respect, if they generate dynamic SQL by concatenating strings. The primary line of defense against SQL injection attacks it to parameterize the query, which is easily done with dynamic SQL. That is, instead of concatenating strings to build the T-SQL, compose a single string that has one or more parameters, and then populate the Parameters collection of the SqlCommand object with the parameter values (this is how the L2S query above prepared the command for SQL Server, as evidenced by the output of the generated T-SQL that uses the parameter @p0 in the WHERE clause).

2) Performance: You’re a little behind the times on this one. It’s true that stored procedures would get partially compiled to speed multiple executions in SQL Server versions 6.5 (released in 1996) and earlier. But as of SQL Server 7.0 (released in 1999), that is no longer the case. Instead, SQL Server 7.0 (and later) compiles and caches the query execution plan to speed multiple executions of the same query (where only parameter values vary), and that’s true whether executing a stored procedure or a SQL statement built with dynamic SQL.

3) Maintainability: This remains a concern if you are embedding T-SQL directly in your .NET code. But with LINQ, that’s not the case because you’re only composing a LINQ query (designed to be part of your .NET code); the translation to T-SQL occurs on the fly at runtime when you execute your application.

Making a Good Compromise

These facts should change your perspective somewhat. But if you’re a die-hard stored procedure proponent that finds it hard to change your ways (like me), consider this compromise: Allow dynamic SQL for SELECT only, but continue using stored procedures for INSERT, UPDATE, and DELETE (which can be imported into your data model just like a SELECT stored procedure can). This is a good strategy because LINQ queries only generate SELECT statements to retrieve data. They can’t update data. Only the SubmitChanges method on the L2S DataContext (or the SaveChanges method on the L2E ObjectContext) generates commands for updating data, with no downside to using stored procedures over dynamic SQL like there is for SELECT.

So you can (and should) stick to using stored procedures for INSERT, UPDATE, and DELETE operations, while denying direct access to the underlying tables (except for SELECT). Doing so allows you to continue using stored procedures to perform additional validation that cannot be bypassed by circumventing the application layer and communicating directly with the database server.

Advertisements