Making the Case for Entity Framework in the Enterprise

Recently, I was met with some friction by the IT department at a client where, they asserted, that a decision had been made years ago to ban Entity Framework. Like many enterprise environments, this client was understandably concerned with the potential pitfalls of embracing Entity Framework. That meant that my job was to convince them otherwise – not to discount their apprehension, but quite the contrary – to demonstrate how EF can be leveraged for its advantages, and avoided for its shortcomings.

Entity Framework (EF) is a broad framework with many optional parts. There are several aspects of EF that provide great benefit, while others are a source of great consternation – particularly from the perspective of the database purist. As the cliché goes, “with great power comes great responsibility,” and so this blog post explores different aspects of EF, and identifies how to best leverage the framework, and not abuse it.

There are two distinct developer experiences with EF.

  • Design time
  • Run time

EF at Design Time

At design time, there are many different ways that developers can choose to use EF. All of them typically involve code generators, which could be any combination of automatically generated C# code (POCO classes, stored procedure wrappers) and T-SQL code (DDL).

The exact type of code generated depends on the methodology employed by the application developer. These are the set of “xxx-first” approaches to database design, which include:

  • Database-first
  • Model-first
  • Code-first

Database-first is the only acceptable methodology for SQL Server purists. It essentially means that the database comes first, and that EF has no influence on the database design. EF does not generate DDL to create the database, and there is no EF “smell” in the design. Instead, the SQL Server expert implements the database by hand, applying all their experience, and using every best practice there is. They create the database with properly normalized tables, data types, foreign key constraints, unique constraints, check constraints, stored procedures, naming conventions, and so on. And, ideally, they build and maintain the database design using SQL Server Data Tools (SSDT) in Visual Studio with source control (TFS, GIT, etc.), rather than by managing scripts in SSMS.

With the other methodologies (model-first and code-first), the application developer focuses on their logical model (either as a visual .edmx file, or as POCO classes in C# code), and lets the EF design time “figure out the database” from the model. They let EF generate the DDL, which in turn creates the database. All SQL Server data types are inferred from their closest .NET equivalent. Then, as the model evolves, EF generates code migrations to rev-up and rev-down the database.

This approach is typically rejected by database purists, who rightfully object to the database being “reverse-engineered” from a model (.edmx file), or from C# POCO classes. With model-first and code-first, all direct control over the database design is lost. Thus, the only way to control the database design is indirectly, by coercing EF’s design-time DDL generator through the use of naming conventions or [attributes]. Essentially, it means that the application developer, who often lacks the experience and skills needed to properly design a database, gets to design the database. This is a terrible way to build and evolve an enterprise database model.

Our client’s application is strictly database-first, and uses SSDT to centrally store the database design, with full version history in TFS. Furthermore, it leverages a unique open-source T4 template code generator that enables database-first with an extensible object model, and zero friction for migrations. I have personally made several contributions to this utility, which is called the EntityFramework Reverse POCO Generator.

https://marketplace.visualstudio.com/items?itemName=SimonHughes.EntityFrameworkReversePOCOGenerator#itemName=SimonHughes.EntityFrameworkReversePOCOGenerator

This generator produces C# POCO classes that are always in sync with the tables in the database, and it creates these as partial classes, so that the C# developer can easily build extensions that survive code regeneration. It also generates strongly typed stored procedure wrapper methods, which promotes and simplifies the use of stored procedures at run time.

EF at Run Time

While every developer design time experience may vary, the run time experience is always the same.

  1. Connect to the database with a context
  2. Retrieve objects by querying the context
  3. Modify objects in memory, while the context tracks changes
  4. Save modified objects back to the database

Here, database purists have additional concerns, particularly with item #2. Specifically how is the database being queried? It could be either direct SQL or a stored procedure. When EF is used to call a stored procedure, then it’s no different than calling that stored procedure using conventional ADO.NET or Enterprise Library. Whether it’s an EF DbContext object or an old-fashioned SqlCommand object, the query is prewritten and embedded in the stored procedure.

If, on the other hand, the developer uses EF to execute a LINQ query, then the EF runtime dynamically generates direct SQL. This can potentially (and all to often does) result in poor queries being sent to SQL  Server by the application. I stress potentially because, a) the dynamically generated SQL queries have in fact improved greatly over the many versions of EF, and b) just because a stored procedure is being used instead doesn’t necessarily mean better performance; for example, if the stored procedure has a badly written query inside of it.

It’s similar to the design-time concern; an application developer that lacks experience in SQL Server writes a LINQ query in C# that results in generated SQL that performs poorly. All to often, these developers don’t understand deferred execution in EF, and they can repeatedly execute the same query over and over inside a loop (rather than just once) without realizing it. They are also prone to just append one .Include after another in their LINQ query, resulting in a great number of JOINs that returns a lot of duplicate parent data in the resultset.

EF has excellent run time support for stored procedures, and, as mentioned, the T4 template code generator provides the C# developer with ready-to-use methods that wrap every stored procedure. The vast majority of data access against the database should occur by calling stored procedures. Like the database modeling at design time, these stored procedures should be written by hand with careful attention paid to SQL Server best practices. However, there may be some lightweight cases (for example, simple single-row, single-table SELECTs) where you could allow EF to generate the query without a stored procedure, where the generated query is not objectionable.

Furthermore, you can implement a SelectEntity stored procedure that returns an entire entity as XML. This not only means using a stored procedure and not EF to query the database, but that the stored procedure will produce the most lightweight resultset without all the duplicate parent data that would result with traditional JOINs in the stored procedure.

One of the biggest client-side advantages that EF provides is change tracking. After retrieving and deserializing a complete entity via the aforementioned stored procedure, you can attach it to the EF context so that EF can track changes to it, just the same as it would if EF queried the database without the stored procedure. Then, as you manipulate the object in memory, the EF context keeps track of what rows in which tables need to be INSERTed, UPDATEd, or DELETEd. These are complex details that are very difficult to implement by hand, while allowing the context to figure that all out dramatically streamlines your development process.

Summary

In conclusion, the philosophy is that best SQL Server practices are followed, both at design time with respect to data modeling, and at run time with respect to querying. You should take a “pick-and-choose” approach to EF, embracing those aspects that provide compelling developer benefits, and rejecting those that run contrary to proper database design and query principles.

Thus, your application should take a database-first approach to data modeling, and use stored procedures exclusively to query the database, ensuring no EF “smell” at either design time or run time.

Advertisements

To Use or Not To Use DataSets

That has been the subject of great debate since the dawn of .NET, and is now even more debatable with the availability of Entity Framework. Some developers have dismissed DataSets out of hand long ago, primarily because—despite their ability to be strongly typed and to encapsulate business logic—they are not true business objects. For example, you need to navigate through relationship objects in the DataSet to connect between parent and child rows. This is not intuitive to object oriented programmers, who think of parent child relationships in simpler terms; each parent has as a child collection property and each child has a parent property. Furthermore, the DataSet paradigm does not allow for inheritance, which is also extremely important to object-oriented developers. Null values in a DataSet also require special handling.

Notwithstanding these concerns, I don’t generally advocate dismissing any technology out of hand. Every application is different, and you are doing yourself a disservice if you don’t examine the facets of all available choices on a case by case basis. Like anything else, DataSets can be used or they can be abused, and it’s true that they do present limitations if you try to use them as business objects.But if what you need is a generic in-memory database model, then that’s what a DataSet is, and that’s what a DataSet gives you. While the aforementioned concerns are all valid, the fact remains that DataSets are very powerful and can serve extremely well as data transfer objects. Furthermore, their unique ability to dynamically adapt their shape according to the schema of whatever data you stream into them is a capability that none of the newer APIs provide.

So to be clear, DataSets are not obsolete. The DataSet is a cornerstone of the .NET framework, and it is not going away. In fact, even as late as .NET 3.5 when LINQ to SQL and Entity Framework were first released, Microsoft has been enhancing DataSets. For example, the TableAdapterManager was added to greatly simplify hierarchical updates. The fact remains that DataSets do still have their place, and you are very likely to encounter them for a long time to come as you maintain existing applications.

Having said that, let me also stress that Microsoft has clearly positioned ADO.NET Entity Framework as the preferred data access technology today and in the future, eclipsing both DataSets and LINQ to SQL. Furthermore, Silverlight implements only a subset of the .NET Framework, and conspicuously absent from that subset is the DataSet. Although there are several third-party vendors that provide a DataSet object for Silverlight, this omission on the part of Microsoft is another indication that DataSets are discouraged for new development.

Rethinking the Dynamic SQL vs. Stored Procedure Debate with LINQ

Dynamic SQL is evil, right? We should only be using stored procedures to access our tables, right? That will protect us from SQL injection attacks and boost performance because stored procedures are parameterized and compiled, right?

Well think again! The landscape of this debate has changed dramatically, especially with the advent of Language-Integrated Query (LINQ) technologies against relational databases, such as LINQ to SQL (L2S) against SQL Server and LINQ to Entities (L2E) against the Entity Framework’s (EF) Entity Data Model (EDM) over any RDBMS with an ADO.NET provider for EF. This is because, despite the fact that both these ORM technologies support the use of stored procedures, their real intended use is to generate dynamic SQL.

For simplicity, the examples below only use LINQ to SQL, but the same principles apply to LINQ to Entities.

Comparing Dynamic SQL and Stored Procedures in LINQ Queries

Let’s compare the behavior of LINQ queries that generate dynamic SQL with LINQ queries that invoke stored procedures. In our example, we’ll use the Sales.Currency table in the AdventureWorks 2008 database that contains a complete list of global currency and exchange rates. The following L2S query generates dynamic SQL to select just those currencies with codes beginning with the letter B:

// LINQ to SQL query using dynamic SQL
var q =
    from currency in ctx.Currencies
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The ctx variable is the L2S DataContext, and its Currencies property is a collection mapped to the Sales.Currency table in the database (it’s common practice to singularize table names and pluralize collection names).

Modifying this query to use a stored procedure is easy. Assuming we create a stored procedure named SelectCurrencies that executes SELECT * FROM Sales.Currency and then import that stored procedure into either our L2S (.dbml) or EF (.edmx) model, the LINQ query requires only a slight modification to have it call the stored procedure instead of generating dynamic SQL against the underlying table:

// LINQ to SQL using a stored procedure
var q =
    from currency in ctx.SelectCurrencies()
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The only change made to the query is the substitution of the SelectCurrencies method (which is a function that maps to the stored procedure we imported into the data model) for the Currencies property (which is a collection that maps directly to the underlying Sales.Currency table). 

Examining the Generated SQL

There is a major performance problem with this new version of the query, however, which may not be immediately apparent. To understand, take a look at the generated SQL for both queries:

Query 1 (dynamic SQL):

SELECT [t0].[CurrencyCode], [t0].[Name], [t0].[ModifiedDate]
FROM [Sales].[Currency] AS [t0]
WHERE ([t0].[CurrencyCode] LIKE @p0)
-- @p0: Input NVarChar (Size = 2; Prec = 0; Scale = 0) [B%]

Query 2 (stored procedure):

EXEC @RETURN_VALUE = [dbo].[SelectCurrencies]
-- @RETURN_VALUE: Output Int (Size = 0; Prec = 0; Scale = 0) [Null]

Now the problem should be blatantly obvious. The first query executes the filter condition for currencies with codes that start with the letter B in the WHERE clause of the SELECT statement on the database server (by setting parameter @p0 to ‘B%’ and testing with LIKE in the WHERE clause). Only results of interest are returned to the client across the network. The second query executes the SelectCurrencies stored procedure which returns the entire table to the client across the network. Only then does it get filtered by the where clause of the LINQ query to reduce that resultset and obtain only currencies with codes that start with B, while all the other (“non-B”) rows that just needlessly traversed the network from SQL Server are immediately discarded. That clearly amounts to wasted processing, and is a serious performance penalty for the use of stored procedures with LINQ.

Of course, one obvious solution to this problem is to modify the SelectCurrencies stored procedure to accept a @CurrencyCodeFilter parameter and change its SELECT statement to test against that parameter as follows: SELECT * FROM Sales.Currency WHERE CurrencyCode LIKE @CurrencyCodeFilter. That will ensure that only the rows of interest are returned from the server, just like the dynamic SQL version behaves. The LINQ query would then look like this:

// LINQ to SQL using a parameterized stored procedure
var q =
    from currency in ctx.SelectCurrencies("B%")
    select currency;

Performance problem solved, but this solution definitely begs the question “where’s the WHERE?” – in the stored procedure, or in the LINQ query? It needs to be in the stored procedure to prevent unneeded rows from being returned across the network, but then it’s not in the LINQ query any more. So LINQ queries that you need optimized for stored procedures won’t have a where clause in them, and in my humble opinion, that seriously undermines the effectiveness and expressiveness of LINQ queries because the query is now far less “language-integrated”.

Revisiting the Big Debate

Clearly LINQ to SQL and Entity Framework want us to embrace hitting the database server with dynamic SQL, but many database professionals live by the creed “dynamic SQL is the devil, and thou shalt only use stored procedures.” So let’s re-visit the heated “dynamic SQL vs. stored procedure” debate.

Proponents of stored procedures cite the following primary reasons against using dynamic SQL:

1) Security: Your application becomes vulnerable to SQL injection attacks.

2) Performance: Dynamic SQL doesn’t get compiled like stored procedures do, so it’s slower.

3) Maintainability: Spaghetti code results, as server-side T-SQL code is then tightly coupled and interwoven with client-side C# or VB .NET.

Let’s address these concerns:

1) Security: Vulnerability to SQL injection attacks results from building T-SQL statements using string concatenation. Even stored procedures are vulnerable in this respect, if they generate dynamic SQL by concatenating strings. The primary line of defense against SQL injection attacks it to parameterize the query, which is easily done with dynamic SQL. That is, instead of concatenating strings to build the T-SQL, compose a single string that has one or more parameters, and then populate the Parameters collection of the SqlCommand object with the parameter values (this is how the L2S query above prepared the command for SQL Server, as evidenced by the output of the generated T-SQL that uses the parameter @p0 in the WHERE clause).

2) Performance: You’re a little behind the times on this one. It’s true that stored procedures would get partially compiled to speed multiple executions in SQL Server versions 6.5 (released in 1996) and earlier. But as of SQL Server 7.0 (released in 1999), that is no longer the case. Instead, SQL Server 7.0 (and later) compiles and caches the query execution plan to speed multiple executions of the same query (where only parameter values vary), and that’s true whether executing a stored procedure or a SQL statement built with dynamic SQL.

3) Maintainability: This remains a concern if you are embedding T-SQL directly in your .NET code. But with LINQ, that’s not the case because you’re only composing a LINQ query (designed to be part of your .NET code); the translation to T-SQL occurs on the fly at runtime when you execute your application.

Making a Good Compromise

These facts should change your perspective somewhat. But if you’re a die-hard stored procedure proponent that finds it hard to change your ways (like me), consider this compromise: Allow dynamic SQL for SELECT only, but continue using stored procedures for INSERT, UPDATE, and DELETE (which can be imported into your data model just like a SELECT stored procedure can). This is a good strategy because LINQ queries only generate SELECT statements to retrieve data. They can’t update data. Only the SubmitChanges method on the L2S DataContext (or the SaveChanges method on the L2E ObjectContext) generates commands for updating data, with no downside to using stored procedures over dynamic SQL like there is for SELECT.

So you can (and should) stick to using stored procedures for INSERT, UPDATE, and DELETE operations, while denying direct access to the underlying tables (except for SELECT). Doing so allows you to continue using stored procedures to perform additional validation that cannot be bypassed by circumventing the application layer and communicating directly with the database server.