To Use or Not To Use DataSets

That has been the subject of great debate since the dawn of .NET, and is now even more debatable with the availability of Entity Framework. Some developers have dismissed DataSets out of hand long ago, primarily because—despite their ability to be strongly typed and to encapsulate business logic—they are not true business objects. For example, you need to navigate through relationship objects in the DataSet to connect between parent and child rows. This is not intuitive to object oriented programmers, who think of parent child relationships in simpler terms; each parent has as a child collection property and each child has a parent property. Furthermore, the DataSet paradigm does not allow for inheritance, which is also extremely important to object-oriented developers. Null values in a DataSet also require special handling.

Notwithstanding these concerns, I don’t generally advocate dismissing any technology out of hand. Every application is different, and you are doing yourself a disservice if you don’t examine the facets of all available choices on a case by case basis. Like anything else, DataSets can be used or they can be abused, and it’s true that they do present limitations if you try to use them as business objects.But if what you need is a generic in-memory database model, then that’s what a DataSet is, and that’s what a DataSet gives you. While the aforementioned concerns are all valid, the fact remains that DataSets are very powerful and can serve extremely well as data transfer objects. Furthermore, their unique ability to dynamically adapt their shape according to the schema of whatever data you stream into them is a capability that none of the newer APIs provide.

So to be clear, DataSets are not obsolete. The DataSet is a cornerstone of the .NET framework, and it is not going away. In fact, even as late as .NET 3.5 when LINQ to SQL and Entity Framework were first released, Microsoft has been enhancing DataSets. For example, the TableAdapterManager was added to greatly simplify hierarchical updates. The fact remains that DataSets do still have their place, and you are very likely to encounter them for a long time to come as you maintain existing applications.

Having said that, let me also stress that Microsoft has clearly positioned ADO.NET Entity Framework as the preferred data access technology today and in the future, eclipsing both DataSets and LINQ to SQL. Furthermore, Silverlight implements only a subset of the .NET Framework, and conspicuously absent from that subset is the DataSet. Although there are several third-party vendors that provide a DataSet object for Silverlight, this omission on the part of Microsoft is another indication that DataSets are discouraged for new development.

Advertisements

Streaming Into LINQ to XML Using C# Custom Iterators and XmlReader

In this post, I’ll show you how to create LINQ to XML queries that don’t require you to first load and cache XML content into the in-memory LINQ to XML DOM (that is, without first populating an XDocument or XElement query source), but instead operate against an input stream implemented with a C# custom iterator method and an old-fashioned XmlReader object.

LINQ to XML

LINQ to XML, introduced with the .NET Framework 3.5, is a huge win for developers working with XML in any shape or form. Whether XML is being queried, parsed, or transformed, LINQ to XML can almost always be used as an easier alternative to previous technologies that are based on XML-specific languages (XPath, XQuery, XSLT).

At the center of the LINQ to XML stage lies a new DOM for caching XML data in memory. This object model, based on either a root XDocument object or independent XElement objects, represents a major improvement over the older XmlDocument-based DOM in numerous ways (details of which will serve as the topic for a future post). In terms of querying, the XDocument and XElement objects provide methods (such as Descendants) that expose collections of nodes which can be iterated by a LINQ to XML query.

Consuming Sequences with LINQ

A common misconception by many developers learning LINQ is that LINQ only consumes collections. That is not surprising, since one of first benefits developers come to understand is that LINQ can be used to easily query an input collection without coding a foreach loop. While this is certainly true, it’s a very limited view of what can really be accomplished with LINQ.

The broader view is that LINQ works against sequences, which are often — but certainly not always — collections. A sequence can be generated by any object that implements IEnumerable<T> or any method that returns an IEnumerable<T>. Such objects and methods provide iterators that LINQ calls to retrieve one element after another from the source sequence being queried. For a collection, the enumerator simply walks the elements of the collection and returns them one at a time. But you can create your own class that implements IEnumerable<T> or your own custom iterator method which serves as the enumerator that returns a sequence based on something other than a collection.

The point? LINQ is not limited to querying in-memory collections. It can be used to query any sequence, which is fed to the LINQ query by enumerator methods exposed by all classes that implement IEnumerable<T>. What this means is that you don’t necessarily need to load an entire XML document into an in-memory cache before you can query it using LINQ to XML. If you are querying very large documents in order to extract just a few elements of interest, you can achieve better performance by having your LINQ to XML query stream through the XML—without ever caching the XML in memory.

Querying Cached XML vs. Streamed XML

How do you write LINQ to XML queries that consume a read-only, forward-only input stream instead of a pre-populated in-memory cache? Easy. Refer to a custom iterator method instead of a pre-populated XDocument or XElement in the from clause of your query.

For example, consider the following query:

var xd = XDocument.Load("Customers.xml");
var domQuery =
  from c in xd.Descendants("Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Now compare that query with this version:

var streamQuery =
  from c in StreamElements("Customers.xml", "Customer")
  where (string)c.Attribute("Country") == "UK"
  select c;

Both versions query over the same XML and produce the same output result (a sequence of customer nodes from the UK), but they consume their input sequences in completely differently ways. The first version first loads the XML content into an XDocument (in-memory cache), and then queries the collection of nodes returned by the Descendants method (all <Customer> nodes) for those in the UK. The larger the XML document being queried, the more memory is consumed by this approach. The second version queries directly against an input stream that feeds the XML content as a sequence using a custom iterator method named StreamElements. This version will consume no more memory for querying a huge XML file than it will for a tiny one. Here’s the implementation of the custom iterator method:

private static IEnumerable<XElement> StreamElements(
  string fileName,
  string elementName)
{
  using (var rdr = XmlReader.Create(fileName))
  {
    rdr.MoveToContent();
    while (rdr.Read())
    {
      if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
      {
        var e = XElement.ReadFrom(rdr) as XElement;
        yield return e;
      }
    }
    rdr.Close();
  }
}

Understanding C# Custom Iterators

By definition, this method is a custom iterator since it returns an IEnumerable<T> and has a yield return statement in it. By returning IEnumerable<XElement> specifically (just like the Descendants method in the cached DOM version of the query does), this custom iterator is suitable as the source for a LINQ to XML query. The implementation of this method is a beautiful demonstration of how seamlessly new technology (LINQ to XML) integrates with old technology (the XmlReader object has been around since .NET 1.0). Let’s examine the method piece by piece to understand exactly how it works.

When the query first begins to execute, and LINQ needs to start scanning the input sequence, the StreamElements custom iterator method is called with two parameters. The first parameter is the name of the input XML file (“Customers.xml”) and the second parameter is the element of interest to be queried (“Customer”, for each <Customer> node in the XML file).  The method then opens an “old-fashioned” XmlReader against the XML file and advances the stream to the beginning of its content by invoking MoveToContent. It then enters a loop that reads from the stream one element at a time. Each element is tested against the second parameter (“Customer”, in this example). Each matching element (which would be all <Customer> elements) is converted into in XElement object by invoking the static XElement.ReadFrom method against the reader. The XElement object representing the matching node is then yield returned to the LINQ query.

As the query continues to execute, and LINQ needs to continue scanning the input sequence for additional elements, the StreamElements custom iterator method continues execution right after the point at which it yield returned the previous element, rather than entering at the top of the method like an ordinary method would. In this manner, the input stream advances and returns one matching XElement after another while the LINQ query consumes that sequence and filters for UK to produce a new output sequence with the results. When the end of the input stream is reached, the reader’s Read method will return false, which will end the loop, close the reader, and finally exit the method. When the custom iterator method exits, that signals the LINQ query that there are no more input elements in the sequence and the query completes execution at that point.

One important point that may seem obvious but is worth calling out anyway, is that running streaming queries more than once results in reading through the entire stream each time. So to best apply this technique, you should try to extract everything you need from the XML content in one pass (that is, with one LINQ query). If it turns out that you need to query over the stream multiple times, you’ll need to reconsider matters to determine if you aren’t better off caching the XML content once and then querying over the in-memory cache multiple times.

Custom iterators are a very powerful C# language feature, not necessarily limited for use with LINQ. Streaming input into LINQ to XML queries as an alternative to using a cached input source is just one example, but there are numerous others ways to leverage custom iterators. To learn how they can be used with SQL Server 2008 Table-Valued Parameters to marshal an entire business object collection to a stored procedure in a single round-trip, view my earlier blog post that explains the details: SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven!

Rethinking the Dynamic SQL vs. Stored Procedure Debate with LINQ

Dynamic SQL is evil, right? We should only be using stored procedures to access our tables, right? That will protect us from SQL injection attacks and boost performance because stored procedures are parameterized and compiled, right?

Well think again! The landscape of this debate has changed dramatically, especially with the advent of Language-Integrated Query (LINQ) technologies against relational databases, such as LINQ to SQL (L2S) against SQL Server and LINQ to Entities (L2E) against the Entity Framework’s (EF) Entity Data Model (EDM) over any RDBMS with an ADO.NET provider for EF. This is because, despite the fact that both these ORM technologies support the use of stored procedures, their real intended use is to generate dynamic SQL.

For simplicity, the examples below only use LINQ to SQL, but the same principles apply to LINQ to Entities.

Comparing Dynamic SQL and Stored Procedures in LINQ Queries

Let’s compare the behavior of LINQ queries that generate dynamic SQL with LINQ queries that invoke stored procedures. In our example, we’ll use the Sales.Currency table in the AdventureWorks 2008 database that contains a complete list of global currency and exchange rates. The following L2S query generates dynamic SQL to select just those currencies with codes beginning with the letter B:

// LINQ to SQL query using dynamic SQL
var q =
    from currency in ctx.Currencies
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The ctx variable is the L2S DataContext, and its Currencies property is a collection mapped to the Sales.Currency table in the database (it’s common practice to singularize table names and pluralize collection names).

Modifying this query to use a stored procedure is easy. Assuming we create a stored procedure named SelectCurrencies that executes SELECT * FROM Sales.Currency and then import that stored procedure into either our L2S (.dbml) or EF (.edmx) model, the LINQ query requires only a slight modification to have it call the stored procedure instead of generating dynamic SQL against the underlying table:

// LINQ to SQL using a stored procedure
var q =
    from currency in ctx.SelectCurrencies()
    where currency.CurrencyCode.StartsWith("B")
    select currency;

The only change made to the query is the substitution of the SelectCurrencies method (which is a function that maps to the stored procedure we imported into the data model) for the Currencies property (which is a collection that maps directly to the underlying Sales.Currency table). 

Examining the Generated SQL

There is a major performance problem with this new version of the query, however, which may not be immediately apparent. To understand, take a look at the generated SQL for both queries:

Query 1 (dynamic SQL):

SELECT [t0].[CurrencyCode], [t0].[Name], [t0].[ModifiedDate]
FROM [Sales].[Currency] AS [t0]
WHERE ([t0].[CurrencyCode] LIKE @p0)
-- @p0: Input NVarChar (Size = 2; Prec = 0; Scale = 0) [B%]

Query 2 (stored procedure):

EXEC @RETURN_VALUE = [dbo].[SelectCurrencies]
-- @RETURN_VALUE: Output Int (Size = 0; Prec = 0; Scale = 0) [Null]

Now the problem should be blatantly obvious. The first query executes the filter condition for currencies with codes that start with the letter B in the WHERE clause of the SELECT statement on the database server (by setting parameter @p0 to ‘B%’ and testing with LIKE in the WHERE clause). Only results of interest are returned to the client across the network. The second query executes the SelectCurrencies stored procedure which returns the entire table to the client across the network. Only then does it get filtered by the where clause of the LINQ query to reduce that resultset and obtain only currencies with codes that start with B, while all the other (“non-B”) rows that just needlessly traversed the network from SQL Server are immediately discarded. That clearly amounts to wasted processing, and is a serious performance penalty for the use of stored procedures with LINQ.

Of course, one obvious solution to this problem is to modify the SelectCurrencies stored procedure to accept a @CurrencyCodeFilter parameter and change its SELECT statement to test against that parameter as follows: SELECT * FROM Sales.Currency WHERE CurrencyCode LIKE @CurrencyCodeFilter. That will ensure that only the rows of interest are returned from the server, just like the dynamic SQL version behaves. The LINQ query would then look like this:

// LINQ to SQL using a parameterized stored procedure
var q =
    from currency in ctx.SelectCurrencies("B%")
    select currency;

Performance problem solved, but this solution definitely begs the question “where’s the WHERE?” – in the stored procedure, or in the LINQ query? It needs to be in the stored procedure to prevent unneeded rows from being returned across the network, but then it’s not in the LINQ query any more. So LINQ queries that you need optimized for stored procedures won’t have a where clause in them, and in my humble opinion, that seriously undermines the effectiveness and expressiveness of LINQ queries because the query is now far less “language-integrated”.

Revisiting the Big Debate

Clearly LINQ to SQL and Entity Framework want us to embrace hitting the database server with dynamic SQL, but many database professionals live by the creed “dynamic SQL is the devil, and thou shalt only use stored procedures.” So let’s re-visit the heated “dynamic SQL vs. stored procedure” debate.

Proponents of stored procedures cite the following primary reasons against using dynamic SQL:

1) Security: Your application becomes vulnerable to SQL injection attacks.

2) Performance: Dynamic SQL doesn’t get compiled like stored procedures do, so it’s slower.

3) Maintainability: Spaghetti code results, as server-side T-SQL code is then tightly coupled and interwoven with client-side C# or VB .NET.

Let’s address these concerns:

1) Security: Vulnerability to SQL injection attacks results from building T-SQL statements using string concatenation. Even stored procedures are vulnerable in this respect, if they generate dynamic SQL by concatenating strings. The primary line of defense against SQL injection attacks it to parameterize the query, which is easily done with dynamic SQL. That is, instead of concatenating strings to build the T-SQL, compose a single string that has one or more parameters, and then populate the Parameters collection of the SqlCommand object with the parameter values (this is how the L2S query above prepared the command for SQL Server, as evidenced by the output of the generated T-SQL that uses the parameter @p0 in the WHERE clause).

2) Performance: You’re a little behind the times on this one. It’s true that stored procedures would get partially compiled to speed multiple executions in SQL Server versions 6.5 (released in 1996) and earlier. But as of SQL Server 7.0 (released in 1999), that is no longer the case. Instead, SQL Server 7.0 (and later) compiles and caches the query execution plan to speed multiple executions of the same query (where only parameter values vary), and that’s true whether executing a stored procedure or a SQL statement built with dynamic SQL.

3) Maintainability: This remains a concern if you are embedding T-SQL directly in your .NET code. But with LINQ, that’s not the case because you’re only composing a LINQ query (designed to be part of your .NET code); the translation to T-SQL occurs on the fly at runtime when you execute your application.

Making a Good Compromise

These facts should change your perspective somewhat. But if you’re a die-hard stored procedure proponent that finds it hard to change your ways (like me), consider this compromise: Allow dynamic SQL for SELECT only, but continue using stored procedures for INSERT, UPDATE, and DELETE (which can be imported into your data model just like a SELECT stored procedure can). This is a good strategy because LINQ queries only generate SELECT statements to retrieve data. They can’t update data. Only the SubmitChanges method on the L2S DataContext (or the SaveChanges method on the L2E ObjectContext) generates commands for updating data, with no downside to using stored procedures over dynamic SQL like there is for SELECT.

So you can (and should) stick to using stored procedures for INSERT, UPDATE, and DELETE operations, while denying direct access to the underlying tables (except for SELECT). Doing so allows you to continue using stored procedures to perform additional validation that cannot be bypassed by circumventing the application layer and communicating directly with the database server.

Understanding Generic Delegates and Lambda Expressions with LINQ Extension Methods

Generic delegates and lambda expressions underpin LINQ. If you’ve been struggling to get your arms around these new concepts, this post should help demystify matters for you.

Of course, there are a host of other new .NET language features that enable LINQ as well. These include extension methods, type inference, anonymous types, object initializers, and custom iterators, just to name a few! In this post, I’ll address generic delegates and lambda expressions as they pertain to LINQ (that is, as they pertain to the extension methods provided in .NET 3.5 to support LINQ), and I’ll cover some of the other new language features in future posts.

LINQ Query Syntax

There is simply no LINQ without generic delegates and lambda expressions, yet it would seem at first that you can write simple LINQ queries without seeing or touching either a generic delegate or a lambda expression. Here’s a simple example that queries from a collection of customers in custs. In the where clause, we use a boolean allCustomers variable to control whether all customers or only USA customers will be returned.

var allCustomers = false;

var q =
from c in custs
where c.Country == "USA" || allCustomers == true
orderby c.FirstName
select c;

Where’s the generic delegate and the lambda expression in this LINQ query? Oh they’re there all right, but the LINQ query syntax hides those details from the code.

LINQ Method Call Syntax

In reality, the compiler (C# in this case, but VB .NET as well) is supporting a sugar-coated syntax that is strikingly similar to an ordinary SQL query in the RDBMS world, but which really amounts to a chained set of method calls.

What this means is that there is really no such thing as a where clause in LINQ–it’s a Where extension method call.

And guess what? There’s no such thing as an orderby clause either–it’s an OrderBy extension method call, chained to the end of the Where method call.

Meaning that the previous query is treated by the compiler exactly as if you had coded it using LINQ method call syntax as follows:

var q = custs
.Where(c => c.Country == "USA" || allCustomers == true)
.OrderBy(c => c.FirstName);

So this code looks more “conventional,” eh? We’re invoking the Where method on the collection of customers, and then invoking the OrderBy method on the result returned by Where, and obtaining our final filtered and sorted results in q.

LINQ query syntax is obviously cleaner than LINQ method call syntax, but it’s important to understand that only LINQ method call syntax fully implements the complete set of LINQ Standard Query Operators (SQOs), so that you will sometimes be forced to use method call syntax. Also realize that it’s common and normal to use combinations of the two in a single query (I’ll show examples of this using .Concat and other methods in future posts).

Two questions arise at this point:

1) How did the customer collection, which is a List<Customer> in our example, get a Where and OrderBy method?

2) What is the => operator in the method parameters, and what does it do (and how the heck do you pronounce it)?

The answer to the first question is simple: Extension methods. I’ll cover those in more detail in a future post, but for now just understand that the .NET framework 3.5 “extends” any class that implements IEnumerable<T> (virtually any collection, such as our List<Customer>) with the Where and OrderBy methods, as well as many other methods that enable LINQ.

Generic Delegates

The second question is much juicier. First, the => is the lambda operator, and many people verbalize it as ‘goes to’. So ‘c => …’ is read aloud as ‘c goes to…’

To understand lambdas, you need to understand generic delegates and anonymous methods. And to best understand those is to review delegates and generics themselves. If you’ve been programming .NET for any length of time, you’re certain to have encountered delegates. And as of .NET 2.0 (VS 2005), generics have been an important language feature for OOP as well.

A delegate is a data type that defines the “shape” (signature) of a method call. Once you have defined a delegate, you can declare a variable as that delegate. At runtime, the variable can then be pointed to (and then invoke) any available method that matches the signature defined by the delegate. EventHandler is a typical example of a delegate defined by the .NET framework. Here is how .NET defines the EventHandler delegate:

public delegate void EventHandler(object sender, EventArgs e);

This delegate is suitable for pointing to any method that returns no value (void), and accepts two parameters (object sender and EventArgs e); in other words, a typical event handler that has no special event arguments to be passed.

A generic delegate is nothing more than an ordinary delegate, but supports generic type declarations for achieving strongly-typed definitions of the delegate at compile time. This is the same way generics are used for classes. For example, just as the framework provides a List<T> class so that you can have a List of anything, it now also provides a small set of generic delegates named Func that can be used to stongly type many different method signatures; essentially, any method that takes from zero to 4 parameters of any type (T1 through T4) and returns an object of any type (TResult):

public delegate TResult Func<TResult>();
public delegate TResult Func<T, TResult>(T arg);
public delegate TResult Func<T1, T2, TResult>(T1 arg1, T2 arg2);
public delegate TResult Func<T1, T2, T3, TResult>(T1 arg1, T2 arg2, T3 arg3);
public delegate TResult Func<T1, T2, T3, T4, TResult>(T1 arg1, T2 arg2, T3 arg3, T4 arg4);

Delegates as Parameters

With these generic Func delegates baked into the .NET framework, LINQ queries can use them as parameters to extension methods such as Where and OrderBy. Let’s examine one overload of the Where extension method:

public static IEnumerable<TSource> Where<TSource>(
  this IEnumerable<TSource> source,
  Func<TSource, bool> predicate);

What does this mean? Let’s break it down. And to better suit our particular example and simplify the explanation, let’s substitute Customer for TSource:

1) It’s a method called Where<Customer>.

2) It returns an IEnumerable<Customer> sequence.

3) It’s an extension method available to operate on any IEnumerable<Customer>instance (as denoted by the special this keyword in the signature), such as the List<Customer> in our example. The this keyword used in this context means that an extension method for the type following the this keyword is being defined, rather than definining what looks like the first parameter to the method.

4) It takes a single predicate parameter of type Func<Customer, bool>.

Number 4 is key. Func<Customer, bool> is the second Func generic delegate shown above, Func<T, TResult>, where T is Customer and TResult is bool (Boolean). The Func<Customer, bool> delegate refers to any method that accepts a Customer object and returns a bool result.  This means that the Where<Customer> method takes a delegate (function pointer) that points to another method which actually implements the filtering logic. This other method will receive each Customer object as the LINQ query iterates the source sequence List<Customer>, apply the desired filtering criteria on it, and return a true or false result in the bool return value that controls whether this particular Customer object should be selected by the query.

How does all that get expressed simply with: .Where(c => c.Country == “USA” || allCustomers == true)?

The best way to answer that question is to first demonstrate two other ways of invoking the Where method. The first is to explicitly provide a delegate instance of Func<Customer, bool> that points to another method called IsCustomerCountryUSA:

private void MainMethod()
{
    var filterMethod = new Func<Customer, bool>(IsCustomerCountryUSA);
    var q = custs.Where(filterMethod);
}

private bool IsCustomerCountryUSA(Customer c)
{
return (c.Country == "USA");
}

The Where method takes a single parameter, which is a new instance of the Func<TSource, T> generic delegate that points to a method named IsCustomerCountryUSA. This works because IsCustomerCountryUSA accepts a Customer object and returns a bool result, which is the signature defined by Func<Customer, bool> expected as a parameter by Where<Customer>.  Inside the IsCustomerCountryUSA, filtering logic is applied. Note that because we have created IsCustomerCountryUSA as a completely separate method, it cannot access the allCustomers variable defined locally in the calling method (the variable that controls whether all or only USA customers should be selected, as shown at the beginning of this post). Furthermore, because the signature of the IsCustomerCountryUSA method is fixed as determined by Func<Customer, bool>, we can’t just pass the allCustomers variable along as another parameter either.  Thus, this approach would require you to define the allCustomers variable as a private member if you wanted to include it in the filtering logic of the IsCustomerCountryUSA method.

Anonymous Methods

Anonymous methods were introduced at the same time as generics in .NET 2.0 (VS 2005). They are useful in cases such as this, where the method we’re creating for the delegate only needs to be called from one place, so the entire method body is simply coded in-line. Since it’s coded in-line at the one and only place it’s invoked, the method doesn’t need to be given a name, hence the term anonymous method.

To refactor the delegate instance version of the code to use an anonymous method, replace the Func parameter passed to the Where method with the actual method signature and body as follows:

var q = custs.Where(delegate(Customer c) { return c.Country == "USA" || allCustomers == true; });

The keyword delegate indicates you’re pointing the delegate parameter Func<Customer, bool> expected by the Where method to an anonymous method. The anonymous method isn’t named, as mentioned, but it must match the Func<Customer, bool> signature, meaning it must accept a Customer object parameter and return a bool result. The explicit signature following the delegate keyword defines the expected single Customer object parameter, and the expected bool result is inferred from the fact that the method returns an expression that resolves to a bool value. If the anonymous method used any other signature or returned anything other than a bool result, the compiler would fail to build your code  because the Where<Customer> method is defined to accept only a Func<Customer, bool> delegate.

So now the method body containing the filtering logic that was formerly coded in the separate IsCustomerCountryUSA method appears in braces right after the anonymous method signature. But there’s another huge advantage here besides saving the overhead of declaring a separate method for the filtering logic. Because anonymous methods appear inside of the method that calls them, they magically gain access to the local variables of the calling method. Thus, this implementation can also test the allCustomers variable, even though it’s defined in the calling method which would normally be inaccessible to the method being called. Because of this feature, the allCustomers variable defined as a local variable in the calling method can be tested by the anonymous method as part of the filtering criteria. That means it’s not necessary to define private fields to share local variables defined in a method with an anonymous method that the method calls, as demonstrated here). The result is less code, and cleaner code (though you should be aware that the compiler pulls some fancy tricks to make this possible, and there is a degree of additional runtime overhead involved when accessing local variables of the calling method from the anonymous method being called).

Lambda Expressions

Lambdas were introduced in .NET 3.5 (VS 2008) as a more elegant way of implementing anonymous methods, particularly with LINQ. When a lambda is used to represent an anonymous method containing a single expression which returns a single result, that anonymous method is called a lambda expression. Let’s first refactor our query by converting it from an anonymous method to a lambda:

var q = custs.Where((Customer c) => { return c.Country == "USA" || allCustomers == true; });

All we did was drop the delegate keyword and add the => (“goes to”) operator. The rest looks and works the same as before.

This lambda consists of a single expression that returns a single result, which qualifies it to be refactored as a lambda expression like this:

var q = custs.Where(c => c.Country == "USA" || allCustomers == true);

This final version again works exactly the same way, but has become very terse. First, the Customer data type in the signature isn’t needed, since it’s inferred as the TSource in Func<TSource, bool>. The parentheses surrounding the signature are also not needed, so they’re dropped too. Secondly, since lambda expressions by definition consist of only a single statement, the opening and closing braces for the method body aren’t needed and also get dropped. Finally, since lambda expressions by definition return a single result, the return keyword isn’t needed and is also dropped.

Besides offering an even terser syntax, lambda expressions provide two important additional benefits over anonymous methods. First, as just demonstrated, the data type of the parameter in the signature needn’t be specified since it’s inferred automatically by the compiler. That means that anonymous types (such as those often created by projections defined in the select clause of a LINQ query) can also be passed to lambda expressions. Anonymous types cannot be passed to any other type of method, so this capability was absolutely required to support anonymous types with LINQ. Second, lambda expressions can be used to build expression trees at runtime, which essentially means that your query code gets converted into query data which can then be processed by a particular LINQ implementation at runtime. For example, LINQ to SQL builds an expression tree from the lambda expressions in your query, and generates the appropriate T-SQL for querying SQL Server from the expression tree. Future posts will cover anonymous types and expression trees in greater detail.

To Sum It All Up

So this final lambda expression version invokes the Where method on the List<Customer> collection in custs. Because List<Customer> implements IEnumerable<T>, the Where extension method which is defined for any IEnumerable<T> is available to be invoked on it. The Where<Customer> method expects a single parameter of the generic delegate type Func<Customer, bool>. Such a parameter can be satisfied by the in-line lambda expression shown above, which accepts a Customer object as c =>, and returns a bool result by applying a filtering condition on the customer’s Country property and the calling method’s allCustomers variable.

And as expained at the beginning of this post, the method syntax query using our final lambda expression is exactly what the compiler would generate for us if we expressed it using this query syntax:

var q =
    from c in custs
    where c.Country == "USA" || allCustomers == true
    select c;

I hope this helps clear up any confusion there is surrounding lambda expressions in .NET 3.5.

Happy coding!