Mapping the uncharted waters of LINQ

by Tobias Hertkorn on March 11th, 2009

Preamble

Today I had a really interesting discussion with Jon Skeet on stackoverflow.

It got started by a question by me: When does a library deserve the be called “Linq-something” or “something-Linq”?

In it I kinda rant about libraries that have "Linq" in their name even though they do not provide extension methods for IQueryable and IEnumerable but instead just for IEnumberable. And I asked the community if it is OK for libraries that just extend IEnumberable to have Linq in their name.

Jon Skeet's (who is kind of a stackoverflow celebrity) answer made me wonder though. What are the classifications for Linq related libraries? Especially since he seemed to mix and match Linq-to-... libraries with his MoreLinq library. In the comments (Click "Comments" to see them) I tried to clarify why I thought there was a difference but unfortunately I was not clear enough to get my point across in 300 characters. That's why I chose to write this blog post. Hope it will clarify my view on what parts LINQ is made of.

Classifying the moving parts of LINQ

First I want to clarify what I think that comparing a Linq-to-... library with MoreLinq is kinda flawed. Linq-to-... are what I (and others) call a Linq Provider library and something like MoreLinq is a Query Operators library.

Linq Provider libraries

To quote from the .NET Framework Glossary:

LINQ provider
A library that implements the functionality of the standard query operators for a specific kind of data source. A LINQ provider translates queries expressed in Visual Basic or C# into the native language of the data source. For example, LINQ to SQL is a LINQ provider for SQL Server databases that translates queries into dynamic SQL and marshals the results into objects defined on the client computer.

LINQ Provider are the base of LINQ. They connect the logical description of a data transformation to the actual data storage (which does not have to be a relational database but may be). Loosely speaking they can transform a "persisted" form of data or an in-memory representation of data into an alternative in-memory representation of data. We can give those providers abstract transformation rules and the provider will convert them into concrete transformation rules specific for the kind of data store it "owns". Abstract transformation they individually do not understand will get deferred until a rudimentary memory representation is formed and will be applied in memory to form the final in-memory representation of data.

A typical Linq Provider library will contain:

  • An implementation of IQueryProvider
  • At least one implementation of IQueryable
  • Usually one or multiple ExpressionVisitor
  • Usually one or multiple ExpressionConverter

Examples of Linq Provider libraries and their concrete IQueryProvider implementation:

  • Linq to Sql (System.Data.Linq.Table<TEntity>)
  • Linq to Nhibernate (NHibernate.Linq.NHibernateQueryProvider)
  • Linq to Objects (System.Linq.EnumerableQuery<T>)
Linq Operator libraries

Linq Operator libraries provide a rich set of abstract operations that can be used to construct the abstract transformation rules we can pass to any LINQ Provider. Most notably are the Standard Linq Operations actually baked into the C# 3.5 compiler as syntactical sugar.

Some examples for these operations:

  • Where
  • GroupBy
  • Sum
  • Max
  • ...

To be able to be used across multiple Linq Provider and let the provider benefit from deep inspection of the abstract transformations the Linq Operators must extend IQueryable. Unfortunately the parsing and compiling of these abstract transformation expressions is CPU intensive and can severely impact the performance of in-memory transformations.
That's where LINQ to Objects comes (unfortunately) into play a second time. In this context Linq to Objects means using abstract Linq queries directly on IEnumerable<T> without the intermediate use of a Linq Provider.

Because of these two forms of abstract transformations, one with Linq Provider, one without, good Linq Operation libraries will provide both an implementation of the operation for IQueryable and for IEnumerable. Remember though - IEnumerable<T> could still be queried using the Linq to Objects Provider EnumerableQuery<T>. Fortunately writing Linq Operations for IEnumerable is by far easier than implementing the same operation for IQueryable.

The Linq Operator library FilterByExample

To clarify what the difference is we will implement the Linq Operation "FilterByExample" both for IEnumerable and IQueryable. FilterByExample will enable us to filter a given data set via a provided instance of the class within the data set. FilterByExample will allow us to specify certain properties of the given class. This is useful for quickly creating advanced filtering forms in GUI development.

Download source for Com.Hertkorn.Framework.FilterByExample

Most of the magic happens in the method FilterByExample in Queryable (and Enumerable almost in the same way). Please notice that in this private version of FilterByExample relevant properties are listed. The public version gets properties that should be excluded (look at the unit tests for examples).

C#:
  1. private static IQueryable<T> FilterByExample<T>(this IQueryable<T> source, T example, params PropertyInfo[] relevantPropertyz)
  2. {
  3.   // since this method is private no additional precondition check.
  4.  
  5.   if (relevantPropertyz.Length == 0)
  6.   {
  7.     // no filtering
  8.     return source;
  9.   }
  10.  
  11.   var exampleExpression = Expression.Constant(example, typeof(T));
  12.  
  13.   var otherParameterExpression = Expression.Parameter(typeof(T), "other");
  14.  
  15.   var filterExpression = Expression.Equal(
  16.     Expression.MakeMemberAccess(otherParameterExpression, relevantPropertyz[0]),
  17.     Expression.MakeMemberAccess(exampleExpression, relevantPropertyz[0])
  18.     );
  19.  
  20.   for (int i = 1; i <relevantPropertyz.Length; i++)
  21.   {
  22.     var next = Expression.Equal(
  23.       Expression.MakeMemberAccess(otherParameterExpression, relevantPropertyz[i]),
  24.       Expression.MakeMemberAccess(exampleExpression, relevantPropertyz[i])
  25.       );
  26.  
  27.     filterExpression = Expression.AndAlso(filterExpression, next);
  28.   }
  29.  
  30.   Expression<Func<T, bool>> filter = Expression.Lambda<Func<T, bool>>(filterExpression, otherParameterExpression);
  31.  
  32.   return source.Where(filter);
  33. }

The summary explanation of this code snippet is: In line 15 through 28 all relevant properties are tested to be of equal value between two instances of an object of type T. These tests are linked via a logical AND in line 27 so that the resulting expression returns true if the value all relevant properties match in the two instances. This expression is passed to a where clause in line 32. So the returned IQueryable is a representation of the data where only object will be contained in the result where the relevant properties have matching values.

Comparing FilterByExample's extension for IQueryable and IEnumerable

The question is now. Why go through the stress of implementing FilterByExample twice. We can't really see any advantages in the library and its tests. On the contrary. The performance tests show that the IQueryable implementation is about three times slower than the IEnumerable implementation. Part of the reason why we don't see any advantage of the IQueryable implementation is that EnumerableQuery does not use an expression optimizer. So for any effect to appear we must switch to a different Linq Provider.

Download LinqToSql Example - If you want to run it make sure to adjust the connection string in the app.config

Using FilterByExample operator with Linq-to-Sql

When using Linq to Sql we can actually see the SQL Query that gets sent to the DB Server based on the abstract transformation we specify using Linq. That way we can verify that the abstract transformation the operator creates is in fact what we were looking for.

A simple query using standard linq operators would look like this:

C#:
  1. IQueryable<TestClass> filter = from t in c.TestClasses
  2.                                where t.TestString == "Test"
  3.                                select t;
  4.  
  5. PerformQuery("standard version", c, filter);

and would result in this output:

Performing standard version:
'1' 'Test' 'Test'
'4' 'Test' 'Me'
'5' 'Test' 'TestMe'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE [t0].[TestString] = @p0
-- @p0: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

As expected we see a where clause in the select method which shows that the Linq Provider understood our abstract transformation and was able to translate it into SQL. That way not the whole data table "TestClass" gets pulled across the wire. Just the part we need.

Now we use the FilterByExample operator:

C#:
  1. TestClass tc = new TestClass();
  2. tc.TestString = "Test";
  3.  
  4. IQueryable<TestClass> filter2 = c.TestClasses.FilterByExample(tc, x => x.TestInt, x => x.TestString2);
  5.  
  6. PerformQuery("one relevant prop", c, filter2);

This means "filter the TestClass table by tc and ignore the values in TestInt and TestString2.
And it results in:


Performing one relevant prop:
'1' 'Test' 'Test'
'4' 'Test' 'Me'
'5' 'Test' 'TestMe'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE [t0].[TestString] = @p0
-- @p0: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

Again lo and behold the actual SQL that gets produced looks exactly as if we would have used the standard where query.

Finally a more complex query - filter the list by two relevant properties:

C#:
  1. tc.TestString = "Test";
  2. tc.TestInt = 1;
  3.  
  4. IQueryable<TestClass> filter3 = c.TestClasses.FilterByExample(tc, x => x.TestString2);
  5.  
  6. PerformQuery("two relevant props", c, filter3);

which results in:


Performing two relevant props:
'1' 'Test' 'Test'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE ([t0].[TestInt] = @p0) AND ([t0].[TestString] = @p1)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

As expected a second parameter further specifies the where clause plus both are linked by an AND. Now just one object got sent across the wire.

Seems like our custom Linq Operator gets interpreted correctly by the SQL Linq Provider and the filter and transformation happens right where it belongs. Within the SQL Server itself. And even though I did not test it, I strongly believe that it would be the same with e.g. the Linq to NHibernate provider.

In contrast the IEnumerable version

Now to show the difference between the IQueryable and the IEnumerable operator we have to force the Linq to Sql to use the IEnumerable operator:

C#:
  1. tc.TestString = "Test";
  2. tc.TestInt = 1;
  3.  
  4. IEnumerable<TestClass> filter4 = FBE.Enumerable.FilterByExample(c.TestClasses, tc, x => x.TestString2);
  5.  
  6. PerformQuery("IEnumerable", c, filter4);

Notice how we don't get an IQueryable but an IEnumerable as result of the operation. Now the SQL reads:


Performing IEnumerable:
'1' 'Test' 'Test'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

It is not surprising that the Linq Provider was not able to analyse the abstract transformation we specified. There simply is no expression that could be analysed. So there is no where clause specified in the SQL query. Which results in the Linq Provider sending the whole table across the wire and the filtering being done in-memory. This is of course no biggie in our test scenario. But imagine the table having millions of rows. That certainly would be a big performance problem. Especially if we throw in some joins, etc.

Conclusion

There is a difference between MoreLinq and Linq-to-...

I hopefully did clarify that part.

Real Linq Operators libraries extend IQueryable and IEnumerable

If there is a Linq Operator that is worth putting into a library there have to be two versions of the operator. One for IQueryable to enable smart LINQ Provider to do the data transformation where it fits best. And one for IEnumerable to have the performance benefits when dealing with data that is already in-memory.

No library is going to be able to extend what every provider can do...

... because they would have to know all about all providers.

This is simply wrong. Providers are separate parts in Linq. And Operators certainly do not have to know anything about the provider that will use them. On the contrary, the abstract representation using Expressions is universal. There are just providers that can interpret and optimize those expressions better than other providers. By using Operations that manipulate IQueryable to impose the desired transformation the Linq Provider is given the chance to incorporate the abstract transformation when compiling the concrete transformation. When just using IEnumerable this chance is denied from the very beginning.

Downloads:
Source for Com.Hertkorn.Framework.FilterByExample - the Operator example
LinqToSql Example - using the Operator on Linq to SQL - If you want to run it make sure to adjust the connection string in the app.config

March 11th, 2009 3:48 am | Comments (7)

ParallelFX or the problem of sticking to standards

by Tobias Hertkorn on October 15th, 2007

I have been stumbling over ParallelFX or Parallel LINQ for the last couple of weeks. Just some quick links for those out there not knowing what I am talking about:

Don't get me wrong, I really love it. But this line:

C#:
  1. int sum = Parallel.Aggregate(0, 100, // iteration domain
  2.               0,                     // initial value
  3.               delegate(int i) { return (isPrime(i) ? i : 0) }, // apply
  4.                                                       // on each element
  5.               delegate(int x, int y) { return x+y; }  // combine results
  6.           );

It PAINS me. Ok, call me nit-picky, but why couldn't they stick to some (quasi-) standard. "Construct aided" parallelization has been around for quite a while. Just look up OpenMP. And there this type of operation is called a reduction. So maybe this construct should have been called Parallel.Reduction. Which in my point of view would even be the "cleaner" name, even if there was no OpenMP around. So take that, PFX! Ah, isn't it nice to be a bit cranky and have an innocent target like PLINQ to be the victim of my pointless rage. ;)

No, seriously, I LOVE the idea of having that type of construct around. Maybe they did not come across openmp for naming conventions, maybe they even are a tiny, tiny bit too much in love with their lambda. And maybe I should get over myself and learn to distinguish between OpenMP and PLINQ. ;) Anyway, I can't wait to get my hands on this baby, because it will seriously impact speed, both when executing programs and when writing parallel programs. Love it!

October 15th, 2007 11:19 pm | Comments (2)

BLINQ – an automated ASP.NET website creator

by Tobias Hertkorn on September 20th, 2006

Download it over at www.asp.net's Toolbox.

Blinq is a tool for generating ASP.NET websites for displaying, creating, and manipulating data based on database schema. Just point Blinq at a SQL database and it will create a website with pages that display sorted and paged data, allow you to update or delete records, create new records, and follow relationships between tables in your database. You don't need to write SQL queries to use Blinq; LINQ will generate optimized queries for you that request just the data you want to show. Blinq uses the May LINQ Community Tech Preview to access data. The code Blinq creates is simple and easy to customize to fit your needs. Everything in the website Blinq creates is meant as a starting point for a website that meets your needs perfectly, so have fun customizing the pages, experimenting with the code, and making it yours!

Also watch the creator Polita Paulus over on channel9 explaining her tool.

September 20th, 2006 1:31 pm | Comments (0)

Hejlsberg at it again

by Tobias Hertkorn on September 19th, 2006

There is a new screencast on channel9 featuring Anders Hejlsberg's speach at the Lang.Net 2006 Compiler Symposium.

About 6 minutes is just one slide:

Problem:
Data != Objects

That neatly sums up the topic of this talk. Plus he goes into LINQ again. Enjoy!

September 19th, 2006 1:00 pm | Comments (0)
Tobi + C# = T# - Blogged blogoscoop