Preamble
Today I had a really interesting discussion with Jon Skeet on stackoverflow.
It got started by a question by me: When does a library deserve the be called “Linq-something” or “something-Linq”?
In it I kinda rant about libraries that have "Linq" in their name even though they do not provide extension methods for IQueryable and IEnumerable but instead just for IEnumberable. And I asked the community if it is OK for libraries that just extend IEnumberable to have Linq in their name.
Jon Skeet's (who is kind of a stackoverflow celebrity) answer made me wonder though. What are the classifications for Linq related libraries? Especially since he seemed to mix and match Linq-to-... libraries with his MoreLinq library. In the comments (Click "Comments" to see them) I tried to clarify why I thought there was a difference but unfortunately I was not clear enough to get my point across in 300 characters. That's why I chose to write this blog post. Hope it will clarify my view on what parts LINQ is made of.
Classifying the moving parts of LINQ
First I want to clarify what I think that comparing a Linq-to-... library with MoreLinq is kinda flawed. Linq-to-... are what I (and others) call a Linq Provider library and something like MoreLinq is a Query Operators library.
Linq Provider libraries
To quote from the .NET Framework Glossary:
LINQ provider
A library that implements the functionality of the standard query operators for a specific kind of data source. A LINQ provider translates queries expressed in Visual Basic or C# into the native language of the data source. For example, LINQ to SQL is a LINQ provider for SQL Server databases that translates queries into dynamic SQL and marshals the results into objects defined on the client computer.
LINQ Provider are the base of LINQ. They connect the logical description of a data transformation to the actual data storage (which does not have to be a relational database but may be). Loosely speaking they can transform a "persisted" form of data or an in-memory representation of data into an alternative in-memory representation of data. We can give those providers abstract transformation rules and the provider will convert them into concrete transformation rules specific for the kind of data store it "owns". Abstract transformation they individually do not understand will get deferred until a rudimentary memory representation is formed and will be applied in memory to form the final in-memory representation of data.
A typical Linq Provider library will contain:
- An implementation of IQueryProvider
- At least one implementation of IQueryable
- Usually one or multiple ExpressionVisitor
- Usually one or multiple ExpressionConverter
Examples of Linq Provider libraries and their concrete IQueryProvider implementation:
- Linq to Sql (System.Data.Linq.Table<TEntity>)
- Linq to Nhibernate (NHibernate.Linq.NHibernateQueryProvider)
- Linq to Objects (System.Linq.EnumerableQuery<T>)
Linq Operator libraries
Linq Operator libraries provide a rich set of abstract operations that can be used to construct the abstract transformation rules we can pass to any LINQ Provider. Most notably are the Standard Linq Operations actually baked into the C# 3.5 compiler as syntactical sugar.
Some examples for these operations:
- Where
- GroupBy
- Sum
- Max
- ...
To be able to be used across multiple Linq Provider and let the provider benefit from deep inspection of the abstract transformations the Linq Operators must extend IQueryable. Unfortunately the parsing and compiling of these abstract transformation expressions is CPU intensive and can severely impact the performance of in-memory transformations.
That's where LINQ to Objects comes (unfortunately) into play a second time. In this context Linq to Objects means using abstract Linq queries directly on IEnumerable<T> without the intermediate use of a Linq Provider.
Because of these two forms of abstract transformations, one with Linq Provider, one without, good Linq Operation libraries will provide both an implementation of the operation for IQueryable and for IEnumerable. Remember though - IEnumerable<T> could still be queried using the Linq to Objects Provider EnumerableQuery<T>. Fortunately writing Linq Operations for IEnumerable is by far easier than implementing the same operation for IQueryable.
The Linq Operator library FilterByExample
To clarify what the difference is we will implement the Linq Operation "FilterByExample" both for IEnumerable and IQueryable. FilterByExample will enable us to filter a given data set via a provided instance of the class within the data set. FilterByExample will allow us to specify certain properties of the given class. This is useful for quickly creating advanced filtering forms in GUI development.
Download source for Com.Hertkorn.Framework.FilterByExample
Most of the magic happens in the method FilterByExample in Queryable (and Enumerable almost in the same way). Please notice that in this private version of FilterByExample relevant properties are listed. The public version gets properties that should be excluded (look at the unit tests for examples).
- private static IQueryable<T> FilterByExample<T>(this IQueryable<T> source, T example, params PropertyInfo[] relevantPropertyz)
- {
- // since this method is private no additional precondition check.
- if (relevantPropertyz.Length == 0)
- {
- // no filtering
- return source;
- }
- var filterExpression = Expression.Equal(
- Expression.MakeMemberAccess(otherParameterExpression, relevantPropertyz[0]),
- Expression.MakeMemberAccess(exampleExpression, relevantPropertyz[0])
- );
- for (int i = 1; i <relevantPropertyz.Length; i++)
- {
- var next = Expression.Equal(
- Expression.MakeMemberAccess(otherParameterExpression, relevantPropertyz[i]),
- Expression.MakeMemberAccess(exampleExpression, relevantPropertyz[i])
- );
- filterExpression = Expression.AndAlso(filterExpression, next);
- }
- Expression<Func<T, bool>> filter = Expression.Lambda<Func<T, bool>>(filterExpression, otherParameterExpression);
- return source.Where(filter);
- }
The summary explanation of this code snippet is: In line 15 through 28 all relevant properties are tested to be of equal value between two instances of an object of type T. These tests are linked via a logical AND in line 27 so that the resulting expression returns true if the value all relevant properties match in the two instances. This expression is passed to a where clause in line 32. So the returned IQueryable is a representation of the data where only object will be contained in the result where the relevant properties have matching values.
Comparing FilterByExample's extension for IQueryable and IEnumerable
The question is now. Why go through the stress of implementing FilterByExample twice. We can't really see any advantages in the library and its tests. On the contrary. The performance tests show that the IQueryable implementation is about three times slower than the IEnumerable implementation. Part of the reason why we don't see any advantage of the IQueryable implementation is that EnumerableQuery does not use an expression optimizer. So for any effect to appear we must switch to a different Linq Provider.
Download LinqToSql Example - If you want to run it make sure to adjust the connection string in the app.config
Using FilterByExample operator with Linq-to-Sql
When using Linq to Sql we can actually see the SQL Query that gets sent to the DB Server based on the abstract transformation we specify using Linq. That way we can verify that the abstract transformation the operator creates is in fact what we were looking for.
A simple query using standard linq operators would look like this:
- IQueryable<TestClass> filter = from t in c.TestClasses
- where t.TestString == "Test"
- select t;
- PerformQuery("standard version", c, filter);
and would result in this output:
Performing standard version:
'1' 'Test' 'Test'
'4' 'Test' 'Me'
'5' 'Test' 'TestMe'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE [t0].[TestString] = @p0
-- @p0: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
As expected we see a where clause in the select method which shows that the Linq Provider understood our abstract transformation and was able to translate it into SQL. That way not the whole data table "TestClass" gets pulled across the wire. Just the part we need.
Now we use the FilterByExample operator:
- tc.TestString = "Test";
- IQueryable<TestClass> filter2 = c.TestClasses.FilterByExample(tc, x => x.TestInt, x => x.TestString2);
- PerformQuery("one relevant prop", c, filter2);
This means "filter the TestClass table by tc and ignore the values in TestInt and TestString2.
And it results in:
Performing one relevant prop:
'1' 'Test' 'Test'
'4' 'Test' 'Me'
'5' 'Test' 'TestMe'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE [t0].[TestString] = @p0
-- @p0: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
Again lo and behold the actual SQL that gets produced looks exactly as if we would have used the standard where query.
Finally a more complex query - filter the list by two relevant properties:
- tc.TestString = "Test";
- tc.TestInt = 1;
- IQueryable<TestClass> filter3 = c.TestClasses.FilterByExample(tc, x => x.TestString2);
- PerformQuery("two relevant props", c, filter3);
which results in:
Performing two relevant props:
'1' 'Test' 'Test'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
WHERE ([t0].[TestInt] = @p0) AND ([t0].[TestString] = @p1)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input NVarChar (Size = 4; Prec = 0; Scale = 0) [Test]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
As expected a second parameter further specifies the where clause plus both are linked by an AND. Now just one object got sent across the wire.
Seems like our custom Linq Operator gets interpreted correctly by the SQL Linq Provider and the filter and transformation happens right where it belongs. Within the SQL Server itself. And even though I did not test it, I strongly believe that it would be the same with e.g. the Linq to NHibernate provider.
In contrast the IEnumerable version
Now to show the difference between the IQueryable and the IEnumerable operator we have to force the Linq to Sql to use the IEnumerable operator:
- tc.TestString = "Test";
- tc.TestInt = 1;
- IEnumerable<TestClass> filter4 = FBE.Enumerable.FilterByExample(c.TestClasses, tc, x => x.TestString2);
- PerformQuery("IEnumerable", c, filter4);
Notice how we don't get an IQueryable but an IEnumerable as result of the operation. Now the SQL reads:
Performing IEnumerable:
'1' 'Test' 'Test'
SELECT [t0].[TestInt], [t0].[TestString], [t0].[TestString2]
FROM [TestClass] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
It is not surprising that the Linq Provider was not able to analyse the abstract transformation we specified. There simply is no expression that could be analysed. So there is no where clause specified in the SQL query. Which results in the Linq Provider sending the whole table across the wire and the filtering being done in-memory. This is of course no biggie in our test scenario. But imagine the table having millions of rows. That certainly would be a big performance problem. Especially if we throw in some joins, etc.
Conclusion
There is a difference between MoreLinq and Linq-to-...
I hopefully did clarify that part.
Real Linq Operators libraries extend IQueryable and IEnumerable
If there is a Linq Operator that is worth putting into a library there have to be two versions of the operator. One for IQueryable to enable smart LINQ Provider to do the data transformation where it fits best. And one for IEnumerable to have the performance benefits when dealing with data that is already in-memory.
No library is going to be able to extend what every provider can do...
... because they would have to know all about all providers.
This is simply wrong. Providers are separate parts in Linq. And Operators certainly do not have to know anything about the provider that will use them. On the contrary, the abstract representation using Expressions is universal. There are just providers that can interpret and optimize those expressions better than other providers. By using Operations that manipulate IQueryable to impose the desired transformation the Linq Provider is given the chance to incorporate the abstract transformation when compiling the concrete transformation. When just using IEnumerable this chance is denied from the very beginning.
Downloads:
Source for Com.Hertkorn.Framework.FilterByExample - the Operator example
LinqToSql Example - using the Operator on Linq to SQL - If you want to run it make sure to adjust the connection string in the app.config
