Differences between declarative and imperative form of LINQ

  • What can one form do that the other can't?
  • Is there a difference in performance?
  • Does one have an advantage over the other?

Example:

using System;
using System.Collections.Generic;
using System.Linq;

public class Pessoa {
    public string Nome { get; set; }
    public DateTime? DataNascimento { get; set; }
    public int? Cpf { get; set; }

    public Pessoa(string nome, DateTime? dataNascimento = null, int? cpf = null){
        Nome = nome;
        DataNascimento = dataNascimento;
        Cpf = cpf;
    }
}

public class App {
    public static void Main(string[] args) {
        var pessoas = new List<Pessoa>() { 
                new Pessoa("João"), 
                new Pessoa("Maria"),
                new Pessoa("Jorge"),
                new Pessoa("Tiago") };

        // Select the people whose name contains the letter 'a'
        // LINQ in declarative form
        var resultado1 = from pessoa in pessoas
               where pessoa.Nome.Contains('a')
               select pessoa;
        // LINQ using traditional syntax
        var resultado2 = pessoas.Where(x => x.Nome.Contains('a'));

        Console.WriteLine(resultado1.Count());
        Console.WriteLine(resultado2.Count());

        Console.ReadKey();
    }
}

1 answer


Performance

Here is another version of the program, one that measures performance:

using System;
using static System.Console;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;

public class Pessoa {
    public string Nome { get; set; }
    public DateTime? DataNascimento { get; set; }
    public int? Cpf { get; set; }

    public Pessoa(string nome, DateTime? dataNascimento = null, int? cpf = null){
        Nome = nome;
        DataNascimento = dataNascimento;
        Cpf = cpf;
    }
}

public class App {
    public static void Main(string[] args) {
        var limiteDeItens = 1_000_000;
        var pessoas = new List<Pessoa>(limiteDeItens);
        var tempo = new Stopwatch();
        tempo.Start();
        // fill the list
        for(var i = 0; i < limiteDeItens; i++) {
            pessoas.Add(new Pessoa("Pessoa" + i.ToString()));
        }
        tempo.Stop();
        WriteLine($"Filling the list, in ms: {tempo.ElapsedMilliseconds}");
        // LINQ in declarative form
        tempo.Restart();
        var resultado1 = from pessoa in pessoas
               where pessoa.Nome.Contains('9')
               select pessoa;
        tempo.Stop();
        WriteLine($"Assembling the declarative LINQ, in ticks: {tempo.ElapsedTicks}");
        // LINQ using traditional syntax
        tempo.Restart();
        var resultado2 = pessoas.Where(x => x.Nome.Contains('9'));
        tempo.Stop();
        WriteLine($"Assembling the imperative LINQ, in ticks: {tempo.ElapsedTicks}");

        // transfer the whole result to a list, just for the sake of comparison
        var lista1 = new List<Pessoa>(limiteDeItens);
        tempo.Restart();
        foreach(var pessoa in resultado1) {
            lista1.Add(pessoa);
        }
        tempo.Stop();
        WriteLine($"Transferring one list to another with the first expression, in ms: {tempo.ElapsedMilliseconds}");
        var lista2 = new List<Pessoa>(limiteDeItens);
        tempo.Restart();
        foreach(var pessoa in resultado2) {
            lista2.Add(pessoa);
        }
        tempo.Stop();
        WriteLine($"Transferring one list to another with the second expression, in ms: {tempo.ElapsedMilliseconds}");
    }
}

See it running on ideone and on .NET Fiddle (in practice, with this many items the time/memory limit is exceeded there). I also put it on GitHub for future reference.

Run the program at least two or three times to warm up, and see the results for yourself.

Measuring the creation of the expression

Note that resultado1 and resultado2, roughly speaking, hold only the LINQ expression and not a list of results, as many people might imagine. That is why the time measured there is only the time taken to assemble the expression. It is so fast that it is better measured in ticks.
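
A minimal sketch of that point, assuming a small list of names invented for the illustration (DeferredDemo and ContarAntesEDepois are hypothetical names, not part of the program above). The variable holds the query, so the query is evaluated again, against the current contents of the list, every time it is enumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo {
    // Counts the matches before and after the source list changes.
    public static (int antes, int depois) ContarAntesEDepois() {
        var nomes = new List<string> { "Maria", "Jorge", "Tiago", "Pedro" };

        // Only the query is assembled here; no element is inspected yet.
        var comA = nomes.Where(nome => nome.Contains("a"));

        int antes = comA.Count();   // runs the query: Maria and Tiago
        nomes.Add("Ana");           // change the source after building the query
        int depois = comA.Count();  // runs the query again and now also finds Ana

        return (antes, depois);
    }

    public static void Main() {
        var (antes, depois) = ContarAntesEDepois();
        Console.WriteLine($"{antes} {depois}"); // prints "2 3"
    }
}
```

If the variable held a list of results, the second count would still be 2.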

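In this measurement, assembling the declarative expression costs a lot more (orders of magnitude) than the form closer to the imperative. Part of that gap is probably warm-up: whichever LINQ call is timed first pays for the JIT compilation of Where, and the compiler translates both forms into the same Where call. But who cares?! It runs only once. It is so hard to measure that, even in ticks, the results change quite a bit from run to run.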

Running the expressions

Then comes the execution of the two expressions, which must produce the same result. Even though 1 million elements are checked, the difference is small.

I wrote code that transfers the results, one by one, to another list. It may not be the most efficient way, but it clearly shows the transfer taking place. Other operations will always interfere with the result, so this example is not measuring purely the execution of the LINQ expressions.

Improving the measurement

There may well be flaws in this measurement. There are certainly ways to analyze possible variations, for example whether the filter finds many or few elements that satisfy the condition, but it gives a general idea of the difference.

You could also compare against filtering the elements one by one with a for loop to see the difference. Most likely the for would be faster. But is the difference enough to justify using it? Maybe so. There may also be other nuances.
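
A rough sketch of that comparison, assuming a plain list of strings instead of the Pessoa class (LoopDemo, FiltrarComFor and FiltrarComLinq are names made up for this illustration). It only shows how the two could be timed side by side; no result is claimed here, and the numbers will vary by machine and run:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LoopDemo {
    // Filters with an explicit for loop.
    public static List<string> FiltrarComFor(List<string> nomes) {
        var resultado = new List<string>();
        for (var i = 0; i < nomes.Count; i++) {
            if (nomes[i].Contains("9")) {
                resultado.Add(nomes[i]);
            }
        }
        return resultado;
    }

    // Same filter with LINQ; ToList forces the query to run.
    public static List<string> FiltrarComLinq(List<string> nomes) {
        return nomes.Where(nome => nome.Contains("9")).ToList();
    }

    public static void Main() {
        var nomes = Enumerable.Range(0, 1_000_000).Select(i => "Pessoa" + i).ToList();

        var tempo = Stopwatch.StartNew();
        var comFor = FiltrarComFor(nomes);
        tempo.Stop();
        Console.WriteLine($"for: {tempo.ElapsedMilliseconds} ms, {comFor.Count} items");

        tempo.Restart();
        var comLinq = FiltrarComLinq(nomes);
        tempo.Stop();
        Console.WriteLine($"LINQ: {tempo.ElapsedMilliseconds} ms, {comLinq.Count} items");
    }
}
```

As with the program above, run it a few times before trusting any number.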

You should also ask how much abstraction you need. In another question I answered, I said that in many cases programmers do not understand these abstractions; they do not understand LINQ well. So consider who is going to maintain this code before you think about performance. Performance should only become a concern after you have measured and seen that it does not meet a need.

The conclusion is that performance doesn't matter much unless you have an absurd number of operations to perform. And even with such an absurd number, isn't the real problem the chosen algorithm, which requires so many operations?

How LINQ works

Some people will look at this and not understand why the lines where the expressions are assembled execute so fast. How can 1 million elements be analyzed in a few ticks?

Simple: they are not analyzed. The actual execution of the query occurs only when it is needed. All of it, no matter how complex, is executed element by element, on demand. That is, the query is executed only when you actually need the result, and not before.

There are queries that do not need to analyze all elements, and there are uses of these queries in which only a partial evaluation is required. The system works very well. It has efficient algorithms, which is what matters most.

Lazy evaluation makes execution occur only when it is invoked. So if you ask for only one element that has the letter "a", and the first element has the letter "a", there is no need to analyze 1 million elements. Of course, this happens only if you use the correct method. Enumerating a full Where checks every element. You have to ask the correct "question" to get the correct answer. If you only want the first match, you will probably use First or FirstOrDefault.
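
To make the difference concrete, here is a small sketch that counts how many times the predicate is called. The list of names and the LazyDemo class are assumptions made for the illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo {
    // Returns how many elements were checked by First and by a full Where.
    public static (int comFirst, int comWhere) ContarVerificacoes() {
        var nomes = new List<string> { "Maria", "Jorge", "Tiago", "Pedro" };
        int verificacoes = 0;

        // The predicate counts every call it receives.
        Func<string, bool> temA = nome => {
            verificacoes++;
            return nome.Contains("a");
        };

        nomes.First(temA);          // stops at the first match ("Maria")
        int comFirst = verificacoes;

        verificacoes = 0;
        nomes.Where(temA).ToList(); // materializing the Where visits every element
        int comWhere = verificacoes;

        return (comFirst, comWhere);
    }

    public static void Main() {
        var (comFirst, comWhere) = ContarVerificacoes();
        Console.WriteLine($"First: {comFirst}, Where: {comWhere}"); // prints "First: 1, Where: 4"
    }
}
```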

The LINQ methods use continuations to walk the collections. With the yield statement, each element is analyzed only in its own iteration.

Here is a simplified implementation of Where:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
    foreach (T element in source) {
        if (predicate(element)) {
            yield return element;
        }
    }
}
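
The sketch below wraps the same idea in a runnable program. Filtrar is a hypothetical stand-in for Where (renamed so it does not clash with the real one), and the Verificados counter was added only for the demonstration. It drives the enumerator by hand to show that the body runs only as far as the next yield return:

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo {
    // Counts how many elements the filter has inspected so far.
    public static int Verificados = 0;

    // Hypothetical stand-in for Where, with a counter added for the demonstration.
    public static IEnumerable<T> Filtrar<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
        foreach (T element in source) {
            Verificados++;
            if (predicate(element)) {
                yield return element;
            }
        }
    }

    // Returns the counter right after building the query and after one MoveNext.
    public static (int aoMontar, int aposPrimeiro, int primeiro) Executar() {
        Verificados = 0;
        var numeros = new[] { 1, 2, 3, 4 };
        var pares = numeros.Filtrar(n => n % 2 == 0);
        int aoMontar = Verificados; // still 0: the method body has not started

        using (var e = pares.GetEnumerator()) {
            e.MoveNext(); // runs the body up to the first yield return
            return (aoMontar, Verificados, e.Current);
        }
    }

    public static void Main() {
        var (aoMontar, aposPrimeiro, primeiro) = Executar();
        Console.WriteLine($"{aoMontar} {aposPrimeiro} {primeiro}"); // prints "0 2 2"
    }
}
```

Only the elements 1 and 2 are inspected to produce the first even number; 3 and 4 are never touched unless someone asks for more.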

Is there a difference between the two forms?

Visually it clearly exists. The declarative form was created to make the code read more fluidly in the language. This form is usually called query syntax and the other one is usually called method syntax.

Declarative or query syntax

var resultado1 = from pessoa in pessoas
               where pessoa.Nome.Contains('a')
               select pessoa;

Imperative or method syntax

var resultado2 = pessoas.Where(x => x.Nome.Contains('a'));

Note that the first form is equivalent to the following:

var resultado1 = pessoas
               .Where(x => x.Nome.Contains('a'))
               .Select(x => x);

Comparing the two, they do the same thing. In the second case the select is implicit. Let's go through it part by part.

Understanding an expression

The from is what establishes that you are writing a declarative LINQ expression, and it always declares the element to be analyzed in the query that follows.

What comes right after the in is the collection the query will be applied to. In the second form you call the methods on the collection itself to indicate where the operation will be performed. The element is passed implicitly to the methods that follow.

The where clause does not exist as an operation of its own in the language. It is syntactic sugar for the Where method. This method is an "extension method" for any type that implements the IEnumerable<T> interface. Internally the method knows that it must take each element of the collection and pass it to a lambda that the method receives as a parameter (see the example implementation above).

The same goes for select. Note that both the where and the select are executed in sequence, element by element, as each element is requested. It is not that one loop runs the whole where and then another loop runs the select. In LINQ it is as if a single loop performed all the declared operations in sequence on each element individually. You can think of the from and in as defining the loop and the rest of the statement as the body of this loop.
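
The interleaving can be observed with a small trace. In this sketch Manter and Dobrar are hypothetical helpers that log each call, and the array of numbers is invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo {
    static readonly List<string> log = new List<string>();

    // Filter step: logs the call and rejects the number 2.
    static bool Manter(int n) { log.Add("where " + n); return n != 2; }

    // Projection step: logs the call and doubles the number.
    static int Dobrar(int n) { log.Add("select " + n); return n * 2; }

    // Runs the query and returns the order in which everything happened.
    public static List<string> Rastrear() {
        log.Clear();
        var numeros = new[] { 1, 2, 3 };
        var query = from n in numeros
                    where Manter(n)
                    select Dobrar(n);
        foreach (var x in query) {
            log.Add("result " + x);
        }
        return new List<string>(log);
    }

    public static void Main() {
        Console.WriteLine(string.Join(", ", Rastrear()));
        // prints "where 1, select 1, result 2, where 2, where 3, select 3, result 6"
    }
}
```

Each element goes through the where and the select before the next element is even looked at.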

Lambda

Note that what is passed as a parameter to the clause/method in either form is a lambda, even if in the first form it does not look like one. There, receiving the parameter is implicit, and the name the element has inside the lambda is the one declared in the from. Seen this way, the similarity is easier to understand:

var resultado1 = pessoas
               .Where(pessoa => pessoa.Nome.Contains('a'))
               .Select(pessoa => pessoa);

In this case pessoa is a parameter, a local variable of the lambda. In declarative LINQ, or query syntax, there is a lot of syntactic sugar; that is, the compiler rewrites the code it finds into another, concrete form that the language actually works with.

If you consider that a lambda is also syntactic sugar, you can already imagine that deep down what is passed is a delegate, which will be executed inside the method in question.
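
As an illustration, the sketch below passes the same predicate to Where in three ways: as a lambda, as an anonymous method and as a named method. All three end up as a Func<string, bool> delegate. The DelegateDemo class and the sample names are assumptions made for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateDemo {
    // A regular named method with the same signature as the lambda.
    static bool TemA(string nome) { return nome.Contains("a"); }

    // Counts the matches using three different ways of creating the delegate.
    public static int[] Contar() {
        var nomes = new List<string> { "Maria", "Jorge", "Tiago", "Pedro" };

        Func<string, bool> porLambda = nome => nome.Contains("a");
        Func<string, bool> porDelegate = delegate (string nome) { return nome.Contains("a"); };
        Func<string, bool> porMetodo = TemA;

        return new[] {
            nomes.Where(porLambda).Count(),
            nomes.Where(porDelegate).Count(),
            nomes.Where(porMetodo).Count()
        };
    }

    public static void Main() {
        Console.WriteLine(string.Join(" ", Contar())); // prints "2 2 2"
    }
}
```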

Lambdas are usually simple, but nothing prevents a complex algorithm from being executed for each element of the collection.

What is lambda

If anyone hasn't been introduced to lambdas yet, here are some examples:

pessoa => pessoa.Nome.Contains('a') // lambda that receives one parameter called pessoa (the type is inferred)
() => Console.WriteLine("Hello World") // lambda with no parameters
(int x, int y) => x * y // receives two parameters of specific types and performs an operation
delegate() { Console.WriteLine("Hello World"); } // using the delegate syntax

The link above has more information.

Readability

There is (?!?!) a clear readability advantage in query syntax, while the tests above show worse performance for it. Some tasks are easier this way, and in some cases it is easier to spot a side effect (some state being modified in the process).

Of course, in some simple cases this readability gain is not so great. Just by its size the second form (method syntax) looks more readable. Some people dispute this, but it looks cleaner because it is shorter. Again, it depends on who is reading.

Other differences

Not every method that can be used in the imperative form can be used in the declarative form. Example: ToLookup. So there is a major limitation in the declarative form.
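
A common workaround is to wrap the query expression in parentheses and call the method on it. The sketch below does that with ToLookup; the LookupDemo class, the AgruparPorInicial method and the sample names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo {
    // Filters with query syntax, then groups by first letter with method syntax.
    public static ILookup<char, string> AgruparPorInicial(IEnumerable<string> nomes) {
        return (from nome in nomes
                where nome.Length > 4
                select nome).ToLookup(nome => nome[0]);
    }

    public static void Main() {
        var nomes = new List<string> { "Maria", "Jorge", "Tiago", "Marta", "Ana" };
        var porInicial = AgruparPorInicial(nomes);

        Console.WriteLine(string.Join(", ", porInicial['M'])); // prints "Maria, Marta"
        Console.WriteLine(porInicial['A'].Count());            // prints "0": Ana was filtered out
    }
}
```

There is no query keyword for ToLookup, so part of the statement has to be written in method syntax anyway.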

A lambda expression can be created dynamically in the application. You can even build an expression tree by hand in code.
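
As a sketch of what building one by hand can look like, the code below assembles the equivalent of nome => nome.Contains(trecho) with the System.Linq.Expressions API and compiles it into a delegate. ExpressionDemo and CriarFiltro are names invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionDemo {
    // Builds the equivalent of: nome => nome.Contains(trecho)
    public static Expression<Func<string, bool>> CriarFiltro(string trecho) {
        var nome = Expression.Parameter(typeof(string), "nome");
        var contains = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        var corpo = Expression.Call(nome, contains, Expression.Constant(trecho, typeof(string)));
        return Expression.Lambda<Func<string, bool>>(corpo, nome);
    }

    public static void Main() {
        var nomes = new List<string> { "Maria", "Jorge", "Tiago", "Pedro" };

        // Compile turns the tree into a delegate that Where can call.
        Func<string, bool> filtro = CriarFiltro("a").Compile();

        Console.WriteLine(string.Join(", ", nomes.Where(filtro))); // prints "Maria, Tiago"
    }
}
```

Because the text to search for is an ordinary argument, the filter can be decided at run time, for example from user input.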

A lambda expression also offers more flexibility, which helps in several more complex situations where a declarative expression would struggle.

Personally, I see the declarative form as something more "beautiful", and that is good. There is no doubt about it. But apart from that, the more imperative form has more advantages. When you need those advantages, paradoxically your life gets easier: there is no choice to make.

Conclusion

No matter the form, LINQ is code representing data.

In the original form of the question the OP mixed up LINQ and lambda a bit, and many people do not know the terms query expression and lambda expression. When I realized that was what he was asking, I decided to save the question.

Understand that LINQ encompasses both forms of expression and that both use lambdas, although only one has it in the name; after all, query expressions use lambdas too, only in disguise.

I put all the examples on GitHub.

  • Good explanation...

  • In your code the two "Transferring one list to another" benchmarks do the same thing. Was that intended, or did you mean for the second loop to use resultado2?

  • @Francisco you're right, it was a mistake, thank you.
