Best practices when working with multithreading


Does the pseudo-code below follow good practices for asynchronous code? It should:

  1. Synchronously start several tasks;
  2. After starting the tasks, wait for them all to complete before doing the post-processing.

Is this the best way to perform large-scale parallel processing and then collect the results? What does using async/await (in the main method) mean in this case, as opposed to starting the Task some other way inside the loop? I ask because I exhausted the thread pool when I looped over Task.Factory.StartNew, and the burst stopped occurring when I switched to the async/await approach.

    public void Main(string[] args)
    {
        var tasks = new ConcurrentBag<Task>();

        for(int i = 0; i < 20; i++)
        {
            tasks.Add(Task1());
            tasks.Add(Task3());
        }

        Task.WaitAll(tasks.ToArray());
    }

    public async Task<int> Task1()
    {
        return await Task2();
    }

    public async Task<int> Task2()
    {
        return await Task.Factory.StartNew<int>(() => { return 3; });
    }

    public async Task<int> Task3()
    {
        return await Task.Factory.StartNew<int>(() => { return 3; });
    }
  • Are you interested in a theoretical example or a real-world discussion? If the theoretical example, the answers are (1) no, this code synchronously starts several asynchronous tasks, and (2) yes. The last question seems more real-world, and its answer begins with "it depends".

  • @Andrélfsbacci, you are right about the synchronous start; I updated the question because that is what I meant. I expect a real-world discussion with theoretical background. The application to be developed will act in the real world.

2 answers



var tasks = new ConcurrentBag<Task>();

tasks doesn't need to be a ConcurrentBag here, since it is not being accessed by multiple threads; a plain List<Task> is enough.
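A minimal sketch of the same pattern with a plain List<Task> (ComputeAsync here is a hypothetical stand-in for the question's Task1/Task3):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Stand-in for the question's worker methods.
    static async Task<int> ComputeAsync(int n)
    {
        await Task.Yield();
        return n * 2;
    }

    static void Main()
    {
        // Only the main thread adds to this list, so no
        // concurrent collection is needed.
        var tasks = new List<Task<int>>();
        for (int i = 0; i < 20; i++)
            tasks.Add(ComputeAsync(i));

        Task.WaitAll(tasks.ToArray());
        Console.WriteLine(tasks.Sum(t => t.Result)); // 2 * (0 + 1 + ... + 19) = 380
    }
}
```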

public async Task<int> Task2()
{
    return await Task.Factory.StartNew<int>(() => { return 3; });
}

In this method you are already creating a task with StartNew, so you can remove the async and await with no problem. The same goes for Task3.

Apart from that, your technique is actually good for running several tasks in parallel. The idea is indeed to add them all to a list and then wait for them all at once.

You can also use WaitAny to process each result as soon as a Task finishes.
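A sketch of that WaitAny style, handling each result as it arrives instead of waiting for the whole batch (the three Task.Run calls are just placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var pending = new List<Task<int>>
        {
            Task.Run(() => 1),
            Task.Run(() => 2),
            Task.Run(() => 3),
        };

        int total = 0;
        // WaitAny returns the index of the first completed task;
        // process it, remove it, and wait for the next one.
        while (pending.Count > 0)
        {
            int finished = Task.WaitAny(pending.ToArray());
            total += pending[finished].Result;
            pending.RemoveAt(finished);
        }

        Console.WriteLine(total); // 1 + 2 + 3 = 6
    }
}
```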

EDIT: Explanation of StartNew and async/await.

There are several aspects to consider when using StartNew, so I decided to edit my answer.


A question you must answer before using StartNew is:

Is the operation I want to perform CPU-bound or I/O-bound?

If the answer is CPU-bound, then it should be the caller that starts the operation on a task, if it is interested in doing so.

The intention of this approach is that the API does not have to provide two interfaces, one synchronous and one asynchronous, for a problem that amounts to pure processing.


If the answer is I/O-bound, most likely there is already an API that returns tasks; if that's the case, you'll never need to call StartNew.

The StartNew method just schedules an operation to be executed by thread-pool threads, which means the call itself returns relatively quickly.
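A small illustration of that scheduling behavior (the 100 ms sleep is an arbitrary stand-in for real work):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // StartNew only schedules the delegate on the thread pool;
        // the call itself returns almost immediately.
        Task<int> task = Task.Factory.StartNew(() =>
        {
            Thread.Sleep(100); // simulated work
            return 42;
        });

        // Here the task is usually still running.
        Console.WriteLine(task.IsCompleted);

        // Reading Result blocks until the work is done.
        Console.WriteLine(task.Result); // 42
    }
}
```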


Since this method already returns a Task, it is redundant to wrap it in async and await.

Basically when you write await you’re saying:

Run this method asynchronously. When the method completes, resume the computation on the calling context.


I used Ildasm to inspect the code at the IL level and see how the compiler generates code to implement the await keyword. The code is as follows:

    public async Task<int> TaskD()
    {
        return await Task.Run(() => 3);
    }

    public Task<int> TaskE()
    {
        return Task.Run(() => 3);
    }

You can see the IL difference in the image:

[image: IL generated for TaskD vs. TaskE]

I'll also add that unless you need StartNew's extra parameters, you should prefer Task.Run.
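For reference, Task.Run is documented as shorthand for StartNew with a specific set of defaults, which is why it is the safer choice for the common case:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // The common case, spelled simply:
        Task<int> a = Task.Run(() => 3);

        // What Task.Run does under the hood:
        Task<int> b = Task.Factory.StartNew(
            () => 3,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            TaskScheduler.Default);

        Console.WriteLine(a.Result + b.Result); // 3 + 3 = 6
    }
}
```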

  • Is there any problem, in this case, with not using async/await?


This is a question that would probably take a book to answer fully, but I'll try to narrow the scope.

async/await for composing methods

You have a system full of small features and code paths that can run with some level of parallelism. In that situation, instead of filling the code with Begin/End pairs, it is better to use async, await and Task, which are cleaner and more economical. These keywords/classes provide a whole infrastructure for chaining operations, and even the possibility of choreography at the code or configuration level.

The key insight is to remember that they do not solve parallelism problems, they only formalize them. An indirect advantage, as you yourself noticed, is that this formalism is more economical than (re)implementing all this infrastructure by hand, besides tending to work better than piling Task upon Task without caring about the system's limits.

Thread and System.Collections.Concurrent for control

It may be that the system you are developing needs to operate multi-threaded, but the parallel part is relatively small and/or requires some kind of monitoring or control. Queuing hundreds of threads/tasks for later (eventual?) execution can exhaust system resources, as you saw yourself, besides making it hard to measure progress without some form of coordination or central statistics.

In this situation it may be much preferable to write specialist classes that coordinate queues of items to be processed, and other "processor" classes that consume data from those queues, feeding other queues or accumulating results. Instead of async/await, fixed, explicit code.
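A minimal sketch of that queue-and-processors shape using BlockingCollection (the item count, queue capacity, and degree of parallelism are all arbitrary choices for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // A bounded queue: producers block instead of flooding memory.
        var queue = new BlockingCollection<int>(boundedCapacity: 100);
        var results = new ConcurrentBag<int>();

        // A fixed number of "processor" consumers, instead of one task
        // per item: resource usage stays bounded and easy to monitor.
        var processors = new Task[4];
        for (int p = 0; p < processors.Length; p++)
        {
            processors[p] = Task.Run(() =>
            {
                // GetConsumingEnumerable blocks until items arrive
                // and ends when CompleteAdding is called.
                foreach (int item in queue.GetConsumingEnumerable())
                    results.Add(item * 2);
            });
        }

        // The coordinator feeds the queue and signals completion.
        for (int i = 0; i < 1000; i++)
            queue.Add(i);
        queue.CompleteAdding();

        Task.WaitAll(processors);
        Console.WriteLine(results.Count); // 1000
    }
}
```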

This is not a more elegant line of development than the first, and it is usually a kind of premature optimization. The processing of items remains nondeterministic (due to parallelism), but the creation of queue and processor instances becomes deterministic, which may be necessary.

Crash and burn, coordinated transaction, or uncoordinated transaction

Finally, a very important "it depends" concerns the meaning of the "large scale" mentioned above.

If it's a more mathematical kind of processing, with no side effects, which can simply be re-dispatched in the event of a failure, a system with async/await tends to be preferable, since ease of composition is more important than performance. That's because parallelism tends to increase performance, but it is not magic: past a certain point it can become worse than sequential code, or, as you noticed, not even work.

The same goes for coordinated transactional environments, where processing generates side effects but all the APIs involved are transactional or coordinated (TransactionScope). In these cases it is preferable first to write code that can run in parallel from the transaction's point of view, rather than optimizing each read/write; eventually particular passages will turn out to be parallelizable, and then async/await/Task enter the scene normally. The concern here is more with collisions/deadlocks arising from parallelism, not with parallelism itself.

But there is the rare and sad situation where you are doing something transactional, with side effects, yet have no shared/coordinated transaction at all points. External commands, web services, automation... In these cases the focus should be on writing paranoid code that handles failures, partial executions, and re-executions without duplication, or that can at least detect these situations at the business level. Error handling permeates the code flow in such a way that it stays practically sequential, and parallelism becomes a secondary concern.

A warning, maybe unrelated

The marketing around Entity Framework is great, and even though I hate this API with all my strength, I use it in 100% of my projects involving a database. It's a good Unit of Work API.

What the marketing fails to mention is that it is an API for small, non-concurrent Units of Work. EF with long transactions or with many objects is asking to experience hell on earth. EF under concurrency, on its own, borders on uselessness.

This is a case of a shared resource (database data) where the most recommended API practically only offers the crash-and-burn, all-or-nothing type of solution.


This is still a generic answer. Unfortunately, it is impossible to go into detail without knowing the case and the nature of the application.

It's important to note that the styles above can be used at different points of the system. In my experience, async/await-style parallelism is most common at the lower levels, when operating on I/O; at the more abstract, business levels these commands become much scarcer, with parallelism replaced by objects caching asynchronously generated data and by events that notify progress/completion of asynchronous processing.

  • "If it's a more mathematical kind of processing ... a system with async/await tends to be preferable, since ease of composition is more important than performance. That's because parallelism tends to increase performance." Mathematical processing has a particularity: it tends to depend on the results of other operations/functions. Note that the await keyword does not imply any parallelism; it only solves asynchrony. The only way to get parallelism using tasks is with Task.WaitAll or Task.WaitAny, or the WhenAll/WhenAny variants.

  • "Queuing hundreds of threads/tasks for later (eventual?) execution can exhaust system resources." Threads, yes; tasks, not so much. When tasks are used, thread-pool threads do the work, and the pool tries to keep the number of threads as low as possible while matching the number of available cores. Very simplistically, a task can be seen merely as a pointer to a function that needs to be executed.
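A small sketch of the point about await versus real concurrency (StepAsync and the 50 ms delay are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static async Task<int> StepAsync(int n)
    {
        await Task.Delay(50); // simulated asynchronous work
        return n;
    }

    static async Task Main()
    {
        // Sequential awaits: each call finishes before the next
        // starts, so this takes roughly 100 ms in total.
        int a = await StepAsync(1);
        int b = await StepAsync(2);

        // Concurrent: both tasks are started before any await,
        // so the total is roughly 50 ms.
        Task<int> ta = StepAsync(1);
        Task<int> tb = StepAsync(2);
        int[] both = await Task.WhenAll(ta, tb);

        Console.WriteLine(a + b + both[0] + both[1]); // 1 + 2 + 1 + 2 = 6
    }
}
```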
