What is the difference between async, multithreading, parallelism and concurrency?

What is the difference between async, multithreading, etc..?

Do they depend on the number of processor cores?

If I make a program in Visual Basic and open 33 instances of it, would they be running in parallel? Would it be 33 times faster? Or would it be better to run the program once, using async in C#?

I ask these questions because I am developing a program that would have the following order of execution:

Start
|
Parse a web XML (a few milliseconds to run)
|
Download images from the web (5 images, a few seconds)
|
Write to the database (a few milliseconds)
|
End

And that cycle would repeat thousands of times a day.

I believe that with async it would run one of these instances per processor? If I ran this program (imagining it as a console or Windows Forms application), would it gain some performance?

Note: I found a question that goes a good way toward an answer, but I think it is still too generic for my problem, which is about running the same program over and over again.

3 answers


Processor

None of this depends on the computer cores directly.

True parallelism depends on having multiple processors (logical or physical). Even without the effective ability to process more than one thing at the same time, the computer can give the feeling of parallelism without anything actually running simultaneously.

When there is only one processing line, the operating system gives a fraction of time to each thread (divided among all the threads running in all applications). It switches from one to the other: while one executes, the others stop. Because this switching happens at very small time intervals, the perception is that all of them are running in parallel, even though that is not true.

If the computer can run 4 independent processing lines, it can actually have 4 threads running at the same time. The switching from one thread to another still happens in the same way, but on 4 different fronts, so the scale increases.

Open different instances

If you open 33 instances and have only one (logical) processor, they will not actually run in parallel, but they will run almost simultaneously. They all start more or less together and, if they have the same workload, they finish more or less together, but each one executes at its own moment.

If you have 4 cores in total and everything is distributed correctly, about 8 instances should run on each core. Depending on what you are doing, it should take just over a quarter of the time to finish.
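That back-of-envelope estimate can be checked with a few lines of Python (illustrative only; the arithmetic, not the language, is what matters here):

```python
import math

def ideal_time_fraction(instances: int, cores: int) -> float:
    """Fraction of the serial runtime needed when CPU-bound
    instances are spread evenly across the cores."""
    rounds = math.ceil(instances / cores)  # "waves" of work per core
    return rounds / instances

# 33 instances on 4 cores: ceil(33 / 4) = 9 rounds instead of 33,
# i.e. just over a quarter of the serial time, as described above.
print(ideal_time_fraction(33, 4))  # 0.2727... (9/33)
```

This is the ideal case; real scheduling and switching overhead only make it worse.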

CPU bound vs. IO bound

If the processing depends almost exclusively on the processor, it only makes sense to create one thread per available processor. Creating more is likely to waste resources, because there is a cost to keep switching from one thread to another. The more you add, the worse it gets.

If the processing relies heavily on input and output devices, and the processor spends a lot of time waiting for those devices to deliver what it wants, then even with a single processor there is a gain in having threads, because while one thread is waiting, input and output keep happening elsewhere. There are more details in the question linked below.

Creating threads has always been a technique for solving IO (input/output) bottlenecks. C# ended up creating tasks to manage this better: the programmer does not need to understand all the details to do it right, and the library knows best whether to create a thread or not, how many, and how.

Asynchronicity and parallelism

This helps create asynchronous operations, with await and async, and parallel ones, mainly with the Task Parallel Library.

Asynchronous operations just require something to start running without blocking what is being done; when there is something to resume, execution comes back from where it left off. This usually solves the input/output problem well. Asynchronicity does not require anything to run in parallel, nor does it require threads; if something like that is needed, the Task will do it for you.
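To make that concrete, the sketch below uses Python's asyncio for illustration (the question's context is C#, where async/await and Task play the same role); the file names and timings are made up. Three simulated downloads overlap on a single thread, so the total wait is roughly the longest one, not the sum:

```python
import asyncio
import time

async def fake_download(name: str, seconds: float) -> str:
    # asyncio.sleep stands in for waiting on the network; while one
    # "download" waits, the event loop resumes the others.
    await asyncio.sleep(seconds)
    return name

async def main() -> None:
    start = time.perf_counter()
    names = await asyncio.gather(
        fake_download("a.jpg", 0.2),
        fake_download("b.jpg", 0.2),
        fake_download("c.jpg", 0.2),
    )
    elapsed = time.perf_counter() - start
    # The three 0.2 s waits overlap: the total is about 0.2 s, not 0.6 s,
    # on a single thread and without any extra cores.
    print(names, elapsed < 0.5)

asyncio.run(main())
```

No thread was created here; the gain comes purely from not blocking while waiting on IO.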

Your case

You can do the XML parse (in theory more processing) simultaneously with the image downloads, all in "parallel", and the database write may also see some gain. Of course, a task that depends on others will have to wait for those others to finish. We do not break the laws of physics :)

Conclusion

I suggest you study async and try to apply it in the best way possible. It is a subject full of details, but one that can give an incredible return. Of course, I cannot speak about cases I do not know. There are many cases where you can imagine the gain, but only testing will tell. Several times I have surprised myself, for good or for bad, when I saw the result. With experience you "guess" better what will happen with performance.

The question is too general, so the answer is too. When you have more concrete problems, while you are working on them, you can ask more specific questions and see the various examples. If you want to learn, it would be good to study books on multiprocessing in general and on .NET.

Note that I have not mentioned concurrency, which may or may not occur when there are several (pseudo-)simultaneous processing lines.


  • OK, thanks for the various links; I will still study and digest your entire answer. But remember that the download takes longer because of network limitations than because of CPU, so in this case I think parallelism to the extreme is worth it, even on a 4-core PC.

  • In this case the parallelism will make little or no difference. Asynchronicity is what matters.



I want to enrich the discussion a little more and avoid theoretical confusion:

Processes, Threads and OS

The process scheduler, roughly speaking, is the mechanism that organizes how long each process will occupy the CPU, and in which order, following a priority queue.

The operating system schedules processes, and each process has one or more threads. As the processor switches between processes, one of the threads of that process necessarily executes. This makes it look like the OS manipulates threads, which is not true. Understand why this difference matters:

Imagine two processes running with different numbers of threads and with the same priority:

  • process1 = 1000 threads (thread1, thread2, ..., thread1000)
  • process2 = 1 thread (thread1001)

If the processor is running a thread of process1, the probability of switching to thread1001 of process2 at the next process change is 100%: since it schedules processes, it will switch to process2 and necessarily run thread1001. If it scheduled threads instead, with all threads at equal priority, there would be only a 1/1001 (about 0.1%) chance of running thread1001, and that process would take a long time no matter how light it was.

The operating system works this way to prevent a thread-heavy process from taking control of the CPU while processes with few threads never run.

PS: What I said above is not 100% true, because there are kernel threads: threads that only the kernel can create and that are handled at the process level. The threads we create in our programs are user-level threads.
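For the 1000-thread example above, the two probabilities work out as follows (plain arithmetic, shown in Python for illustration):

```python
# process1 has 1000 threads, process2 has 1 thread, equal priority.
threads_p1, threads_p2 = 1000, 1

# Scheduling by process: at a process switch away from process1,
# process2, and hence thread1001, runs with certainty.
p_scheduled_by_process = 1.0

# Scheduling by thread, all threads equal: thread1001 is 1 of 1001.
p_scheduled_by_thread = threads_p2 / (threads_p1 + threads_p2)

print(p_scheduled_by_process)           # 1.0
print(round(p_scheduled_by_thread, 4))  # 0.001 (about 0.1%)
```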

If you want to understand a little more about these processes and threads, I recommend reading: Sistemas Operacionais, Tanenbaum

  • Just yesterday I did a thread-affinity implementation to choose which cores will be used in parallel in my application, which would not be possible if it were as you said. Either the information is incorrect, or I misunderstood some detail of the terminology you used. PS: I understand that the "slice" of time is calculated per process, obviously, and not per individual thread, but the system manages both, not only the processes.

  • Above I said that one of the threads is necessarily executed, but I didn't go into detail. You can choose which thread of a process will occupy which CPU, but that choice is set above the OS, whether in the JVM, the CLR, or an equivalent. In the OS's view, user-level threads do not exist; for it there are only processes, and it has to select one to occupy the CPU. Which thread then runs, once the process is selected, depends on the JVM, for example, in conjunction with its settings.

  • It is actually set by the OS API itself, which manages both process and thread rotation. This does not depend on any kind of language runtime; I even did it in C, generating a native .exe. One of the tests was to pin some threads to one core with affinity and let the others run free, and the affinity did take its toll, as was to be expected: the threads with affinity were left with the most degraded time slices.

  • Anyway, I just wanted to understand it better, really; thanks for the feedback. The part about it being useless to create more threads than the OS will grant time slices for is now pretty clear.

  • Bacco, I don't know where you learned the theory, but what happens is the exact opposite of what you're saying. I recommend reading section 2.2.4 of the book I mentioned above: Operating Systems, Tanenbaum.

  • I'm not talking about theory. If on a quad-core machine I set affinity for core 1 on several threads, they will compete for 1/4 of the resources they would have without affinity; that's kind of obvious. Normally they would be distributed. With affinity, they lose real parallelism. But I only commented to understand the post better; I have no interest in dragging this out. Even from the original edition of the book until today, things have changed a little, although the essence is the same.

  • I think it's worth mentioning the memory bottleneck and dual channel.



What is the difference between async, multithreading, parallelism and concurrency?

Async

It is the same as asynchronous. This word can be a bit confusing, but it means that the task will not be executed immediately: it will be scheduled and executed when there are resources and it is convenient.

Multithreading

It is the act of executing multiple threads, which are fundamentally blocks of code. Multithreading is only possible with parallelism.

Parallelism

It is the act of executing processes "at the same time".

The parallelism can be:

  • By time division: a processor runs a process for a "quantum" of time and switches to another process, and this repeats for the processes scheduled by the operating system. In this kind of parallelism there is an illusion that they are being executed at the same time. When a task does not use all the processor time, it is possible to parallelize tasks this way.

  • With multiple processors (real parallelism): each processor runs a process, all of them (the processors) at the same time, usually with shared memory. This can cause a memory bottleneck if memory use is intense; there are solutions such as multiple channels, which make it possible to access memory in parallel as well, reducing the bottleneck. Each processor still uses time division.

Concurrency

It occurs when there is parallelism; the word concurrency is an analogy to "competing for resources".

It is the role of the supervisor (the operating system) to manage resources as it sees fit.

Note: Virtual machines are called Hypervisors because they are supervisors of supervisors.

Do they depend on the number of processor cores?

Certainly yes, operating systems in general are multi-tasking, support multi-core processors, and manage the use of these resources.

Note: Hyper-Threading is a technology where the operating system "sees" more processors than actually exist. What is the explanation? These processors have a technology that allows tasks to be scheduled very efficiently; strictly speaking, the processors that the operating system "sees" are actually process schedulers. Applications that make intense use of multithreading with many scheduled tasks gain some performance from this (servers that serve many clients are an example).

If I make a program in Visual Basic and open 33 instances of it, would they be running in parallel?

Yes, the operating system does this job for you when you open more than one instance; that is how it is able to run more than one application at the same time.

... would be 33 times faster?

No: runtime is, at best, inversely proportional to the number of real processors.

This, considering that your program uses the processor intensely.

In fact, the 33 processes would be competing for the real processors' time.

When using too many threads there is no real gain, as the system will spend more time scheduling tasks and performing resource-access locks (mutexes).

Even Hyperthreading doesn’t help this case.

... would it be better than running the program once using async in C#?

Just using async means nothing by itself: you would need to implement the 33 tasks in the same program. Even if you do, it will not make a big difference; the advantage of doing it is sharing data within the same process (without using sockets, IPC, or shared memory, which add complexity to development and debugging).

There is also the concept of a computational cluster, which consists of using several machines working in parallel, each with its own operating system. This is the "fastest" type of parallelism, but it is the most expensive, and it is difficult to maintain synchrony between processes.


Graphics cards are an example of an architecture that exploits parallelism well: an average graphics processor (by current standards) has a clock on the order of 1 GHz, lower than a CPU's, but it has many cores.

Despite being a different architecture, it is possible to use the GPU as a processor, and take advantage of its computational power.

https://pt.wikipedia.org/wiki/OpenCL

https://pt.wikipedia.org/wiki/CUDA

It is worth remembering that it takes analysis to know whether your processing is parallelizable, that is, whether there are dependencies between intermediate results.

