Is it always guaranteed that a multi-threaded application runs faster than a single-threaded one?

Viewed 5,920 times

124

In some cases, splitting tasks across multiple threads yields no gain and can even make an application slower than a single-threaded version. Shouldn't there always be a performance gain when there are multiple processors (not necessarily separate chips)?

Is there any situation where a gain occurs when there is only one processor (or core) on the machine?

Can any task be parallelized for a performance gain?

  • 13

    To the downvoter: could you tell me what the problem is with the question? It would help to learn what you saw wrong with it, so that I and the other members can learn how to behave in the community.

8 answers

124


An analogy can help.

You have a lot of letters that need to be delivered to various addresses around the city, so you hire a motorcycle courier (a motoboy) to deliver them.

Assume the traffic lights in your city are perfect: they are always green unless someone is already at the intersection.

Add threads

The courier needs to deliver several letters quickly. Since there is no one else on the streets, every light is green. But this could be faster, so you hire a second rider for the bike.

The problem is that you still have only one bike. So now your first courier rides the motorcycle for a while, then from time to time stops, hands over the bike, and the second hire takes over.

Does it get faster? No, of course not. This is slower. Adding more threads cannot make anything faster. Threads are not magic. If a processor can do one billion operations per second, adding another thread will not make it perform another billion operations per second; instead, a thread steals resources from other threads. If a bike can run at 180 km/h, stopping it so another rider can jump on will not make the delivery faster! Clearly, on average, the letters are not being delivered more quickly in this scheme; they are just being delivered in a different order.

Add processor

Okay, so what happens if you hire two riders and two bikes? Now you have two processors and one thread per processor, so it will be faster, right? No, because we forgot about the traffic lights. Before, there was only one motorcycle driving at high speed at any given time. Now there are two couriers and two motorcycles, which means that sometimes one of the motorcycles must wait while the other is at the intersection. Once again, adding more threads can slow the pace, because more time is spent contending for intersections. The more processors you add, the worse it gets: you end up with more and more time waiting at intersections and less and less time delivering letters.

Adding more threads can cause negative scalability when locks contend with one another: the more threads, the more contention, and the slower everything runs.

Of course, with two couriers working, deliveries can get faster, as long as they don't waste too much time stopped at intersections.

Increase the clock

Suppose you get more powerful motorcycles: now you have more processors, more threads, and faster processors. Will it always be faster now? No. Almost always, no. Increasing processor speed can make multithreaded programs slower. Again, think about total traffic.

Suppose you have a city with thousands of couriers and 64 motorcycles, and some of the riders are at intersections blocking the others. Now all these bikes run faster. Does that help? Well, in real life, driving around on ordinary streets, do you reach your destination twice as fast in a Porsche as in a Honda Civic? Of course not: most of the time, driving around town, you're stuck in traffic.

If you can drive faster, you often just reach the congestion sooner and end up waiting longer in it; and if everyone reaches the congestion sooner, the congestion gets worse.

Examples where it helps and where it doesn’t

Multithreaded performance can be deeply counterintuitive. If you want high performance, it is advisable not to use a multithreaded solution unless you have an application that is "intrinsically parallel", that is, one obviously capable of using multiple processors, such as computing the Mandelbrot set or doing ray tracing. And even then, don't give the problem more threads than there are processors available. For many applications, using more threads makes performance worse. To take a simpler example: summing a list of numbers can be parallelized, but computing successive Fibonacci numbers cannot.
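As a minimal sketch of the "intrinsically parallel" case, here is the list-sum example decomposed into independent chunks. The helper name `parallel_sum` is hypothetical, and note that in CPython the GIL prevents threads from speeding up pure-Python CPU-bound code, so this only illustrates how the work splits; true gains would need processes or native code.

```python
# Sketch: summing a list by chunks. Each chunk's partial sum is
# independent, so the chunks can be computed concurrently and combined
# at the end with a cheap sequential step.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(numbers, n_chunks=4):
    size = max(1, len(numbers) // n_chunks)
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(sum, chunks)  # independent partial sums
    return sum(partials)                  # sequential combine step

print(parallel_sum(list(range(1, 101))))  # 5050, same as sum(range(1, 101))
```

Fibonacci admits no such decomposition: each term needs the previous ones, so there are no independent chunks to hand out.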

Another way to look at it: Nine women won’t make a baby in a month.

Administration cost

Suppose you have a task to perform. Let’s say that you are a math teacher and you have 20 papers to grade. It takes 2 minutes to correct each one, so it will take about 40 minutes.

Now suppose you decide to hire some assistants to help you. It takes an hour (60 minutes) to find 4 assistants. Each of the five of you grades 4 papers, and everything is done in 8 minutes. You traded 40 minutes of work for 68 minutes in total, counting the time spent finding the assistants. That is not a gain: the overhead of finding the assistants is greater than the cost of doing the job yourself.

Now suppose you have 20,000 papers to grade, so it will take about 40,000 minutes. If you now spend an hour finding the same four assistants, that's a win: each one takes 5,000 papers, and a total of 10,060 minutes is spent instead of 40,000, a saving of almost 4 times. The overhead of finding the assistants is basically irrelevant.

Parallelization is not free. The cost of dividing the work among threads must be small compared to the amount of work done per thread.
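The grading arithmetic above can be captured in a tiny cost model. The function `total_minutes` is a hypothetical helper, not anything from a real scheduling library; it just makes the fixed-setup-plus-divided-work tradeoff explicit.

```python
# Model of the grading example: total time is the fixed setup cost
# (finding assistants) plus the work divided evenly among the workers.
import math

def total_minutes(papers, workers, setup=60, per_paper=2):
    return setup + math.ceil(papers / workers) * per_paper

# 20 papers, teacher + 4 assistants: the setup dwarfs the work.
print(total_minutes(20, 5))      # 68 minutes, vs. 40 grading alone: a loss
# 20,000 papers, 4 assistants: the setup is irrelevant.
print(total_minutes(20000, 4))   # 10060 minutes, vs. 40,000 alone: a big win
```

The crossover point, where hiring help starts paying off, is wherever the divided work plus setup drops below the solo time.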

The problem is not processing

If the tasks do not use the processor heavily, then there is clearly room for a speedup: while the processor sits idle waiting for the disk or network to respond, it could be doing another thread's work.

Now assume you have two CPU-bound tasks (tasks that do use the processor heavily), a single processor, and either one thread or two. Ignoring administration time, in the one-thread scenario we have the following:

  • Do 100% of task 1's work. Suppose this takes 1,000 ms.
  • Do 100% of task 2's work. Suppose this takes 1,000 ms.

Total time: 2 seconds. Total tasks done: 2. But here is the important part: the client who was waiting for task 1 gets the job done in just 1 second. The client who was waiting for task 2 had to wait 2 seconds.

Now, if we have two threads and a CPU we see the following:

  • Do 10% of task 1's work, for 100 ms.
  • Do 10% of task 2's work, for 100 ms.
  • Do another 10% of task 1's work, for 100 ms.
  • Do another 10% of task 2's work, for 100 ms ... and so on, alternating.

Once again, the total time is 2 seconds, but this time the client who was waiting for task 1 gets the job done in 1.9 seconds, almost 100% slower than in the one-thread scenario!
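The two scenarios can be checked with a small deterministic simulation (the helper `completion_times` is invented for this sketch, not a real scheduler API): it runs two 1,000 ms tasks on one CPU, first to completion, then time-sliced in 100 ms quanta.

```python
# Simulate round-robin scheduling on one CPU and record when each
# task finishes. A quantum as large as the tasks means run-to-completion.
def completion_times(tasks_ms, quantum_ms):
    remaining = list(tasks_ms)
    finished = [None] * len(tasks_ms)
    clock = 0
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                slice_ms = min(quantum_ms, r)
                clock += slice_ms
                remaining[i] -= slice_ms
                if remaining[i] == 0:
                    finished[i] = clock
    return finished

print(completion_times([1000, 1000], 1000))  # [1000, 2000]: run to completion
print(completion_times([1000, 1000], 100))   # [1900, 2000]: time-sliced
```

Total time is 2,000 ms either way; only the first client's latency changes, from 1,000 ms to 1,900 ms.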

And this does not even count the cost of switching between tasks, which is not so small.

So if all of the following conditions hold:

  • tasks are limited by CPU capacity
  • there are more threads than CPUs
  • the task is useful only for its final result, not for its partial results

then adding more threads only slows everything down.

But if any of these conditions does not hold, adding more threads can be a good idea:

  • If the tasks are not limited by the CPU, adding more threads lets the CPU work when it would otherwise be idle, waiting for the network or disk, for example.
  • If there are idle CPUs, adding more threads allows those CPUs to be scheduled for work.
  • If partially computed results are useful, adding more threads improves the situation, because there are more opportunities for clients to consume results already computed. In the second scenario, for example, the clients of both tasks are getting partial results every 200 milliseconds, which is important.

Threads are not the only way to put CPUs left idle by external factors to use. But that is another story.

The use of threads also creates the risk of race conditions.
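A minimal sketch of that risk: an unsynchronized read-modify-write on shared state can lose updates, so a shared counter needs a lock. The helper `add_many` is invented for the illustration.

```python
# Four threads increment one shared counter. The increment is a
# read-modify-write, so without the lock, updates could be lost
# (a race condition); with it, the result is deterministic.
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # remove this lock and increments may be lost
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: correct only because the increment is locked
```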

The credit for this answer goes essentially to Eric Lippert, in his original answers.

  • 14

    I'm experimenting with the community here. Is this the kind of content we like and should bring over from SO? If you disagree, please leave a comment or discuss the matter at http://meta.answall.com

  • 5

    That was one of the best analogies I've ever seen in my life. +1

  • 1

    Very good compilation!

  • Really good analogy.

  • 2

    I didn't get this part: "Now you have two processors and one thread per processor, so it will be faster, right? No, because we forgot about the traffic lights." Would that be the case where the two threads are synchronized, one depending on the other's result? If each of them can do its work independently of the other, won't that problem go away?

  • 3

    Exactly. And many cases require synchronized threads; most problems involve intersections. When you can guarantee an avenue with no traffic lights (independent tasks), there is no problem of getting stuck at one. You may still have the other problems if you do not take the necessary care.

  • 1

    Didn't you leave out I/O-bound tasks here? They are perhaps the main reason for multithreading on a single processor.

  • @bigown, the subject became very clear with these analogies; they bring the concepts into the real world. I also realized that there are many variables that can change a program's state when we use threads, for better and for worse. I know very little about the subject, but I'm looking to go deeper. Congratulations on the explanation.


41

Let’s go in pieces:

Shouldn't there always be a performance gain when there are multiple processors (not necessarily separate chips)?

No, because additional processing is required to create and finalize threads and to manage the switching between them. If the threads run for only a short time, this additional "cost" negatively affects overall performance.

Is there any situation where a gain occurs when there is only one processor (or core) on the machine?

Yes, because there are several factors that can cause a thread to block. In a network request, for example, it is better for the requesting thread to block until the network card receives a response, allowing other processing to be performed in the meantime, even when no real parallelism is available.

Can any task be parallelized for a performance gain?

No: certain algorithms are sequential by nature. A simple example that comes to mind is computing a Fibonacci number, whose value depends on the previous terms.

Even if separate threads were created to compute this number, each one would have to wait for the two previously created threads to finish. This is just one case illustrating that there are situations where parallelism is not efficient. The idea is that some basic kinds of data processing cannot be subdivided.
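The dependency chain is easiest to see in the iterative formulation, sketched below: each step is a strict data dependency on the one before it, so the terms must be produced in order and there is nothing independent to hand to other threads.

```python
# Iterative Fibonacci: step k cannot run until step k-1 has finished,
# which is exactly what makes the sequence resist parallelization.
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b   # depends on the two previous terms
    return a

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```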

  • 2

    I was writing an answer and was going to use the Fibonacci example. Now I won't even use it :)

20

There are obstacles that can make a multithreaded application less efficient than a single-threaded one, such as memory being shared (or not).

If two threads share the same memory region, concurrent access to it can cause problems. The solution, isolating portions of the program in critical sections, ends up having a high cost, if I'm not mistaken on the order of 10,000 processor cycles (comparable to a disk access). If your algorithm has many sections that require exclusive access to a resource (memory or otherwise), this locking may end up costing enough to nullify the benefits of parallelization.

If, on the other hand, each thread controls a separate memory region, the overhead of communication between the threads also has to be taken into account. Speaking from experience, it is possible for the parallel solution to be worse than the sequential one (even in multi-processor environments, such as supercomputers), simply because the time the threads spend exchanging information with one another (over sockets, pipes, etc.) exceeds the time saved by parallelization.

Add to that the considerations in the other answers (the overhead of the threading system itself, blocking and dependencies between threads, etc.), and the answer is no: there will not always be a performance gain.

18

Parallelizing tasks will not always give a performance gain.

Consider the simplest situation: a single processor (chip) with a single core and no hyper-threading. All the code runs in the same place, even if the processor does some optimization, running one piece of code while waiting for the result of another. Even so, there will always be a piece of code that depends on a decision (an 'if'), and there will also be extra processing (more code running) to track the state of each thread.

Even if there is more than one core or more than one processor, there will always be the need to manage what each thread is doing, and, depending on the code, there will be waits: for a decision, for user input, for the result of some other processing...

Contention for resources should also be considered: if all the threads need to access memory, the hard drive, or the network card, it can easily happen that a thread has to wait for the resource to respond, since the processor is much faster than the rate at which these devices supply data.

The largest parallelization gains occur in tasks that are processing-intensive and independent of user interaction and of the results of other processing. For example, if you have 8 cores available on your machine, creating 7 threads to compress 7 different small files will bring some gain. If the same files are large (and do not fit in memory), the hard-drive bottleneck will limit the performance of your parallelism.

17

No; there are several issues that interfere with performance, among them:

  • In a multithreaded program, putting a thread into operation requires a context switch, which consumes processor time as well as memory.
  • In a program with a very large number of threads, many context switches are needed, which can make the processor spend more of its time switching contexts than performing operations.
  • The performance gain will always be limited by the sequential part of the run time (see Amdahl's law).
  • Not every kind of application can be parallelized, because some operations are essentially sequential. To average certain values of a data set and then check which are above/below that average, for example, it is not possible to start the checks until the last number of the data set has been summed and the sum divided. So there are significant sequential parts.

16

Be careful not to confuse parallelism with concurrency.

Parallelism relates to two tasks running at the same time. Example: two tasks running simultaneously, one on each CPU.

Concurrency relates to two tasks that can run independently of each other. Example: two tasks taking turns on a single CPU.

Not every problem admits parallel or concurrent solutions, and among those that do, not all see a performance improvement. Some problems may have efficient concurrent solutions but lose performance if you try to add parallelism, due to contention for system resources; the reverse can also occur.

  • 1

    I agree. Two tasks can be both parallel and concurrent, but not necessarily.

  • 2

    @downvoter: would you care to comment?

12

No. Parallelism can make your program do its job in less time, but there is no such guarantee.

In fact, it may even be the other way around. Besides multithreaded coordination not being a simple task for the program (which can mean even more code to execute), there is still overhead in the operating system to create and destroy threads.

11

According to Amdahl's law, the maximum speedup of a task running on several CPUs depends on the proportion of the code that can be parallelized.

That is, if the algorithm is highly parallelizable, adding CPUs and threads should yield a larger gain; if the algorithm is not parallelizable (that is, if there is too much control/data dependency), then the gain is nil and there may even be a performance loss.
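The law can be sketched directly as a formula: with a fraction p of the work parallelizable over n CPUs, the maximum speedup is 1 / ((1 - p) + p / n). The helper name `amdahl_speedup` is invented for this sketch.

```python
# Amdahl's law: the sequential fraction (1 - p) caps the speedup no
# matter how many CPUs are added.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel code, 1024 CPUs give less than a 20x speedup.
print(round(amdahl_speedup(0.95, 1024), 1))       # 19.6
# With only 50% parallel code, no number of CPUs can ever double the speed.
print(round(amdahl_speedup(0.50, 10**6), 6))      # 1.999998, approaching 2
```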

[image: Amdahl's law speedup chart]

  • Good addition
