An analogy can help.
You have a lot of letters that need to be delivered to various addresses around the city, so you hire a motoboy (a motorcycle courier) to deliver them.
Assume the traffic signals (the traffic lights) in your city are perfect: they are always green unless someone is at the intersection.
Add threads
The motoboy needs to deliver several letters quickly. Since there is no one else on the streets, every light is green. But the deliveries could be faster, so you hire a second rider for the bike.
The problem is that you still have only one bike. So your first motoboy drives the motorcycle for a while, and from time to time he stops, gets off, and the second rider takes over.
Does it get faster? No, of course not. It is slower. Adding more threads cannot, by itself, make anything faster. Threads are not magic. If a processor can do one billion operations per second, adding another thread will not make the processor perform another billion operations per second; instead, the new thread steals resources from the other threads. If a bike can run at 180 km/h, stopping it so another rider can jump on will not make the deliveries faster! Clearly, on average, the letters are not being delivered more quickly in this scheme; they are just being delivered in a different order.
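To make this concrete, here is a minimal sketch in Java (the class and method names are invented for illustration) that splits a CPU-bound sum across a growing number of threads. On a typical machine, once the thread count passes the number of cores, the total time stops improving: the extra threads are just taking turns on the same bikes.

```java
import java.util.concurrent.*;

public class OneCoreManyThreads {
    // A deliberately CPU-bound task: sum a range of numbers.
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) total += i;
        return total;
    }

    static void timeWith(int threads, long n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long chunk = n / threads;
        long start = System.nanoTime();
        java.util.List<Future<Long>> futures = new java.util.ArrayList<>();
        for (int t = 0; t < threads; t++) {
            long from = t * chunk, to = (t + 1) * chunk;
            futures.add(pool.submit(() -> sumRange(from, to)));
        }
        long total = 0;
        for (Future<Long> f : futures) total += f.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        pool.shutdown();
        System.out.printf("%3d threads -> %d ms (sum=%d)%n", threads, elapsedMs, total);
    }

    public static void main(String[] args) throws Exception {
        long n = 2_000_000_000L;
        int cores = Runtime.getRuntime().availableProcessors();
        // Past the number of cores, extra threads only add scheduling
        // overhead -- the "swap riders on one bike" effect.
        for (int t : new int[]{1, cores, cores * 8}) timeWith(t, n);
    }
}
```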
Add a processor
Okay, so what happens if you hire two riders and two bikes? Now you have two processors and one thread per processor, so it will be faster, right? No, because we forgot about the traffic lights. Before, there was only one motorcycle driving at high speed at any given time. Now there are two motoboys and two motorcycles, which means that sometimes one of the motorcycles must wait, because the other is in the intersection. Once again, adding more threads slowed things down: more time is spent contending for intersections. The more processors you add, the worse it gets; you end up with more and more time waiting at red lights and less and less time delivering letters.
Adding more threads can cause negative scalability if it makes locks contend with one another. The more threads, the more contention, and the slower everything gets.
Of course, with two couriers working, deliveries can get faster, as long as they don't waste too much time stopped at traffic lights.
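A rough sketch of this effect in Java (class name invented): several threads all incrementing one counter behind a single lock. The lock plays the role of the traffic light; every thread keeps stopping at it, so adding cores mostly adds waiting.

```java
import java.util.concurrent.*;

public class Contention {
    private static long shared = 0;
    private static final Object lock = new Object();

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 5_000_000; i++) {
                    synchronized (lock) { // every increment stops at the "traffic light"
                        shared++;
                    }
                }
                done.countDown();
            });
        }
        done.await();
        System.out.printf("%d threads through one lock: %d ms (count=%d)%n",
                threads, (System.nanoTime() - start) / 1_000_000, shared);
        pool.shutdown();
        // For comparison, a single thread doing the same total number of
        // increments with no lock at all is often faster than all the
        // cores fighting over this one intersection.
    }
}
```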
Increase the clock
Suppose you get more powerful motorcycles: now you have more processors, more threads, and faster processors. Will this always be faster? No. Almost always, no. Increasing processor speed can actually make multithreaded programs slower. Again, think about total traffic.
Suppose you have a city with thousands of drivers and 64 motorcycles, each with a motoboy on it. Some of them are sitting at intersections blocking other motorcycles. Now all these bikes run faster. Does that help? Well, in real life, when you drive around on normal streets, do you reach your destination twice as fast in a Porsche as in a Honda Civic? Of course not: most of the time, driving around town, you are stuck in traffic.
If you can drive faster, you often just reach the congestion sooner and end up waiting in it longer. If everyone reaches the congestion faster, the congestion gets worse.
Examples where it helps and where it doesn’t
Multithreaded performance can be deeply counterintuitive. If you want high performance, the recommendation is not to use a multithreaded solution unless you have an application that is "intrinsically parallel", that is, an application obviously capable of using multiple processors, such as computing the Mandelbrot set or doing ray tracing. And even then, do not throw more threads at the problem than there are processors available. For many applications, using more threads makes performance worse. As a simpler example, summing a list of arbitrary numbers can be parallelized, but computing successive Fibonacci numbers cannot: each number depends on the two before it, so the work is inherently sequential.
Another way to look at it: Nine women won’t make a baby in a month.
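A small example of each case, in Java: summing a range parallelizes trivially (here via a parallel stream), while Fibonacci does not, because each value depends on the previous ones.

```java
import java.util.stream.LongStream;

public class ParallelOrNot {
    public static void main(String[] args) {
        // Intrinsically parallel: each chunk of the range can be summed
        // independently on a different core and combined at the end.
        long sum = LongStream.rangeClosed(1, 1_000_000_000L)
                             .parallel()
                             .sum();
        System.out.println("sum = " + sum);

        // Inherently sequential: fib(n) needs fib(n-1) and fib(n-2) first,
        // so no amount of threads removes the dependency chain.
        long a = 0, b = 1;
        for (int i = 2; i <= 90; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        System.out.println("fib(90) = " + b);
    }
}
```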
Administration cost
Suppose you have a task to perform. Let's say you are a math teacher and have 20 papers to grade. It takes 2 minutes to grade each one, so it will take about 40 minutes.
Now let's assume you decide to hire some assistants to help you. It takes an hour (60 minutes) to find 4 assistants. Each of them grades 4 papers, you grade the remaining 4 yourself, and everything is done in 8 minutes. You traded 40 minutes of work for 68 minutes in total, including the time spent finding the assistants. This is not a gain. The overhead of finding assistants is greater than the cost of doing the work yourself.
Now suppose you have 20,000 papers to grade, which would take about 40,000 minutes. If you spend an hour finding the same four assistants, that is a win. Each of the five of you takes 4,000 papers, and a total of 8,060 minutes is spent instead of 40,000 minutes, a savings of almost 5 times. The overhead of finding the assistants is basically irrelevant.
Parallelization is not free. The cost of dividing the work among the different threads must be small compared to the amount of work done per thread.
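The grading story translates directly to code. This sketch (class name invented) compares doing many tiny tasks inline with paying thread start-up cost for each one; the start-up cost plays the role of the hour spent finding assistants.

```java
public class Overhead {
    public static void main(String[] args) throws Exception {
        int tasks = 10_000;

        // Tiny tasks done inline: the work itself is almost free.
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < tasks; i++) sum += i;
        System.out.printf("inline:          %,d ns (sum=%d)%n",
                System.nanoTime() - start, sum);

        // The same tiny tasks, but starting a thread for each one.
        // The "administration" (thread creation) dwarfs the work.
        start = System.nanoTime();
        long[] box = new long[1];
        for (int i = 0; i < tasks; i++) {
            final int v = i;
            Thread t = new Thread(() -> box[0] += v);
            t.start();
            t.join(); // joined immediately, so there is no race on box[0]
        }
        System.out.printf("thread per task: %,d ns (sum=%d)%n",
                System.nanoTime() - start, box[0]);
    }
}
```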
The problem is not processing
If the tasks do not use the processor heavily, then adding threads clearly can increase speed: while the processor would otherwise sit idle waiting for the disk or the network to respond, it can be doing the work of another thread.
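A minimal illustration in Java, simulating the I/O wait with Thread.sleep (the fetch method is made up): five one-second waits take about five seconds sequentially, but about one second with five threads, even on a single core, because a waiting thread uses no CPU.

```java
import java.util.concurrent.*;

public class IoBound {
    // Simulated I/O call: the thread blocks, the CPU is free.
    static String fetch(int id) throws InterruptedException {
        Thread.sleep(1_000); // stand-in for a disk or network wait
        return "response " + id;
    }

    public static void main(String[] args) throws Exception {
        // Sequential: 5 waits of 1 s each -> about 5 s total.
        long start = System.nanoTime();
        for (int i = 0; i < 5; i++) fetch(i);
        System.out.printf("sequential: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        // One thread per request: the waits overlap -> about 1 s total.
        ExecutorService pool = Executors.newFixedThreadPool(5);
        start = System.nanoTime();
        java.util.List<Future<String>> futures = new java.util.ArrayList<>();
        for (int i = 0; i < 5; i++) {
            final int id = i;
            futures.add(pool.submit(() -> fetch(id)));
        }
        for (Future<String> f : futures) f.get();
        System.out.printf("5 threads:  %d ms%n", (System.nanoTime() - start) / 1_000_000);
        pool.shutdown();
    }
}
```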
Let's assume you have two CPU-bound tasks (tasks that are pure processing), a single processor, and either one thread or two. Ignoring administration time, in the one-thread scenario we have the following:
- Do 100% of the work of task 1. Suppose this takes 1,000 ms.
- Do 100% of the work of task 2. Suppose this takes 1,000 ms.
Total time: 2 seconds. Total tasks done: 2. But here is the important part: the client who was waiting for task 1 gets the job done in just 1 second, while the client who was waiting for task 2 had to wait 2 seconds.
Now, if we have two threads and one CPU, we see the following:
- Do 10% of the work of task 1, for 100 ms.
- Do 10% of the work of task 2, for 100 ms.
- Do another 10% of the work of task 1, for 100 ms.
- Do another 10% of the work of task 2, for 100 ms.
- ... and so on, alternating until both tasks finish.
Once again, the total time is 2 seconds, but this time the client who was waiting for task 1 only gets the job done after 1.9 seconds, almost 100% slower than in the one-thread scenario!
And that does not even count the time spent switching between tasks, which is not that small.
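The two schedules can be replayed with simple arithmetic. This sketch (names invented) computes when each task finishes under each scheme and reproduces the 1.9-second figure above.

```java
public class Schedules {
    public static void main(String[] args) {
        int taskMs = 1_000, quantum = 100;

        // One thread: the tasks run back to back.
        System.out.printf("sequential:  task 1 done at %d ms, task 2 done at %d ms%n",
                taskMs, 2 * taskMs);

        // Two threads on one CPU: 100 ms time slices, strictly alternating.
        int clock = 0, done1 = 0, done2 = 0;
        int left1 = taskMs, left2 = taskMs;
        while (left1 > 0 || left2 > 0) {
            if (left1 > 0) { clock += quantum; left1 -= quantum; if (left1 == 0) done1 = clock; }
            if (left2 > 0) { clock += quantum; left2 -= quantum; if (left2 == 0) done2 = clock; }
        }
        // Prints 1900 ms for task 1 and 2000 ms for task 2.
        System.out.printf("round-robin: task 1 done at %d ms, task 2 done at %d ms%n",
                done1, done2);
    }
}
```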
So, if all of the following conditions hold:
- the tasks are CPU-bound;
- there are more threads than CPUs;
- a task is useful only for its final result, not for its partial results;
then adding more threads only slows everything down.
But if any of those conditions does not hold, adding more threads can be a good idea:
- If the tasks are not CPU-bound, adding more threads lets the CPU work when it would otherwise be idle, waiting for the network or the disk, for example.
- If there are idle CPUs, adding more threads allows those CPUs to be scheduled to do work (a common way to arrange this is sketched after this list).
- If partial results are useful, adding more threads improves the situation, because there are more opportunities for clients to consume results that have already been computed. In the second scenario above, for example, the clients of both tasks receive partial results every 200 milliseconds, which may be important.
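For CPU-bound work, a rule of thumb that falls out of the list above is to size the thread pool to the number of available cores, so no thread has to time-slice with another. A minimal Java sketch, with an invented class name and a made-up workload:

```java
import java.util.concurrent.*;

public class RightSizedPool {
    public static void main(String[] args) throws Exception {
        // One thread per core avoids the "more threads than CPUs" condition.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        java.util.List<Future<Double>> results = new java.util.ArrayList<>();
        for (int t = 0; t < cores; t++) {
            final int seed = t;
            results.add(pool.submit(() -> {
                double x = seed; // some arbitrary CPU-bound work
                for (int i = 0; i < 50_000_000; i++) x = Math.sqrt(x + i);
                return x;
            }));
        }
        for (Future<Double> f : results) f.get();
        pool.shutdown();
        System.out.println("ran " + cores + " tasks on " + cores + " threads, one per core");
    }
}
```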
Threads are not the only way to put a CPU left idle by external factors to use. But that is another story.
Using threads also creates the risk of race conditions.
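The classic demonstration, as a minimal Java sketch with invented names: two threads increment a shared counter without synchronization, and increments get lost because counter++ is a read-modify-write sequence, not a single atomic step.

```java
public class RaceDemo {
    static int counter = 0; // shared, unsynchronized state

    public static void main(String[] args) throws Exception {
        Runnable work = () -> {
            // Both threads can read the same value, each add 1, and write
            // it back -- one of the two increments is silently lost.
            for (int i = 0; i < 1_000_000; i++) counter++;
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Expected 2,000,000; with the race, usually less.
        System.out.println("counter = " + counter);
    }
}
```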
Credit for this answer goes essentially to Eric Lippert, in these answers: