This graph shows something very important about parallel algorithms: there is a limit to the performance gain they can provide. Indiscriminately increasing the number of processors can actually worsen the speedup (the metric used to measure the performance of a parallel algorithm, compared either to the best sequential version or to the parallel algorithm itself running on a single processor).
Note that I am deliberately not using the term thread, since parallel algorithms can run in heterogeneous environments, that is, on different computers, possibly in different locations (see grid computing).
The graph clearly shows that the biggest speedup is reached with 8 processors. Beyond that point, the speedup falls as the number of processors grows. This kind of behavior is natural in parallel algorithms.
Note that when a problem is parallelized, the processes need to communicate with each other (or at least with a central node) to solve it. This communication may happen only at the end, to consolidate the result produced by each process, or it may happen continuously throughout the computation. Thus, the more processors there are, the more time is spent communicating, and this is one of the reasons why the speedup falls as the number of processors increases.
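As a rough illustration of that trade-off (a toy model of my own, not something from the question), suppose each additional process adds a fixed communication cost; the total time then falls at first but rises again as the process count grows:

```python
def parallel_time(t_serial, p, comm_cost):
    """Toy model: compute time shrinks as t_serial/p, but
    communication overhead grows linearly with p."""
    return t_serial / p + comm_cost * p

def speedup(t_serial, p, comm_cost):
    """Speedup relative to the sequential running time."""
    return t_serial / parallel_time(t_serial, p, comm_cost)

# Hypothetical numbers: 100 time units of work, 1 unit of
# communication overhead per process.
best_p = max(range(1, 33), key=lambda p: speedup(100, p, 1))
print(best_p, speedup(100, best_p, 1))  # peaks at p = 10, then drops
```

With these made-up constants the speedup peaks at 10 processes and degrades afterwards, which is the same shape the graph in the question shows.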
Another important reason is granularity. The problem can be divided into many small parts, which, depending on the problem, can make it inefficient. In addition, there may be many sequential parts in the code, and this also hurts the speedup (Amdahl's law).
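Amdahl's law can be made concrete with a quick sketch (the 10% sequential fraction below is hypothetical, chosen just for illustration):

```python
def amdahl_speedup(s, p):
    """Maximum speedup on p processors when a fraction s of the
    program is strictly sequential (Amdahl's law)."""
    return 1.0 / (s + (1.0 - s) / p)

# With 10% sequential code, the speedup can never exceed 1/0.1 = 10,
# no matter how many processors are added.
for p in (2, 8, 64, 1000):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

Even with 1000 processors the speedup stays below 10, which is why a large sequential portion caps the benefit of adding more processors.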
Small analogy
In this sense, an interesting analogy comes from project management. Doubling the number of people on a project does not mean it will be delivered in half the time. On the contrary, it can cause a delay, because all those people need to be coordinated to deliver the project. Similarly, if one person on the project concentrates many responsibilities, there will be delays even if other people are available to help (the analogy for parallel algorithms with a lot of sequential code).
Getting back to the subject...
This obviously varies from problem to problem. IO-bound problems (which spend more time on IO than on CPU) behave one way; CPU-bound problems (which spend a lot of CPU time and little IO) behave another. You need to understand your problem to know the limit on the number of processors.
In the scenario of the question, we have:
A - incorrect. Increasing the number of threads (beyond 8) caused a drop in performance (speedup). That is, there were already enough threads to solve the problem.
C - I think it is incorrect, but I am still analyzing it. Initial argument: the machine was used to its full potential when 16 or more threads were used. Do not confuse this with an advantage for the algorithm.
D - incorrect. Obviously, using 4 threads consumes fewer resources (in this case, cores) than using 16.
That leaves the answer B.
Answer B says that the implementations with 4 threads or fewer had a gain above 90% of the theoretical maximum.
I am interpreting gain as efficiency. Efficiency is calculated by the formula: efficiency = speedup / p, where p is the number of processors.
Thus, in the ideal case (speedup = p), the efficiency is 100% (the theoretical maximum).
The graph shows that up to 4 threads the efficiency was above 90%. Example of the calculation for 4 threads: 3.8 / 4 = 95%. Here 3.8 is the speedup, read from the Y axis of the graph, and 4 is the number of threads, read from the X axis.
Note: 3.8 is an approximation, since the graph's scale goes in steps of 1.
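That calculation can be sketched as follows (the 3.8 speedup value is my approximate reading of the graph, as noted above):

```python
def efficiency(speedup, p):
    """Parallel efficiency: speedup divided by processor count.
    1.0 (100%) is the ideal case where speedup == p."""
    return speedup / p

# Approximate speedup read from the Y axis for 4 threads:
print(efficiency(3.8, 4))  # 0.95 -> above the 90% threshold of answer B
```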
It seems to me that all 4 are right.
– Maniero
No, D is wrong, I know... but why do you think the others are right?
– Gustavo Nunes
That's not what the chart shows. I'm looking at the chart and it contains all of this information. I may be wrong about its purpose, but you don't even need to understand threads to answer this, just the math.
– Maniero