Why is multiplication faster than division?

Asked

Viewed 1,927 times

32

Bit brushing question, but I was reading an article about javascript in which says that division is slower than multiplications.

And for example, I would recommend changing the code below :

var resultado = 4/2;

why that would be faster.

var resultado = 4 * .5;

Does this happen in any language? And why is multiplying more performative than dividing?

Here is the link of the article (in English) I mentioned, in the section Math : http://jessefreeman.com/game-dev/intro-to-programming-for-games-with-javascript/

  • 2

    I did the test on the console and did not notice any difference

  • 2

    @Silvioandorinha I believe it is noticeable if the execution of the code needs to be repeated numerous times.

  • 2

    I created a performance test in Javascript: http://jsperf.com/multiply-vs-divide-fight from time to time the division is faster, so if there is even a difference in Javascript, it should be insignificant.

  • 5

    Now the irony: the split seems to be consistently faster than multiplication in Internet Explorer 10.

  • I’m using Chrome and in the jsperf test the division was faster... I don’t think this question is pertinent in js

  • In my opinion, a good answer needs to contemplate the following: 1) hardware, division (iterative) is inherently slower than multiplication (parallel); 2) even with division having more latency in hardware than multiplication, what is the impact of the flow of these operations; 3) Constant operations are optimized by modern compilers. Does the same occur in Javascript engines? If yes, a state-of-the-art discussion; 4) improve the evaluation tests: the ones listed here appear to be inconclusive, either by use of constants or by not (des)considering the administrative burden of the tests.

  • Do the bill by hand. Division is more work than multiplication, right? For the processor too :)

Show 2 more comments

5 answers

14


As far as I know, this applies to all languages, after all the operation takes place in the processor. And if the processor does not have the division function, it needs to be emulated using other operations. The following explanation:

In multiplication, for each position, you’re only multiplying by 0 or 1, which is really easy to do: you either get all the zeroes, or you get the original input back.
As a result, each partial bit of product can be calculated with a port AND.
Once you have all the partial products, just add them together. There are some optimizations you can do to speed up the logic, but anyway you can see that this process can be done very quickly.

The division, on the other hand, is as ugly in binary as it is in decimal.
Simpler algorithms are made the same way they were in school: The old algorithm of "palpitate and correct if necessary" , meanwhile, faster algorithms tend to use search tables (that occupy chip area, therefore spend more).


Wikipedia on division algorithms: Note that even fast algorithms are still interactive - while multiplication can be done at once, division requires repeated interactions.

Source

  • 1

    This does not make much sense to me, because any division is simply a multiplication by the inverse of the quotient. Still, the sources are helping me understand the subject, so +1.

  • 1

    The division algorithm has the same order as the multiplication one. When I can I will exemplify the algorithms.

  • 2

    @Exact Renan. But even thinking this way, will be two jobs against one: 1- inverter o número; 2- multiplicar. Against just multiplying. Many compilers nowadays are smart enough to reverse constants in Compile-time, that is, var /= 2 will turn var *= 0.5 more time less time on the build. But when dealing with non-constant variables, it’s worth the good sense of performance vs usability.

  • According to to Wikipedia, there is a way to reduce a division to a multiplication and a shift: to divide by D, instead of multiplying by 1/D choosing M and N very large such that M/N == 1/D. Then replace N for N' such that N' is a power of two (then split becomes only one shift) and the difference between M/N and M/N' be it less than the margin of error for floating point transactions (i.e. results are equal). I just don’t know if this is commonly applied in practice (and if it is, if it is in general or just for constants).

  • Obs.: Iterative (iterative) = where it is necessary to iterate, that is, loop. Interactive (Interactive) = where it is possible to interact, that is, there is action between the user and the computer.

13

Tests carried out

According to the results of jsperf presented by Renan, other tests I created by differentiating integers and floating point and the tests recently created by Marcelo the statement "multiplication is faster than division in Javascript" is not at all times true.

Results obtained

For example, I just ran Marcelo’s new test on IE and Chrome. See the resulting chart when I write:

Velocidade do teste nos diferentes navegadores

The item referring to IE 10.0 it was I who just ran it. The browser seems to abstract the calculations so that the result is uniform in all tests. It gets weird.

The results in Chrome showed that integer multiplication was exceptionally faster. However, multiplication by floating point was worse than simple division.

The other browsers showed no significant differences.

With the exception of Safari version 4, in most tests, the premise of the article cited in the question was proved to be false, that is, the operation 4 * .5 was not executed more quickly than 4 / 2.

Analyzing

As for the differences between browsers, it certainly depends on how the division is implemented in each engine. We would have to analyze the type used in the original language and the conversions performed internally in that language, to then analyze CPU consumption.

In lower-level languages, the division may be more expensive, but in a linear run we would probably see no difference, as mathematical processors can complete the operation in just one cycle. Maybe there’s a difference in processors with Hyper-threading, which accumulate more than one operation per cycle when there are "lighter" operations such as integer multiplication, but would not be able to do so with more "complex" operations. Note that I have no deep knowledge about processors to state with absolute certainty, moreover it depends on the CPU structure itself and can be true only in specific cases.

Completion

The statement that "multiplication is faster than division" has proved to be false in many cases. It was true for integer multiplication in some browsers, but not for floating point as a way to replace the division.

Unless you’re working on an embedded device with extremely limited memory and processing, don’t waste time on this kind of micro-optimization, especially in a language like Javascript that runs in different Engines, operating systems and processors.

  • 3

    Try performing again with ugly numbers, like 5-digit primes. (Just to see)

  • @Gustavomaciel Refiz using the number 65537 and there was no difference. In fact, in one of the executions at the FF, the floating-point division was slightly slower, but nothing very discrepant from the others, which suggests to me that it was a circumstantial disadvantage. The division using integers got the same result as the multiplications.

  • 2

    I asked only for scientific reasons hehe, besides, division by two whole would be optimized as a shift right, which could cause differences in results.

  • 3

    I reread the tests using a wider range of values, a few more things to avoid "confounding" (for reference, I suggest that Zed Shawn article [in English] about programming and statistics), and I had results... well, nothing consistent (between different browsers). But the whole multiplication actually showed a higher performance than the others, especially in Chrome. If possible, I would like more people to test in different environments, to aggregate more data.

  • @Gustavomaciel I agree, and I believe it refers to a divisor different from two, right? The existing tests only varied the dividend... : P

  • @mgibsonbr Very good elucidation to the question. I well became suspicious of those ties...

Show 1 more comment

7

There are several efficient algorithms to perform - in binary - multiplication (eg.: Dadda Multiplier, Wallace Tree) and the division (ex.: Newton-Raphson, Goldschmidt). Often they [at least multiply] are performed directly on the CPU, which brings quite quick results.

Source: that question on Yahoo! Answers (from 4 years ago)

As for the division, I find it difficult to find concrete and up-to-date data, but I can safely say that:

  1. It is possible to implement the hardware division - and this has been done for a long time;
  2. However, she is most expensive (i.e. requiring a more complex logic).

I say this based on Pentium bug, where an attempt at optimization of floating point division (FDIV) caused an error in several operations that - although unusual for day-to-day uses - significantly affected more specialized uses (such as astronomy, which routinely deals with very large numbers). The root cause of the error was the attempt to simplify a lookup table intended to accelerate the process of splitting - which otherwise would in fact be slower.

(Note: contrary to the Wikipedia article in Portuguese, the error was not in multiplication, but in division. For more details, see the article in English)

It’s been a long time since the Pentium, and the Moore’s Law made mathematical operations on hardware faster and cheaper. I mean, I can’t tell if one is faster than the other today without doing benchmarks - in different languages and circumstances. As for Javascript, this test in jsPerf (adapted from reply from Luiz Ricardo) showed inconclusive results - varying from browser to browser. However, multiplication between numbers whole seems to be a little faster than the other operations (or much faster, in Chrome): whole division, floating point multiplication and floating point division.

P.S. This test refers to division by numbers arbitrary. When trying to divide by a constant, it is common for the compiler (or interpreter, or JIT compiler) to replace the division with a simpler operation (shift, inverse multiplication, or something more sophisticated - see that comment). Thus, I do not believe that the two sample codes presented have significant difference of performance... but to say with certainty only checking via experimentation.

5

The following is a table with the clock latency of the multiplication and division operations for comparison.

Note that if your program actually does perform such operations, this is what will occur at the hardware level and there is no escape from this.

| Intel Core i7                     |
|-----------------------------------|
| Instruction | Operand   | Latency |
|-------------|-----------|---------|
| MUL/IMUL    | r8        | 3       |
| MUL/IMUL    | r16       | 5       |
| MUL/IMUL    | r32       | 5       |
| MUL/IMUL    | r64       | 3       |
| IMUL        | r16,r16   | 3       |
| IMUL        | r32,r32   | 3       |
| IMUL        | r64,r64   | 3       |
| IMUL        | r16,r16,i | 3       |
| IMUL        | r32,r32,i | 3       |
| IMUL        | r64,r64,i | 3       |
| MUL/IMUL    | m8        | 3       |
| MUL/IMUL    | m16       | 5       |
| MUL/IMUL    | m32       | 5       |
| MUL/IMUL    | m64       | 3       |
| IMUL        | r16,m16   | 3       |
| IMUL        | r32,m32   | 3       |
| IMUL        | r64,m64   | 3       |
| IMUL        | r16,m16,i |         |
| IMUL        | r32,m32,i |         |
| IMUL        | r64,m64,i |         |
| DIV         | r8        | 11-21   |
| DIV         | r16       | 17-22   |
| DIV         | r32       | 17-28   |
| DIV         | r64       | 28-90   |
| IDIV        | r8        | 10-22   |
| IDIV        | r16       | 18-23   |
| IDIV        | r32       | 17-28   |
| IDIV        | r64       | 37-100  |
| FMUL        | r         | 5       |
| FMUL        | m         |         |
| FDIV        | r         | 7-27    |
| FDIV        | m         | 7-27    |
| FIMUL       | m         | 5       |
| FIDIV       | m         | 7-27    |

| AMD Steamroller                     |
|-------------------------------------|
| Instruction | Operand     | Latency |
|-------------|-------------|---------|
| MUL/IMUL    | r8/m8       | 4       |
| MUL/IMUL    | r16/m16     | 4       |
| MUL/IMUL    | r32/m32     | 4       |
| MUL/IMUL    | r64/m64     | 6       |
| IMUL        | r16,r16/m16 | 4       |
| IMUL        | r32,r32/m32 | 4       |
| IMUL        | r64,r64/m64 | 6       |
| IMUL        | r16,(r16),i | 5       |
| IMUL        | r32,(r32),i | 4       |
| IMUL        | r64,(r64),i | 6       |
| IMUL        | r16,m16,i   |         |
| IMUL        | r32,m32,i   |         |
| IMUL        | r64,m64,i   |         |
| DIV         | r8/m8       | 17-22   |
| DIV         | r16/m16     | 15-25   |
| DIV         | r32/m32     | 13-39   |
| DIV         | r64/m64     | 13-70   |
| IDIV        | r8/m8       | 17-22   |
| IDIV        | r16/m16     | 14-25   |
| IDIV        | r32/m32     | 13-39   |
| IDIV        | r64/m64     | 13-70   |
| FMUL        | r/m         | 5       |
| FIMUL       | m           |         |
| FDIV        | r           | 9-37    |
| FDIV        | m           |         |
| FIDIV       | m           |         |

| VIA Nano L3050                    |
|-----------------------------------|
| Instruction | Operand   | Latency |
|-------------|-----------|---------|
| MUL/IMUL    | r8        | 2       |
| MUL/IMUL    | r16       | 3       |
| MUL/IMUL    | r32       | 3       |
| MUL/IMUL    | r64       | 8       |
| IMUL        | r16,r16   | 2       |
| IMUL        | r32,r32   | 2       |
| IMUL        | r64,r64   | 5       |
| IMUL        | r16,r16,i | 2       |
| IMUL        | r32,r32,i | 2       |
| IMUL        | r64,r64,i | 5       |
| DIV         | r8        | 22-24   |
| DIV         | r16       | 24-28   |
| DIV         | r32       | 22-30   |
| DIV         | r64       | 145-162 |
| IDIV        | r8        | 21-24   |
| IDIV        | r16       | 24-28   |
| IDIV        | r32       | 18-26   |
| IDIV        | r64       | 182-200 |
| FMUL        | r/m       | 4       |
| FDIV        | r/m       | 14-23   |

Source: Instruction Tables: Lists of Instruction latencies, throughputs and micro-Operation Breakdowns for Intel, AMD and VIA Cpus

This shows that in fact your processor will perform divisions slower than multiplications. This is due to the hardware implementation of these operations suffering from the same questions of algorithmic complexity already discussed in other answers. Regardless, your program in one way or another will make use of these instructions to perform such operations efficiently, and will be limited to the performance they offer.

As for the benchmarks and dubious tips that are supposedly based on this, beware. Each language will treat the expressions as it defines, the expression 1/2 in the source in C and C++ will end up becoming a constant 0 in the executable, and no division will be performed while running the program, in a dynamic language like javascript for example, the cost of interpreting the two operations may make the cost of the discussed arithmetic instructions irrelevant to the analysis. This is just an example, and the wise programmer will know when multiplication vs division makes sense or not, there are cases that actually makes, algorithms (I come to mind Bresenham now...) and languages where this makes a difference. When the programmer is unaware of this, and practices habits in a religious way, he falls into the case of micro optimization that is the root of all evil.

3

Good people, the reason the division is slower, not in the language, but in the processor. Since a multiplication operation the processor will perform a binary sum, as an example in the decimal system where 5 x 5 = 25 being 5 + 5 + 5 + 5 + 5 = 25. Already in the division we will have 2 different operations, which is multiplication and subtraction, In example 25 : 5, first you will perform a multiplication, then perform the subtraction. Summarizing the processor will have to do more processes involving binary arithmetic.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.