What is loop unrolling?

In this question I asked about the optimizations the compiler performs and their effect on performance.

Among the points raised, users mentioned that the compiler performs an optimization called loop unwinding, or loop unrolling.

  • What exactly is loop unrolling?
  • How does it work in practice?
  • Do I need to do anything specific in my code so that the compiler can apply this optimization?

1 answer

It is an optimization that tries to make code faster by eliminating a loop or reducing the number of iterations it performs.

The optimizer usually tries to keep the generated code about the same size, but this is not always possible, so in many cases the code gets a little bigger.

The ideal is to eliminate the loop completely, turning

for (int i = 0; i < 4; i++) {
    soma += dados[i];
}

into

soma = dados[0] + dados[1] + dados[2] + dados[3];

But when the size is not known at compile time, the best that can be done is for

for (int i = 0; i < n; i++) {
    soma += dados[i];
}

to be transformed into:

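int soma0 = 0, soma1 = 0, soma2 = 0, soma3 = 0; // assuming int elements
int i = 0;
for (; i + 3 < n; i += 4) { // process four elements per iteration
    soma0 += dados[i + 0];
    soma1 += dados[i + 1];
    soma2 += dados[i + 2];
    soma3 += dados[i + 3];
}
soma += soma0 + soma1 + soma2 + soma3;
for (; i < n; i++) { // leftover elements when n is not a multiple of 4
    soma += dados[i];
}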
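As a self-contained illustration of the transformation above, here is a minimal C sketch that puts the plain loop and a hand-unrolled version side by side. The function names `sum_simple` and `sum_unrolled4`, the `int` element type, and the `long` accumulators are assumptions made for this example; a real compiler is free to pick a different unrolling factor or strategy.

```c
#include <stddef.h>

/* Plain version: one addition and one loop test per element. */
long sum_simple(const int *dados, size_t n) {
    long soma = 0;
    for (size_t i = 0; i < n; i++) {
        soma += dados[i];
    }
    return soma;
}

/* Hand-unrolled by 4 with independent accumulators (illustration only). */
long sum_unrolled4(const int *dados, size_t n) {
    long soma0 = 0, soma1 = 0, soma2 = 0, soma3 = 0;
    size_t i = 0;

    /* Main loop: runs only while a full group of four elements remains. */
    for (; i + 4 <= n; i += 4) {
        soma0 += dados[i + 0];
        soma1 += dados[i + 1];
        soma2 += dados[i + 2];
        soma3 += dados[i + 3];
    }

    long soma = soma0 + soma1 + soma2 + soma3;

    /* Cleanup loop: handles the 0 to 3 elements left over. */
    for (; i < n; i++) {
        soma += dados[i];
    }
    return soma;
}
```

Both functions should return the same value for any `n`, including sizes that are not a multiple of 4. The cleanup loop is what keeps the unrolled version from reading past the end of the array.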

I put it on GitHub for future reference.

So some gain is possible, not only because it reduces loop-control operations, but also because it can reduce processor cache misses and the number of branches (conditional jumps), which are expensive.
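To make the saving in loop control concrete, the following C sketch does no real work and only counts how many times the loop condition is evaluated, once for a plain loop and once for a loop unrolled by 4 with a cleanup loop. The names `checks_simple` and `checks_unrolled4` are made up for this illustration, and the count is only a rough proxy for branch cost on real hardware.

```c
#include <stddef.h>

/* Number of loop-condition evaluations in a plain loop over n elements. */
size_t checks_simple(size_t n) {
    size_t checks = 0;
    for (size_t i = 0; (++checks, i < n); i++) {
        /* the loop body would go here */
    }
    return checks;
}

/* Same count for a loop unrolled by 4, followed by a cleanup loop. */
size_t checks_unrolled4(size_t n) {
    size_t checks = 0;
    size_t i = 0;
    for (; (++checks, i + 4 <= n); i += 4) {
        /* four copies of the loop body would go here */
    }
    for (; (++checks, i < n); i++) {
        /* one copy of the loop body for each leftover element */
    }
    return checks;
}
```

For `n = 1000` this gives 1001 checks for the plain loop and 252 for the unrolled one. For `n = 0` it gives 1 and 2, a small case where the unrolled form does more control work than the original.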

But note that more modern processors have so many optimizations of their own that the gain may not materialize. In some cases things can even get worse, because while some instructions are eliminated, others become necessary, at least in the second example.

In the first case there may still be a gain because the loop is eliminated altogether. Furthermore, this may enable other optimizations, such as inlining a function, although some compilers can inline even without the unrolling. But inlining a function can make unrolling impractical, since the code may become too big to be repeated "manually". The compiler has to work out which is more worthwhile in each case.

If the code size grows too much, it can cause instruction cache misses, so the compiler has to be quite smart about the platform it is generating code for. It may also end up using more registers, forcing some juggling that was previously unnecessary.

In some rare cases this can make it easier to parallelize the operations, since there are now four independent operations instead of one. I find this more theoretical than practical.

A JIT compiler (jitter) may have an advantage here because it has information that an ordinary compiler does not: it may know the value of n, which helps it decide whether or not to unroll.

Four or five really are the most commonly adopted unrolling factors, but this is an implementation detail.

Do not try to do this optimization manually: the code will lose readability and may end up performing worse.
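If you want to nudge the compiler rather than rewrite the loop by hand, some toolchains accept hints. The sketch below assumes GCC 8 or later, which documents `#pragma GCC unroll n`; the guard makes other compilers skip the pragma and compile the plain loop. The function name `sum_hinted` is hypothetical, and the compiler remains free to ignore the hint.

```c
#include <stddef.h>

/* The source stays a plain, readable loop; unrolling is left to the compiler. */
long sum_hinted(const int *dados, size_t n) {
    long soma = 0;
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
#pragma GCC unroll 4 /* hint only: asks GCC to unroll this loop 4 times */
#endif
    for (size_t i = 0; i < n; i++) {
        soma += dados[i];
    }
    return soma;
}
```

GCC also has the `-funroll-loops` command-line option, and Clang offers its own loop pragmas such as `#pragma clang loop unroll_count(4)`. Whether any of them helps depends on the target, so it is worth measuring before keeping them.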

Wikipedia has more complete information. Of course, more specific questions can be asked if your interest goes beyond basic curiosity.
