Why invert the array index in a for, to popular the array, makes the process faster?

Asked

Viewed 536 times

11

I was doing some tests in Java and I noticed a very strange behavior. I made an array with two dimensions and populated the array in two different ways, filling with random numbers up to 2 31-1 and just changing its index. I’ve gained considerable performance by doing this, but I don’t understand why.

Here’s the code:

public class Array {

    static int tamanhoX = 12000;
    static int tamanhoY = 12000;
    static int array[][] = new int[tamanhoX][tamanhoY];
    static long tempoInicio = 0, tempoFim = 0;
    static double mediaTempo = 0.0;

    public static void main(String[] args) {
        //Armazena o tempo das operações
        long tempos[] = new long[10];

        //Calcula o tempo da operação diversas vezes:
        for (int i = 0; i < tempos.length; i++) {
            //Armazena o valor retornado dentro do array: tempos[] na posição: i
            tempos[i] = calculaTempo();
        }

        //Soma todos os valores armazenados no array: tempos[]
        for (int i = 0; i < tempos.length; i++) {
            mediaTempo += tempos[i];
        }

        System.out.println("Tempo total: " + (mediaTempo / 1000) + "s");
        mediaTempo = mediaTempo / tempos.length;//Calcula média
        System.out.println("Média de tempo é de: " + mediaTempo + "ms");
    }

    static long calculaTempo() {
        //Armazena o momento de início da operação em milisegundos
        tempoInicio = System.currentTimeMillis();
        for (int i = 0; i < tamanhoX; i++) {
            for (int j = 0; j < tamanhoY; j++) {
                //Preenche com um valor aleatório:
                //Mais lento
                //array[j][i] = (int) (Math.random() * Integer.MAX_VALUE);
                //Mais rápido
                array[i][j] = (int) (Math.random() * Integer.MAX_VALUE);
            }
        }
        //Armazena o tempo final da operação
        tempoFim = System.currentTimeMillis();
        System.out.println("Tempo: " + (tempoFim - tempoInicio) + "ms");
        return (tempoFim - tempoInicio);
    }
}

Note that I only changed a single line, the assignment of the array. In my tests I had an average of 90 seconds to:

array[j][i] = (int) (Math.random() * Integer.MAX_VALUE);

and 48 seconds to:

array[i][j] = (int) (Math.random() * Integer.MAX_VALUE);

I would very much like to understand the reason for this. Because that time difference in just reversing the array index?

Heed! Do not run this code on computers with little memory. There is a risk of crashing your operating system.

  • My question got a little weird, feel free to edit.

2 answers

13


In modern processors the bottleneck is not in the CPU, but in the cache. This means that the fastest program is not [necessarily] the one that performs the least instructions, but rather the one that accesses memory the most efficiently (i.e. the least Misses cache).

In Java, a multidimensional array is represented as an "array of arrays". This means that the first array is an array of references, and the rest are integer arrays. Both try to occupy contiguous memory positions:

int[][] array = new int[4][4];
// É equivalente a:
int[] a = { 0, 0, 0, 0 };
int[] b = { 0, 0, 0, 0 };
int[] c = { 0, 0, 0, 0 };
int[] d = { 0, 0, 0, 0 };
int[][] array = { a, b, c, d };

Let’s assume, for simplicity, that your L1 cache has size 4, and that there are only two "pages" available. In the beginning, array is in the cache. When your loop:

array[0][0] = ...;

He tries to carry array (great, it’s already in the cache! ) and then a (is not in the cache, search the next level and/or RAM memory). Then, it normally assigns.

If after that he does:

array[0][1] = ...;

He will find array in cache and a also in cache! The operation is quite fast. The same goes for array[0][2] and array[0][3]. Only when he tries to assign array[1][0] is that he won’t find b in cache - and will have to fetch.

But if instead he does:

array[1][0] = ...;

So b will not be in the cache, and it will have to fetch immediately. Then comes the array[2][0], and see: c also not in the cache, there it will search again. The moment it returns to the array[1][1] it will have loaded and downloaded the entire memory region of your program into the cache, and now it will have to do it all over again...

In practice, the difference is just not more glaring because the cache is not so small, and there are several levels (L1, L2...) - so that even if your data structure is not at L1 maybe it is at L2 (i.e. you don’t have to search from the beginning). But the bigger the data, the bigger the difference becomes - especially if the physical memory runs out and it starts paging (i.e. using virtual memory/swap).

  • 2

    Excellent explanation! Thank you very much... I researched a little more on the subject, I was fascinated with this technique. It’s amazing how hardware-level knowledge makes you develop good software.

3

As stated above, the processor has a small amount of cache memory, the TLB, that stores recently accessed memory positions, and this memory contained in the processor, has faster read/write speed than the RAM since the RAM depends on buses and etc. and the TLB is directly in the processor along with the MMU.

If we walk the loop for column by column (vertically) where the cache TLB stores linearly (horizontally) the cache is "wasted" because it has a relatively small size and its data is overwritten always with more recent memory addresses, but if we go through the loop linearly, the cache TLB shall be reused by reusing the last memory addresses consulted. This use is what generates the performance gain, since the processor does not need to consult in memory RAM the address table, saving several machine cycles.

I think this cache exists on all current processors, it generates a very large performance gain, is an indispensable item nowadays.

There’s a very interesting article on a related subject here, is worth a read.

Sources

  1. http://en.wikipedia.org/wiki/Translation_lookaside_buffer
  2. http://en.wikipedia.org/wiki/Memory_management_unit
  3. http://queue.acm.org/detail.cfm?id=1814327
  4. http://www.hardware.com.br/termos/tlb

Browser other questions tagged

You are not signed in. Login or sign up in order to post.