What is the most performative way to convert int[] to int?

Question

What is the most performative way to convert int[] to int?

Asked 6 years, 11 months ago

Viewed 240 times

3

What is the most performatic way to convert the inverted positions of a array for int? For example, the following array should work in full 153 for the purpose of future calculation:

int[] a = new int[] {3, 5, 1};

I thought of two ways, but I couldn’t identify a good way to test which one performs better and I don’t know if any of them are really the best way. I’ll list them below.

Form 1:

private int arrayToInt(int[] numeros) {
  int resultado = 0;
  int multiplicador = 1;

  for (int numero : numeros) {
    resultado = resultado + (numero * multiplicador);
    multiplicador = multiplicador * 10;
  }

  return resultado;
}

Form 2:

private int arrayToInt(int[] numeros) {
  StringBuilder construtor = new StringBuilder();

  for (int numero : numeros) {
    construtor.append(numero);
  }

  return Integer.parseInt(construtor.reverse().toString());
}

1

In the same reverse order? First units and then tens?

– Jefferson Quesado

2017/11/27 at 05:22
1

@Jeffersonquesado exactly that!

– Sorack

2017/11/27 at 05:23
1

Because of int have a maximum of 9 digits, I can assume the maximum size of the array is 9?

– Jefferson Quesado

2017/11/27 at 05:26
@Jeffersonquesado yes, no need to cover all cases. In reality what really matters is the performance. I was able to think of some ways but I didn’t really think about how to test, so I decided to ask, in case someone knew which one was better and could explain why

– Sorack

2017/11/27 at 05:28
1

I believe that measuring the best is to know exactly how many calls from which bytecodes will be made, as well as some locality details and the call weight per bytecode. I’m coming up with an ugly way (with right to switch) but I believe it’s the fastest.

– Jefferson Quesado

2017/11/27 at 05:31
1

@Jeffersonquesado has two ways that I thought here too, I will edit the question and put the two there, only it remains the doubt of which is the best precisely because I could not figure out a way to test

– Sorack

2017/11/27 at 05:36
1

Working with strings will be slower than the first way, no doubt... Creation of a.length Distinct objects will be more expensive than any multiplication you make. I am including in my answer also a way of comparing performance based in this one

– Jefferson Quesado

2017/11/27 at 06:13
1

I’ll finish the answer another day, she’s not ready to be posted and it’s already later than I realized

– Jefferson Quesado

2017/11/27 at 06:40

Show 3 more comments

1 answer

Browser other questions tagged java performance

You are not signed in. Login or sign up in order to post.

by Jefferson Quesado • **22,370** points · Answer 1 · 2017-11-27T10:53:41+00:00

As I do commented, I believe that to get a general sense of how their sum is going to be performative or not, you would need to analyze the bytecodes that will be called and the general weight that they have relative to each other. It would also be appropriate to make the best possible use of the location of the variables, optimizing the use of cache processor.

I will divide this answer into two sections (in addition to this introductory):

an optimized function (see next paragraph on what is being taken into account)
how to test and counter-point various solutions

I will consider here that JNI calls are out of the question. I will also not consider the effect that the jitter may have. Just a theoretical analysis. I will also skip the compiler’s own optimizations.

Optimized sum function

The result obtained is a int, then we know that at most the number will be worth 2^31 - 1, which gives 10 decimal places. Therefore, the vector will have a maximum size of 10.

Since the size of the vector is limited (ranging from 1 to 10; not considering the trivial case of size 0), we can try to solve it in two ways:

iterative method (where I go through each element of the vector individially)
sum for each size

The advantage of the iterative method is that less code is written. I imagine writing the iterative method in two different ways: ascending and descending. Let’s take a look at them?

int soma_asc(int []a) {
  int base = 1;
  int acc = 0;

  for (int i = 0; i < a.length; i++) {
    acc += a[i] * base;
    base *= 10;
  }

  return acc;
}

I’m going to owe the bytecodes themselves, but on top the loop would be:

start the variables acc, base and i with his constants
iconst_0, istore_2, iconst_1, istore_1, iconst_0, istore_3
load the variable i in the pile
iload_3
rescue of vector size, leaving it on the stack
aload_0, arraylength
compares these figures, dropping out if false comparison
if_icmpge:13
access the vector value at the position i
aload_0, iload_3, iaload
carry base stack and multiply by the value previously obtained
iload_1, imul
load the old value acc, sum with the previously obtained value, store back in acc
iload_2, iadd, istore_2
carry base, load constant 10, multiply and store back in base
iload_1, ldc:#10, imult, istore_1
increase i
iinc:3:1
load the variable i in the pile
iload_3
rescue of vector size, leaving it on the stack
aload_0, arraylength
compares these values, going back to step 4 if the comparison is true
if_icmplt:5
carries the value of acc to be returned, returning shortly afterwards
iload_2, ireturn

Take as an example a vector of size 3. Reguintes bytecodes will be executed (lines started by ; are comments):

iconst_0, istore_2, iconst_1, istore_1, iconst_0, istore_3
iload_3
aload_0, arraylength
if_icmpge:13
aload_0, iload_3, iaload
iload_1, imul
iload_2, iadd, istore_2
iload_1, ldc:#10, imult, istore_1
iinc:3:1
iload_3
aload_0, arraylength
if_icmplt:5
; terminou de executar a tarefa referente ao primeiro dígito

aload_0, iload_3, iaload
iload_1, imul
iload_2, iadd, istore_2
iload_1, ldc:#10, imult, istore_1
iinc:3:1
iload_3
aload_0, arraylength
if_icmplt:5
; terminou de executar a tarefa referente ao segundo dígito

aload_0, iload_3, iaload
iload_1, imul
iload_2, iadd, istore_2
iload_1, ldc:#10, imult, istore_1
iinc:3:1
iload_3
aload_0, arraylength
if_icmplt:5
; terminou de executar a tarefa referente ao terceiro e último dígito

iload_2, ireturn

A total of 57 bytecodes were executed here, with two flow deviations. 37 accesses were made to read information from memory (3 accesses to heap), 9 write accesses.

If it were 4 digits, we would have +17 byte executions, +1 flow deviation, +11 memory read accesses (+1 access to heap), +2 write accesses.

Note that this function soma_asc can be rewritten with sugar-syntactic notation foreach, with the difference that a[i] would be stored in a variable v:

int soma_foreach(int []a) {
  int base = 1;
  int acc = 0;

  for (int v: a) {
    acc += v * base;
    base *= 10;
  }

  return acc;
}

This would imply the use of some bytecodes in a different way, if I’m not mistaken affecting only step 4:

load the value of a[i] in the variable v, and then load the value of v in the pile
aload_0, iload_3, iaload, istore#4, iload#4

This implies, for each set of instructions, more access to writing and reading. Therefore, without considering possible compiler optimizations, it would be two worse bytecodes for each number in the input vector.

The function soma_desc has another approach, going from the end point until the index reaches zero:

int soma_desc(int []a) {
  int acc = 0;

  for (int i = a.length - 1; i >= 0; i--) {
    acc = acc*10 + a[i];
  }

  return acc;
}

Above, in an analysis similar to what was done in the previous function:

incisize acc with constant 0
iconst_0, istore_1
load the size of a, subtract 1 and store in i
aload_0, arraylength, iconst_1, isub, istore_2
check whether to make the leap
iload_2, iconst_0, if_icmplt:6
rescue acc, multiply by 10, add a[i], keep in acc
iload_1, ldc:#10, imult, aload_0, iload_2, aiload, iadd, istore_1
increase i and check if it continues in the loop
iinc:1:-1, iconst_0, if_icmpge:4
load the value of acc and return
iload_1, ireturn

Again, if it is made for the vector with 3 elements:

iconst_0, istore_1
aload_0, arraylength, iconst_1, isub, istore_2
iload_2, iconst_0, if_icmplt:6
iload_1, ldc:#10, imult, aload_0, iload_2, aiload, iadd, istore_1
iinc:1:-1, iconst_0, if_icmpge:4
; terminou o primeiro laço

iload_1, ldc:#10, imult, aload_0, iload_2, aiload, iadd, istore_1
iinc:1:-1, iconst_0, if_icmpge:4
; terminou o segundo laço

iload_1, ldc:#10, imult, aload_0, iload_2, aiload, iadd, istore_1
iinc:1:-1, iconst_0, if_icmpge:4
; terminou o terceiro e último laço
iload_1, ireturn

+11 bytecodes per loop, 2 termination bytecodes and 10 start bytecodes, for a total of 45 executed bytecodes. 5 writing accesses in all (1 per loop), 4 reading accesses per loop (1 being from heap) plus 1 read on termination plus 1 read access from heap and 1 variable reading access at the beginning, totaling then 7 reading accesses (3 of heap). Also had 2 flow deviations.

Yeah, that algorithm seemed to be faster...

Now, as I know the maximum size of the vector used, I can make the sum suitable for size without involving any iteration:

int soma_desenrolada_1(int []a) {
  switch (a.length) {
  case 0: return 0;
  case 1: return a[0];
  case 2: return a[0] + a[1]*10;
  case 3: return a[0] + a[1]*10 + a[2]*100;
  case 4: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000;
  case 5: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000;
  case 6: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000 + a[5]*100_000;
  case 7: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000 + a[5]*100_000 + a[6]*1_000_000;
  case 8: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000 + a[5]*100_000 + a[6]*1_000_000 + a[7]*10_000_000;
  case 9: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000 + a[5]*100_000 + a[6]*1_000_000 + a[8]*100_000_000;
  case 10: return a[0] + a[1]*10 + a[2]*100 + a[3]*1_000 + a[4]*10_000 + a[5]*100_000 + a[6]*1_000_000 + a[8]*100_000_000 + a[9]*1_000_000_000;
  }
  return -1;
}

I don’t know how Java will compile the switch for bytecodes, but has how to predict the behavior after that. The sum it will accomplish. Let’s take for the case with vector size being 3 to compare:

; coloca a[0] no topo
aload_0, iconst_0, aiload
; coloca a[1] no topo, multiplica por 10 e soma
aload_0, iconst_1, aiload, ldc#10, imul, iadd
; coloca a[2] no topo, multiplica por 100 e soma
aload_0, iconst_2, aiload, ldc#100, imul, iadd
; resposta obtida, só retornar
ireturn

Simple as that. Running 16 bytecodes (compare with the others that for each increment in the size of the vector increased more than 10 bytecodes), 3 read accesses to the heap, no variable writing. Interesting, it is not?

We can try to decrease the amount of constants by multiplying the 10 in evidence. Thus, there will only be multiplications by 10, for no other value. This will require putting some more parentheses:

int soma_desenrolada_2(int []a) {
  switch (a.length) {
  case 0: return 0;
  case 1: return a[0];
  case 2: return a[0] + a[1]*10;
  case 3: return a[0] + 10*(a[1] + 10*(a[2]));
  case 4: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3])));
  case 5: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4]))));
  case 6: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4] + 10*(a[5])))));
  case 7: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4] + 10*(a[5] + 10*(a[6]))))));
  case 8: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4] + 10*(a[5] + 10*(a[6] + 10*(a[7])))))));
  case 9: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4] + 10*(a[5] + 10*(a[6] + 10*(a[7] + 10*(a[8]))))))));
  case 10: return a[0] + 10*(a[1] + 10*(a[2] + 10*(a[3] + 10*(a[4] + 10*(a[5] + 10*(a[6] + 10*(a[7] + 10*(a[8] + 10*(a[9])))))))));
  }
  return -1;
}

How would it be for input with vector size 3?

; vou tentar obedecer a ordem dos parênteses, portanto, a primeira coisa é carregar a[0] na pilha
aload_0, iconst_0, aiload
; estado atual da pilha: a[0]
; a separação dos elementos da pilha vai ser por #
; agora, põe dez para multiplicar pelo resultado da soma...
ldc#10
; estado atual da pilha: a[0] # 10
; coloca a[1] na pilha
aload_0, iconst_1, aiload
; estado atual da pilha: a[0] # 10 # a[1]
; agora, põe dez para multiplicar pelo resultado da soma...
ldc#10
; estado atual da pilha: a[0] # 10 # a[1] # 10
; põe a[2] na pilha. depois desse passo, basta por os operadores
aload_0, iconst_2, aiload
; estado atual da pilha: a[0] # 10 # a[1] # 10 # a[2]
; primeira operação: multiplicação
imul
; estado atual da pilha: a[0] # 10 # a[1] # 10*a[2]
; soma...
iadd
; estado atual da pilha: a[0] # 10 # a[1] + 10*a[2]
; multiplica...
imul
; estado atual da pilha: a[0] # 10*a[1] + 100*a[2]
; soma
iadd
; estado atual da pilha: a[0] + 10*a[1] + 100*a[2]
; pronto, só falta retornar
ireturn

For a total of 16 bytecodes, as in the previous example. The difference here is that we have decreased the amount of constants in the code, so we may have some advantage of using the temporal locality of the memory fields.

Comparing performance of code

A few months ago I answered a question in Java about performance of sorting methods. For this, I elaborated a whole test scheme. I explained more of the methodology there. After running the tests, you get average and standard deviation of the run times.

Basically the problem here is to adapt the random number generation function in the function calculaTempo. I will post the original for reference, so that we can adapt to the world of up to 10 distinct integers [0,10), closed at the beginning and opened at the end:

public static LongSupplier calculaTempo(Consumer<Array> metodoOrdenacao) {
    return () -> {
        Array a = Array.getRandomArray(SIZE_ARRAY);

        long init = System.currentTimeMillis();
        metodoOrdenacao.accept(a);
        return System.currentTimeMillis() - init;
    };
}

To change to calculaSoma, I will mount the vector in hand myself. To do this, I will use a random number generator r which I will provide along with my arguments (yes, as IntSupplier). The function Random.nextInt(n) provides integer values in the range [0, n). If I use n=10, voi là, have the interval exactly as it had been speculated. I will call this variable fornece_0_10 to indicate that it is a IntSupplier which will return results at the closed interval at the beginning and open at the end [0,10). How not to calculate the sum of the vector in the trivial case where its size is 0, but rather in the size ranging from [1,11), the size of the vector will be 1 + fornece_0_10.getAsInt():

public static LongSupplier calculaTempo(IntSupplier fornece_0_10, Consumer<int[]> calculaSoma) {
    return () -> {
        int sizeOfA = 1 + fornece_0_10.getAsInt();
        int []a = new int[sizeOfA];

        for (int i = 0; i < sizeOfA; i++) {
            a[i] = fornece_0_10.getAsInt();
        }

        long init = System.currentTimeMillis();
        calculaSoma.accept(a);
        return System.currentTimeMillis() - init;
    };
}

To call this function, just start a random number generator r and roast the lambda function () -> r.nextInt(10) (considering all these functions as class statics Soma):

Random r = new Random();
IntSupplier fornece_0_10 = () -> r.nextInt(10);

Estatisticas estatisticaSomaAsc = repeatGetEstatisticas(N_REPETITION, calculaTempo(fornece_0_10, Soma::soma_asc));
Estatisticas.printEstatisticas(estatisticaSomaAsc, "SOMA ASCENDENTE");

Estatisticas estatisticaSomaForeach = repeatGetEstatisticas(N_REPETITION, calculaTempo(fornece_0_10, Soma::soma_foreach));
Estatisticas.printEstatisticas(estatisticaSomaForeach , "SOMA FOREACH");

Estatisticas estatisticaSomaDesc = repeatGetEstatisticas(N_REPETITION, calculaTempo(fornece_0_10, Soma::soma_desc));
Estatisticas.printEstatisticas(estatisticaSomaDesc , "SOMA DESC");

Estatisticas estatisticaSomaDesenrolada1 = repeatGetEstatisticas(N_REPETITION, calculaTempo(fornece_0_10, Soma::soma_desenrolada_1));
Estatisticas.printEstatisticas(estatisticaSomaDesenrolada1 , "SOMA DESENROLADA V1");

Estatisticas estatisticaSomaDesenrolada2 = repeatGetEstatisticas(N_REPETITION, calculaTempo(fornece_0_10, Soma::soma_desenrolada_2));
Estatisticas.printEstatisticas(estatisticaSomaDesenrolada2 , "SOMA DESENROLADA V2");

Now all that remains is to collect the results. The greater the number of repetitions defined in N_REPETITION, the most reliable results will be. How Random provides random numbers in uniform distribution, the values obtained are balanced when the number of repetitions tends to infinity.

Missing put the test result to see which the most efficient