Why is array_shift considered a slow function?

Asked

Viewed 436 times

6

According to the Manual, array_shift removes the first element of the array.

I have seen many criticisms on the Internet about the performance of this function, because it reorder the whole index of the array at each removal, thus relocating the whole array.

I found this graph that represents the difference in performance between array_shift and array_pop(removes the last element of the array), which shows an absurd difference.

inserir a descrição da imagem aqui

I have seen amendments, such as the example below, to avoid the "performance problem" (because of reindexation of inflows) as follows:

function array_first(&$array)
{
   reset($array);

   $key = key($array);

   $value = $array[$key];

   unset($array[$key]);


    return $value;
}
  • Creating an alterative "manual" is really the best way to solve this problem?

  • What makes this function slow is simply the reindexation of the elements, or there are also other factors?

  • 1

    The function you put me in seems to be the behavior of array_pop, not of array_shift...

  • Corrected by @Oeslei

  • 1

    I consider that if you have an array with 1000 positions, the problem is in the implementation and below that the array_pop is performing similarly according to the chart. Still, if you don’t need the reordering of indexes, the function you set would be a better alternative to array_shift.

  • You already have reset, friend.

3 answers

2


The performance of this function is slow at certain times due to reindexation that is done to reorder the items of array, has to be taking into account that its complexity is O(n).

Assuming we have one array with 5 values and we wanted to remove the first element of array with the function array_shift, the process would be something like this:

array     >>  a b c d e f
copia     >> [a]
deleta    >> [a]
array     >>  " b c d e f
reindexa  >>  " b c d e f
                ↙ ↙ ↙ ↙
array     >>  b c d e f   

The internal functioning of array_shift is basically thus:

  • Get the first value, where arData is a array used to define the values of array incoming:

    >> idx = 0
    >> while(1)
    >>   p = Z_ARRVAL_P(stack)->arData + idx;
    >>   val = &p->val;
    
  • Copy this value to return as return:

    >> ZVAL_COPY(return_value, val);
    
  • After copying the value, it is deleted:

    >> if (Z_ARRVAL_P(stack) == &EG(symbol_table).ht) // <-- se for Hashtable
         zend_delete_global_variable(p->key)
    >> else
         zend_hash_del(Z_ARRVAL_P(stack), p->key)
    
  • Now it’s done to reindexration of the items:

    >> if (Z_ARRVAL_P(stack)->u.flags & HASH_FLAG_PACKED) { // <-- Se for array compactada
    >>   uint32_t k = 0;
    >>   for (idx = 0; idx < Z_ARRVAL_P(stack)->nNumUsed; idx++) 
    >>      p = Z_ARRVAL_P(stack)->arData + idx;
    >>      if (Z_TYPE(p->val) == IS_UNDEF) continue; // <-- Se não existir valor, continua
            if (idx != k) // <-- Se for diferente de 0, 0: item já não existente
                Bucket *q = Z_ARRVAL_P(stack)->arData + k;
                q->h = k;
                q->key = NULL;
                ZVAL_COPY_VALUE(&q->val, &p->val);
                ZVAL_UNDEF(&p->val);
            }
            k++;
        }
    
  • This reindexation process may not enter the above code block if the array did not have the flag HASH_FLAG_PACKED (indicates that it is a array compressed) and can enter the code block of the else on the line 2246.

Whether or not to use this function may depend on the size of the array to be analyzed, there are alternatives how to use the function array_pop together with array_reverse or reset with array_pop or unset($array[$indice]), this last maybe don’t be as slow as array_shift.

1

The function becomes slow precisely reindexation of the elements, in its graphic it is very clear that the greater the number of elements in the array the longer the execution time.

And in my view there is no problem in getting around this with some other alternative, like implementing your own function.


If you want to analyze follow the implementation code of the functions in PHP https://github.com/php/php-src/blob/fc33f52d8c25997dd0711de3e07d0dc260a18c11/ext/standard/array.c

On the line 2167 has the implementation of array_shift and on the line 2106 that of the array_pop, both are similar only noting the fact of the reindexation of the array elements in the array_shift function, as described in the line 2218

1

Creating an alterative "manual" is really the best way to solve this problem?

The function array_shift has a running time of 2n, that is, the larger the array, longer it will take. This is only a problem if you work with arrays very large.

According to the Benchmark, up to a size of 1000 the performance is virtually the same as the array_pop.

What makes this function slow is simply the reindexation of elements, or there are also other factors?

The function itself is not slow. The factor that causes slowness is the load of data it has to work with. Redoing the function with a generic implementation may have the opposite effect with arrays small, because the core functions are faster than user functions.

It will be very difficult to find a specific case that always the function will receive a giant load of data. If that is the case and the keys of the array are not relevant, use an alternative implementation only in that code snippet.

An alternative to the array_shift can be achieved with the combination of array_pop with array_reverse:

<?php

$array = array_reverse(getArrayGigante());

array_pop($array);

Browser other questions tagged

You are not signed in. Login or sign up in order to post.