When is an Slice data copied?

Asked

Viewed 112 times

6

When I do so:

a = [ 1, 2, 3]
b = a[1:]

b use the same list of a or it creates another list and copies the data that is pertinent? If you have many items it will slow down?

  1. Happens even if I don’t keep in variable?
  2. If you copy, there’s some way you can do it without copying?
  3. You can demonstrate this with some Python function or library?
  4. This applies to other forms of slice or just for the list?
  • 1

    When you use Slice, you are making a copy of that list, the same behavior works using the copy python. To avoid this behavior you can set b = a to use by reference.

  • 1
  • @Mauroalexandre but doing b = a not only takes a part right?

  • @Luizfelipe pity that does not work with list

  • @Andressasalles http://pythonsandbox.com/ access and take the test.

2 answers

5


Analyzing

We can make some initial considerations to better understand what happens with the presented code.

When you do sequence[start:stop:step], the part between square brackets will act as a syntactic sugar (Extended Indexing syntax) for the instantiation of an object of the type slice. That is, when interpreting the code, Python (interpreter) will generate the object slice(start, stop, step) and pass it to the sequence.

>>> sequence = [1, 2, 3, 4, 5]
>>> sequence[1:3]
[2, 3]
>>> sequence[slice(1, 3)]
[2, 3]
>>> sequence[1:3] == sequence[slice(1, 3)]
True

Another important consideration is that when we do sequence[key], for under the table, what will be executed is sequence.__getitem__(key), because the method called Dunder get item is responsible for overriding the access operator to an object key. The object being a list, we can also say that the code will be equivalent to list.__getitem__(sequence, key), where the function is called __getitem__ directly from the class list.

Why does it matter?

Knowing now that when it’s done sequence[start:stop], what is executed is list.__getitem__(sequence, slice(start, stop)), we can then analyze how is the implementation of the function __getitem__ within the class list. It’s important to point out here that the guy list, in Python, it is implemented in C, so the code to be analyzed will be in this language, although it is easy to understand it even without knowing much such language.

The implementation of the object list can be found on original Python repository (Cpython, in fact, which is the official interpreter of the language, implemented in C), in particular the function PyList_GetSlice to be executed:

PyObject *
PyList_GetSlice(PyObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
    if (!PyList_Check(a)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (ilow < 0) {
        ilow = 0;
    }
    else if (ilow > Py_SIZE(a)) {
        ilow = Py_SIZE(a);
    }
    if (ihigh < ilow) {
        ihigh = ilow;
    }
    else if (ihigh > Py_SIZE(a)) {
        ihigh = Py_SIZE(a);
    }
    return list_slice((PyListObject *)a, ilow, ihigh);
}

Code excerpt taken from official repository, lines 477 to 497

In short, the function takes as argument the PyObject *a, that will be our sequence, and also the arguments Py_ssize_t ilow and Py_ssize_t ihigh which are respectively the start and the stop. In this function is made a validation of the values received and the call of list_slice((PyListObject *)a, ilow, ihigh).

static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
    PyListObject *np;
    PyObject **src, **dest;
    Py_ssize_t i, len;
    len = ihigh - ilow;
    np = (PyListObject *) list_new_prealloc(len);
    if (np == NULL)
        return NULL;

    src = a->ob_item + ilow;
    dest = np->ob_item;
    for (i = 0; i < len; i++) {
        PyObject *v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    Py_SET_SIZE(np, len);
    return (PyObject *)np;
}

Code excerpt taken from official repository, lines 455 to 475

To highlight from this code the following:

  1. Knowing the values of start and stop it is possible to determine the size of the list to be returned: len = ihigh - ilow;
  2. A new list is defined, PyListObject *np, with the memory allocated, np = (PyListObject *) list_new_prealloc(len);
  3. A loop of repetition is made, for (i = 0; i < len; i++), in which the value in the original sequence is sought, PyObject *v = src[i], and copied in the new sequence, dest[i] = v;
  4. The instruction Py_INCREF(v) is responsible for increasing the number of references pointed to that object - objects with zero references are discarded from memory by Garbage Collector;
  5. The function will return the new created list, return (PyObject *)np;

Questions

Use the same list or copy the relevant data?

It will copy a reference to that value, as we can see highlighted in the code in C:

for (i = 0; i < len; i++) {
    PyObject *v = src[i];
    Py_INCREF(v);
    dest[i] = v;
}

The value to be manipulated will be a pointer, PyObject *v, then a reference to the value itself, but the value itself will remain the same. We can perceive this even when the type is mutable. If I have a list of lists, any changes we make to a sub-list will be reflected in any list created from a slice.

>>> a = [[1], [2]]
>>> b = a[1:]
>>> b
[[2]]
>>> a[1].append(2)  # Manipulando a lista original após o slice
>>> b
[[2, 2]]

If you have many items you will be slow?

It depends on how much the "many" is, but it’s hard to conclude that. It will depend even more on the configuration of your computer, operating system that is managing the memory, etc. It certainly has an impact, because Python will need to allocate memory to store this new object, but if it starts to make your application unfeasible, it is possible that you have to cogitate another language.

Happens even if I don’t keep in variable?

Yes. The execution of the entire commented process will take place regardless of whether the result has been assigned or not. The difference that when you do not assign, Python will realize that the returned object will have zero references pointed to it and the Garbage Collector will do the job of "clean up the mess", releasing the memories that were allocated and updating the reference counters of each list value.

If you copy, there’s some way you can do it without copying?

If you don’t need all values at the same time, you can optimize your solution using generators.

You can demonstrate this with some Python function or library?

Yes, see analysis of the C implementation studied in this answer.

This applies to other forms of slice or just for the list?

For the guys who accept the slicing, I believe the behavior will be the same, including list, tuple and string. Obviously you set your own types and overwrite the operator __getitem__ and define the logic you wish for the slicing.

  • This behavior of creating a new list is in the language specification or is Cpython implementation detail?

  • @hkotsubo Actually only a new memory space is allocated on a pointer to store a copy of the pointers in the original list. It’s a distinct list, but the references are the same. Since this is a feature of C, I believe that in other implementations it may be different. Particularly never took to study them in depth, not even Pypy.

3

As we were told comments:

When you use Slice, you are making a copy of that list, the same behavior works using the copy() python.

Here a simple test where a slice(Slice) of a and its reference is assigned in b:

a = [ 1, 2, 3]
b = a[1:]

print("Antes da modificação a[1] = 'teste':")
print(*b)
a[1] = "teste"
print("\nApós modificação a[1] = 'teste':")
print(*b)
#Antes da modificação a[1] = 'teste':
#2 3

#Após modificação a[1] = 'teste':
#2 3

Still in the same comment:

To avoid this behavior you can define b = a to use by reference.

Here a simple test code, should not be used in production environment, where an object is created BasicListView descendant of collections.abc.Sequence which maintains a reference to a list and whose a change in the original list is reflected in the instance of BasicListView:

#É implementação limitada e código serve apenas como ilustração
from collections.abc import Sequence

class BasicListView(Sequence):
  def __init__(self, seq, start=0, stop=None, step=1):
    self._seq = seq                                    #Guarda a referência a sequencia seq.
    self._start = start                                #Inicio da fatia.
    self._stop = len(seq)-1 if stop is None else stop  #Final da fatia.
    self._step = step                                  #O salto entre os elementos.

  #Essa função é abstrata na classe base deve ser implementada.
  def __len__(self):
    return (self._stop - self._start) // self._step

  #Essa função deve ser sobrescrita pois é ela que permite o objeto ser indexado.
  def __getitem__(self, key):
    k = self._step * key + self._start    
    return self._seq[k]

    
a = [ 1, 2, 3]
b = BasicListView(a,1,3)

print("Antes da modificação a[1] = 'teste':")
print(*b)
a[1] = "teste"
print("\nApós modificação a[1] = 'teste':")
print(*b)
#Antes da modificação a[1] = 'teste':
#2 3

#Após modificação a[1] = 'teste':
#teste 3

The two examples in Ideone

Answering your questions:

Happens even if I don’t keep in variable?

For native sequences yes. Every time you slice a native sequence, a superficial copy of the original sequence will be made. The second example shows a non-native sequence whose default behavior has been changed.

If you copy, there’s some way you can do it without copying?

For native sequences yes copy and yes you have to do without copying, so work with references. The second example shows a non-native sequence whose default behavior has been changed.

You can demonstrate this with some Python function or library?

Yes, it was demonstrated in the two examples.

This goes for other forms of Lice or only for list?

This is valid for every native sequence. The second example shows a non-native sequence whose default behavior has been changed.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.