Analyzing
We can make some initial considerations to better understand what happens with the presented code.
When you do sequence[start:stop:step]
, the part between square brackets will act as a syntactic sugar (Extended Indexing syntax) for the instantiation of an object of the type slice
. That is, when interpreting the code, Python (interpreter) will generate the object slice(start, stop, step)
and pass it to the sequence.
>>> sequence = [1, 2, 3, 4, 5]
>>> sequence[1:3]
[2, 3]
>>> sequence[slice(1, 3)]
[2, 3]
>>> sequence[1:3] == sequence[slice(1, 3)]
True
Another important consideration is that when we do sequence[key]
, for under the table, what will be executed is sequence.__getitem__(key)
, because the method called Dunder get item is responsible for overriding the access operator to an object key. The object being a list
, we can also say that the code will be equivalent to list.__getitem__(sequence, key)
, where the function is called __getitem__
directly from the class list
.
Why does it matter?
Knowing now that when it’s done sequence[start:stop]
, what is executed is list.__getitem__(sequence, slice(start, stop))
, we can then analyze how is the implementation of the function __getitem__
within the class list
. It’s important to point out here that the guy list
, in Python, it is implemented in C, so the code to be analyzed will be in this language, although it is easy to understand it even without knowing much such language.
The implementation of the object list
can be found on original Python repository (Cpython, in fact, which is the official interpreter of the language, implemented in C), in particular the function PyList_GetSlice
to be executed:
PyObject *
PyList_GetSlice(PyObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
if (!PyList_Check(a)) {
PyErr_BadInternalCall();
return NULL;
}
if (ilow < 0) {
ilow = 0;
}
else if (ilow > Py_SIZE(a)) {
ilow = Py_SIZE(a);
}
if (ihigh < ilow) {
ihigh = ilow;
}
else if (ihigh > Py_SIZE(a)) {
ihigh = Py_SIZE(a);
}
return list_slice((PyListObject *)a, ilow, ihigh);
}
Code excerpt taken from official repository, lines 477 to 497
In short, the function takes as argument the PyObject *a
, that will be our sequence
, and also the arguments Py_ssize_t ilow
and Py_ssize_t ihigh
which are respectively the start
and the stop
. In this function is made a validation of the values received and the call of list_slice((PyListObject *)a, ilow, ihigh)
.
static PyObject *
list_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh)
{
PyListObject *np;
PyObject **src, **dest;
Py_ssize_t i, len;
len = ihigh - ilow;
np = (PyListObject *) list_new_prealloc(len);
if (np == NULL)
return NULL;
src = a->ob_item + ilow;
dest = np->ob_item;
for (i = 0; i < len; i++) {
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
Py_SET_SIZE(np, len);
return (PyObject *)np;
}
Code excerpt taken from official repository, lines 455 to 475
To highlight from this code the following:
- Knowing the values of
start
and stop
it is possible to determine the size of the list to be returned: len = ihigh - ilow
;
- A new list is defined,
PyListObject *np
, with the memory allocated, np = (PyListObject *) list_new_prealloc(len)
;
- A loop of repetition is made,
for (i = 0; i < len; i++)
, in which the value in the original sequence is sought, PyObject *v = src[i]
, and copied in the new sequence, dest[i] = v
;
- The instruction
Py_INCREF(v)
is responsible for increasing the number of references pointed to that object - objects with zero references are discarded from memory by Garbage Collector;
- The function will return the new created list,
return (PyObject *)np
;
Questions
Use the same list or copy the relevant data?
It will copy a reference to that value, as we can see highlighted in the code in C:
for (i = 0; i < len; i++) {
PyObject *v = src[i];
Py_INCREF(v);
dest[i] = v;
}
The value to be manipulated will be a pointer, PyObject *v
, then a reference to the value itself, but the value itself will remain the same. We can perceive this even when the type is mutable. If I have a list of lists, any changes we make to a sub-list will be reflected in any list created from a slice
.
>>> a = [[1], [2]]
>>> b = a[1:]
>>> b
[[2]]
>>> a[1].append(2) # Manipulando a lista original após o slice
>>> b
[[2, 2]]
If you have many items you will be slow?
It depends on how much the "many" is, but it’s hard to conclude that. It will depend even more on the configuration of your computer, operating system that is managing the memory, etc. It certainly has an impact, because Python will need to allocate memory to store this new object, but if it starts to make your application unfeasible, it is possible that you have to cogitate another language.
Happens even if I don’t keep in variable?
Yes. The execution of the entire commented process will take place regardless of whether the result has been assigned or not. The difference that when you do not assign, Python will realize that the returned object will have zero references pointed to it and the Garbage Collector will do the job of "clean up the mess", releasing the memories that were allocated and updating the reference counters of each list value.
If you copy, there’s some way you can do it without copying?
If you don’t need all values at the same time, you can optimize your solution using generators.
You can demonstrate this with some Python function or library?
Yes, see analysis of the C implementation studied in this answer.
This applies to other forms of slice
or just for the list?
For the guys who accept the slicing, I believe the behavior will be the same, including list, tuple and string. Obviously you set your own types and overwrite the operator __getitem__
and define the logic you wish for the slicing.
When you use Slice, you are making a copy of that list, the same behavior works using the
copy
python. To avoid this behavior you can setb = a
to use by reference.– Mauro Alexandre
Related: What good is memoryview in Python?
– Luiz Felipe
@Mauroalexandre but doing b = a not only takes a part right?
– Andressa Salles
@Luizfelipe pity that does not work with list
– Andressa Salles
@Andressasalles http://pythonsandbox.com/ access and take the test.
– Mauro Alexandre