First, array is different from list.
In many cases, usually more information, the nomenclature error will not be an aggravating and will not harm the communication, but since we will be analyzing the internal functioning of language I think it is interesting to stick to the correct terms. Given the questions, it became clear that the doubt is about the functioning of lists, not of arrays.
The guy list
is native to Python and has been implemented in C under the PyListObject
.
typedef struct {
PyVarObject ob_base;
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants:
* 0 <= ob_size <= allocated
* len(list) == ob_size
* ob_item == NULL implies ob_size == allocated == 0
* list.sort() temporarily sets allocated to -1 to detect mutations.
*
* Items must normally not be NULL, except during construction when
* the list is not yet visible outside the function that builds it.
*/
Py_ssize_t allocated;
} PyListObject;
Where do we have:
ob_base
, an object that represents structures of variable size, implements the field ob_size
;
ob_item
, a pointer to pointer of PyObject
;
allocated
, an object Py_ssize_t
representing the allocated size;
As you can see - including commented in the code - the items that make up the list in Python will be managed from ob_item
. Because it is a pointer, it has the dynamism of being relocated according to the need of list growth, in which each position will point to another pointer PyObject
that represents the list item. It is worth remembering here that all the structures that Python implements inherit from PyObject
, then there will be no restrictions of the type that can be stored in the list.
Whenever the size of a list needs to be changed, the function list_resize
is invoked; it is responsible for implementing the reallocation algorithm of a list.
The source code for this function is:
/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsibility to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
PyObject **items;
size_t new_allocated;
Py_ssize_t allocated = self->allocated;
/* Bypass realloc() when a previous overallocation is large enough
to accommodate the newsize. If the newsize falls lower than half
the allocated size, then proceed with the realloc() to shrink the list.
*/
if (allocated >= newsize && newsize >= (allocated >> 1)) {
assert(self->ob_item != NULL || newsize == 0);
Py_SIZE(self) = newsize;
return 0;
}
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
if (newsize == 0)
new_allocated = 0;
items = self->ob_item;
if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
PyMem_RESIZE(items, PyObject *, new_allocated);
else
items = NULL;
if (items == NULL) {
PyErr_NoMemory();
return -1;
}
self->ob_item = items;
Py_SIZE(self) = newsize;
self->allocated = new_allocated;
return 0;
}
Just as Maniero commented in his reply, basically Python allocates a new memory space already considering the new size of the list and considering spaces to spare and then copy the content to that memory space. Note that the space added to the size of the list will be proportional to the current size and this is done precisely to seek a linear amortization in the reallocation time. That is, it is easier to relocate a larger space once and insert multiple elements than to relocate each time you insert. Also note that all relocation is done on the object items
and not directly on self->ob_item
:
items = self->ob_item;
if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
PyMem_RESIZE(items, PyObject *, new_allocated);
else
items = NULL;
if (items == NULL) {
PyErr_NoMemory();
return -1;
}
self->ob_item = items;
PyMem_RESIZE
is a macro defined by:
#define PyMem_RESIZE(p, type, n) \
( (p) = ((size_t)(n) > PY_SSIZE_T_MAX / sizeof(type)) ? NULL : \
(type *) PyMem_REALLOC((p), (n) * sizeof(type)) )
Which basically does the memory relocation on the specified pointer.