Why do these two ways of initializing the same list in Python generate structures of different sizes?

Asked

Viewed 147 times

8

It is common to need to initialize a list in Python with a defined amount of elements and we can do this in two ways: 1) multiplying the list with an element by the desired amount; or 2) using the technique of list comprehensions.

Note: for mutable objects, use the comprehensilist on.

1. Initializing the list by multiplication

>>> import sys
>>> lista1 = [None]*15
>>> print(lista1)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
>>> print(sys.getsizeof(lista1))
184

2. Initializing with comprehensilist on

>>> import sys
>>> lista2 = [None for _ in range(15)]
>>> print(lista2)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
>>> print(sys.getsizeof(lista2))
192

It is interesting to note that the created objects have, in fact, the same value:

>>> print(lista1 == lista2)
True

So where, and why, did this 8 byte difference come from between objects? Both have exactly the same amount (15) of the same value (None). They shouldn’t be the same size?

1 answer

8


This is an expected behavior in Python regarding the reallocation of resources from a list.

Whenever memory relocation is done on a list, Python will allocate more memory than really necessary in order to avoid future relocations in the near future - it is simpler you reallocate n*X memory once and change the list n times than having to reallocate X memory for n times.

This is explicit in a comment on the function list_resize, in the official repository:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

That is, the memory size that will be relocated to the list will be proportional to its current size. This is evident when you modify a list within a repeat loop and check its current size:

import sys

lista = []

for _ in range(20):
    print('Lista de tamanho {:>3} com memória {:>3}'.format(len(lista), sys.getsizeof(lista)))
    lista.append(None)

The result will be:

Lista de tamanho   0 com memória  64
Lista de tamanho   1 com memória  96
Lista de tamanho   2 com memória  96
Lista de tamanho   3 com memória  96
Lista de tamanho   4 com memória  96
Lista de tamanho   5 com memória 128
Lista de tamanho   6 com memória 128
Lista de tamanho   7 com memória 128
Lista de tamanho   8 com memória 128
Lista de tamanho   9 com memória 192
Lista de tamanho  10 com memória 192
Lista de tamanho  11 com memória 192
Lista de tamanho  12 com memória 192
Lista de tamanho  13 com memória 192
Lista de tamanho  14 com memória 192
Lista de tamanho  15 com memória 192
Lista de tamanho  16 com memória 192
Lista de tamanho  17 com memória 264
Lista de tamanho  18 com memória 264
Lista de tamanho  19 com memória 264

Note: note that for this example I modified the list from the method append. That is, this behavior is expected for any reallocation of list resource, regardless of the source of this reallocation.

While it is possible to amortize the time of reallocation of resources on a list - which may be beneficial for the performance of the application - there are setbacks in the use of resources.

If you have a list with 16 values, None, and need to add another, your list will consume 264 bytes in memory instead of only 200, which would be necessary - but, remember, it’s Python. You will end up giving up some requirements to get others.

  • I didn’t even see what had been answered, lol...

  • @Magichat was practically together xD

  • 1

    I’ll see if I mention anything relevant that I didn’t mention there again...hehe

Browser other questions tagged

You are not signed in. Login or sign up in order to post.