Memory release in Python dictionaries

I'm trying to develop a program that temporarily stores a large number of variables. These variables are kept in RAM, inside dictionaries.

But I'm having a hard time understanding how Python's "garbage collector" works with dictionaries nested inside dictionaries: I cannot release the memory of deleted child dictionaries. I have used gc.collect() and .clear(), but neither solves it. Calling .clear() on the parent dictionary does release the memory, but that does not solve my problem; I need to release the memory of the child dictionaries.

Using 32-bit Debian in VirtualBox, the getsizeof values for the dictionary look like this.

# MEMORY RELEASE TEST

from sys import getsizeof

d = dict()

for key in range(100):
    d["TESTE-"+str(key)] = {}

# result after adding the child dictionaries
print(getsizeof(d))
# OUTPUT -> 2612

for key in list(d):
    del d[key]

# result after deleting the child dictionaries
print(getsizeof(d))
# OUTPUT -> 2612

It makes no difference whether I call d[key].clear() before del d[key], or call gc.collect(); the memory is not released.

Is there any way to release the memory of the child dictionaries?

Is this way of storing temporary data wrong? If so, how should I do it?

1 answer

Memory release for objects like dictionaries and lists is not automatic: once a data structure grows, it does not shrink back; it keeps its internal capacity so that it can grow again to the maximum size it once had.

In other words: if I create a list and append 100,000 elements, Python's growth heuristic will already leave the list with space for about 120,000 elements (for example).
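
Just to illustrate (a rough sketch; the exact byte counts depend on the Python version and build): a list grown by repeated appends reports a larger getsizeof than one built in a single step with the same contents:

from sys import getsizeof

grown = []
for i in range(100_000):
    grown.append(i)          # repeated appends trigger the over-allocation heuristic

exact = list(range(100_000)) # built in one go from a sized iterable

print(getsizeof(grown))      # larger: extra slots are reserved for future growth
print(getsizeof(exact))      # smaller: sized to fit exactly 100,000 items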

Now, that is the size of the list itself; the sizes of the objects that are inside the list are not counted in it, and this is where getsizeof deceives you. (Have you noticed that it is not extensively documented? That is because it is more of a debugging aid than something for use in final programs.) That is, getsizeof returns the size in bytes of an object, but not of the objects referenced or "contained" in it. In lists and dictionaries, each "inner" object takes up only 8 bytes (a pointer to the real object), even if that other object occupies several megabytes.
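
A quick way to see this (a minimal illustration; exact byte counts vary by Python version): two dictionaries with the same number of keys report the same getsizeof, no matter how big their values are:

from sys import getsizeof

small = {"data": "x"}
big = {"data": "x" * 10_000_000}   # the value is a string of roughly 10MB

print(getsizeof(small))            # same as "big"...
print(getsizeof(big))              # ...only the 8-byte reference to the value is counted
print(getsizeof(big["data"]))      # the string itself: about 10 million bytes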

So: you are measuring the wrong value. There are recipes on the internet for applying getsizeof recursively to get the full size of an object, but even if you do that, it does not solve your problem: if you remove large objects from inside a dictionary "A", the full size of "A" will indeed become smaller, because the other dictionary is no longer inside "A", yet there may still be a leak if those objects have other references (in other data structures, caches, forgotten nonlocal variables, etc.). So the way to check that there is no memory leak is to monitor the memory of the entire Python process with an external tool.
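
One such recipe, in rough form (a sketch only, covering just the common containers; it ignores object attributes, shared buffers and other subtleties):

from sys import getsizeof

def deep_sizeof(obj, seen=None):
    # Recursively sum getsizeof over containers, skipping objects already visited
    # so that shared and cyclic references are not counted twice.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size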

And you have to remember that Python applies, at the interpreter level, a bit of the same recipe it uses for lists: if the process once needed 20MB, it does not immediately return those 20MB to the OS when you need less, since there is a good chance you will soon need those 20MB again. To test this you have to monitor from the outside: create data structures of a few MB each (a few bytes or KB, like dictionary keys made of short strings, will not help), note the maximum size, delete the large objects, then create other objects of equal size, and see whether that previous maximum size holds (if creating another generation of large objects makes your process go from 20MB to 40MB, it is a sign that the first generation is still around somewhere).

This is tricky, because it amounts to an "integration" test, which is more a concern of the final execution environment itself than something to worry about during development.

In general, in Python you really do not have to worry about memory management: if you create giant dictionaries that go in as values inside another dictionary, deleting that key releases the memory of the object immediately (although it may stay allocated to the Python process, ready for the next object of similar size).

Exercise

I wrote some code here that creates and destroys large sub-dictionaries and measures its own memory usage in the operating system (I am on Linux and used the pmap command). See the program and output below:

import os, sys
import random

def get_process_memory():
    """Return total memory used by current PID, **including** memory in shared libraries
    """
    raw = os.popen(f"pmap {os.getpid()}").read()
    # The last line of pmap output gives the total memory like
    # " total            40140K"
    memory_mb = int(raw.split("\n")[-2].split()[-1].strip("K")) // 1024
    return memory_mb

def create_mass(size=50):
    return chr(random.randint(32, 127)) * size * 1024 * 1024

def main():
    container = {}
    print(f"memory at start: {get_process_memory()}MB")
    container["mass1"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating first mass: {get_process_memory()}MB")
    container["mass2"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating second mass: {get_process_memory()}MB")
    del container["mass1"], container["mass2"]
    print(f"memory after deleting first second masses: {get_process_memory()}MB")
    container["mass3"] = {i: create_mass() for i in range(10)}
    container["mass4"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating 3rd an 4th masses: {get_process_memory()}MB")
    del container["mass3"]
    print(f"memory after deleting 3rd mass: {get_process_memory()}MB")
    del container["mass4"]
    print(f"memory after deleting 4th mass: {get_process_memory()}MB")
    container.clear()
    print(f"memory after clearing container: {get_process_memory()}MB")

if __name__ == "__main__":
    main()

Output on Ubuntu with 8GB of RAM, Core i7, Python 3.8.0:

jsbueno ~/tmp01$ time python memory_exercize.py 
memory at start: 45MB
memory after creating first mass: 545MB
memory after creating second mass: 1045MB
memory after deleting first and second masses: 45MB
memory after creating 3rd and 4th masses: 1045MB
memory after deleting 3rd mass: 545MB
memory after deleting 4th mass: 45MB
memory after clearing container: 45MB

real    0m1,522s
user    0m0,582s
sys     0m0,823s

Above, I said that Python does not always return memory to the operating system and may keep it "reserved" for eventual reuse. In this example, where I allocated 50MB per object, that effect is not visible: in all of these cases, Python released the memory immediately.

For much more modest sizes, the effect can be observed. I reduced the allocation unit from 50MB to 256KB (i.e. 200 times smaller), and then we can see, between the deletion of the third and fourth masses, that Python "holds on" to about 3MB:

memory at start: 45MB
memory after creating first mass: 47MB
memory after creating second mass: 50MB
memory after deleting first and second masses: 45MB
memory after creating 3rd and 4th masses: 50MB
memory after deleting 3rd mass: 50MB
memory after deleting 4th mass: 45MB
memory after clearing container: 45MB
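
The modified create_mass is not shown above; a guess at what the 256KB version could look like:

def create_mass(size_kb=256):
    # roughly 256KB of one repeated printable character, instead of 50MB
    return chr(random.randint(32, 127)) * size_kb * 1024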
