Memory release for objects like dictionaries and lists is not automatic: once a data structure grows, it does not shrink back; it keeps its internal allocation so it can grow again up to the maximum size it has already reached.
In other words: if I create a list and add 100,000 elements, Python's over-allocation heuristic will leave the list with room for roughly 120,000 elements (for example).
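A quick way to visualize this over-allocation is to compare a list grown one append at a time with one built in a single step (a small sketch; the exact byte counts vary with the Python version and build):
import sys

grown = []
for i in range(100_000):
    grown.append(i)           # growing one element at a time triggers over-allocation

exact = list(range(100_000))  # built in one go: the allocation is much tighter

print(sys.getsizeof(grown))   # larger: includes the spare slots
print(sys.getsizeof(exact))   # smaller: about 8 bytes per element plus the list header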
Now, that is the size of the list itself; the sizes of the objects inside the list are not counted in it, and "getsizeof" is deceiving here. (Have you noticed that it is not extensively documented? That is because it is more of a debugging aid than something meant for final programs.) That is, getsizeof
returns the size in bytes of an object, but not of the objects referenced or "contained" in it. In the case of lists or dictionaries, each "inner" object takes up only 8 bytes there (a pointer to the real object), even if that other object occupies several megabytes.
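For example (a small sketch; the exact numbers vary by Python version):
import sys

big = "x" * (10 * 1024 * 1024)   # a string of roughly 10MB

print(sys.getsizeof(big))        # about 10 million bytes: the string's own size
print(sys.getsizeof([big]))      # a few dozen bytes: the list header plus one 8-byte pointer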
So: you are measuring the wrong value. There are recipes on the internet that apply getsizeof
recursively to give you the full size of an object (see the sketch below), but even if you do that, it does not solve your problem: if you remove large objects from within a dictionary "A", the full size of "A" will indeed be smaller, because the other dictionary is no longer inside "A", yet there may still be a leak if those objects have other references (in other data structures, such as caches, forgotten nonlocal
variables, etc.). So the way to check that there is no memory leak is to monitor the memory of the whole Python process with an external tool.
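One possible shape of such a recursive recipe; this deep_sizeof is just an illustrative sketch I am writing here (it is not from the standard library and only descends into the common container types):
import sys

def deep_sizeof(obj, seen=None):
    """Recursive, approximate version of sys.getsizeof (illustrative sketch only)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:        # do not count shared objects twice (also avoids reference cycles)
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(key, seen) + deep_sizeof(value, seen)
                    for key, value in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size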
And you have to remember that, for the interpreter as a whole, Python repeats a bit of the recipe it uses for lists: if the process once needed 20MB, it does not immediately return those 20MB to the OS when it needs less, since there is a good chance you will soon need those 20MB again. To test this you have to monitor from the outside: create data structures of a few MB each (a few bytes or KB, like dictionary keys with short strings, will not help), note the peak size, delete the large objects, create other objects of the same size, and then check whether that previous peak holds (if, by creating another generation of large objects, your process goes from 20MB to 40MB, that is a sign that the first generation is still there somewhere).
This is tricky, because it is really an "integration" test: it belongs more to the final execution environment itself than to something you should worry about during development.
In general, in Python you really do not have to worry about memory management. If you create giant dictionaries that go in as values inside another dictionary, deleting that key releases the memory of the object immediately (although it may stay allocated to the Python process, ready for the next object of the same size), as the exercise below demonstrates.
Exercise
I wrote some code here that creates and discards large sub-dictionaries,
and measures its own memory usage in the operating system (I am on
Linux and used the pmap command); see the program and output below:
import os, sys
import random

def get_process_memory():
    """Return total memory used by current PID, **including** memory in shared libraries
    """
    raw = os.popen(f"pmap {os.getpid()}").read()
    # The last line of pmap output gives the total memory like
    # " total 40140K"
    memory_mb = int(raw.split("\n")[-2].split()[-1].strip("K")) // 1024
    return memory_mb

def create_mass(size=50):
    # Build a string of `size` MB made of a single repeated character
    return chr(random.randint(32, 127)) * size * 1024 * 1024

def main():
    container = {}
    print(f"memory at start: {get_process_memory()}MB")
    container["mass1"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating first mass: {get_process_memory()}MB")
    container["mass2"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating second mass: {get_process_memory()}MB")
    del container["mass1"], container["mass2"]
    print(f"memory after deleting first and second masses: {get_process_memory()}MB")
    container["mass3"] = {i: create_mass() for i in range(10)}
    container["mass4"] = {i: create_mass() for i in range(10)}
    print(f"memory after creating 3rd and 4th masses: {get_process_memory()}MB")
    del container["mass3"]
    print(f"memory after deleting 3rd mass: {get_process_memory()}MB")
    del container["mass4"]
    print(f"memory after deleting 4th mass: {get_process_memory()}MB")
    container.clear()
    print(f"memory after clearing container: {get_process_memory()}MB")

if __name__ == "__main__":
    main()
Output on Ubuntu (8GB RAM, Core i7), Python 3.8.0:
jsbueno ~/tmp01$ time python memory_exercize.py
memory at start: 45MB
memory after creating first mass: 545MB
memory after creating second mass: 1045MB
memory after deleting first and second masses: 45MB
memory after creating 3rd and 4th masses: 1045MB
memory after deleting 3rd mass: 545MB
memory after deleting 4th mass: 45MB
memory after clearing container: 45MB
real 0m1,522s
user 0m0,582s
sys 0m0,823s
Above, I stated that Python does not always return memory to the operating system, and may keep it "reserved" for eventual reuse.
In this example, where I used 50MB for each object, this is not visible: in all these cases Python released the memory immediately.
With much more modest values the effect can be observed. I reduced the allocation unit from 50MB to 256KB (i.e., 200 times smaller), and then, between the deletion of the third and fourth masses, we can see that Python "holds on" to about 3MB:
memory at start: 45MB
memory after creating first mass: 47MB
memory after creating second mass: 50MB
memory after deleting first and second masses: 45MB
memory after creating 3rd and 4th masses: 50MB
memory after deleting 3rd mass: 50MB
memory after deleting 4th mass: 45MB
memory after clearing container: 45MB
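For reference, that change amounts to something like this (a sketch; the exact variant I used for the second run is not reproduced here):
def create_mass(size=256):
    # 256 KB per object instead of 50 MB; replaces create_mass in the program above
    return chr(random.randint(32, 127)) * size * 1024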
Maybe that answer (in English) will be useful to you.
– jfaccioni