tl;dr:
Methods are temporary objects - they are actually created each time they are accessed, by combining the instance (the "self" attribute) with the function declared in the class. That is: a Python method object does not even exist in memory while it is not in use by some instance of the class.
"Calm down, that’s not what you’re thinking at all"
Yes, of course, looking at the snippet you wrote, a logical hypothesis is precisely that "each instance has its own copies of the methods" - and, if that were really the case, your concern would be well founded - each instance of a class would take up a lot of memory with nearly identical methods.
But the first hypothesis that comes to mind, however simple it seems, is not always the fact.
What happens is that method objects are not created when the class is created, nor when an instance is created. Methods are produced "on the fly" when accessed - whether through the class or through an instance - and consume no memory while not in use.
Yes, that can have some impact on performance, but nothing close to the memory impact that worried you, and there are ways to mitigate that performance loss if profiling shows that it really affects your code. (It also bothered me a lot when I found out, until I realized that the impact is really minimal.)
First, let’s build on your finding that methods in different instances are different objects: in fact, the same method, on the same instance of a class, is a different object each time it is retrieved as an attribute:
In [38]: class A:
    ...:     def b(self):
    ...:         pass
    ...:
In [39]: a = A()
In [40]: a.b is a.b
Out[40]: False
Why is it no use comparing the id of the methods?
I could compare the ids of the methods above, and Python could print the same value for both - but that would be misleading. When id(a.b) is called, the function id returns its value, and the object a.b that was passed as an argument is left without any reference and is destroyed. A later call to id(a.b) can, by bad luck (or good luck), create the new method at exactly the same memory address, so the comparison id(a.b) == id(a.b) may result in True even though the two a.b objects are distinct. If I keep a reference to the first method object, however, the second one will be created with a distinct id:
In [42]: c = a.b
In [43]: print(id(c), id(a.b), id(a.b))
140180715412680 140180937921736 140180937921736
Note that this is just what I described: the first method object has one more reference, in the variable c - so it continues to exist after the call to id(c) - but the second object is destroyed the instant id finishes executing, and the third call to id gets an a.b at exactly the same address as in the second call.
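The same point can be checked without depending on address reuse. This is a minimal sketch (the names first and second are only illustrative): while both method objects are held in variables they cannot share an address, so identity and id differ, even though two bound methods wrapping the same function and the same instance still compare equal.

```python
class A:
    def b(self):
        pass


a = A()

# Keep both method objects alive at the same time.
first = a.b
second = a.b

same_object = first is second        # False: two separate method objects
same_id = id(first) == id(second)    # False: both are alive, so addresses differ
equal = first == second              # True: same __func__ and same __self__

print(same_object, same_id, equal)
```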
But how can that be??
Getting back to the main point - what mechanism does Python use to create these method objects? This may be the coolest part of all: the mechanism used internally by the language is 100% exposed as a language feature, and is customizable in pure Python! That is: you can create your own decorators equivalent to @classmethod and @staticmethod that change the behavior of a method (among other possibilities).
What the language does is rely on the descriptor protocol: when any attribute of the class (which includes functions defined in the class body) whose type implements one of the methods __get__, __set__ or __delete__ is retrieved (whether with the notation "Class.attr" or "instance.attr", or with "getattr(Class, 'attr')"), instead of being returned directly, it has its __get__ method called - and whatever __get__ returns is used as the attribute value.
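To make the protocol concrete, here is a minimal sketch of a hand-written descriptor. The classes Shout and Greeter are hypothetical, invented only for this illustration; the only thing taken from the language is the __get__(self, instance, owner) signature.

```python
class Shout:
    """Hypothetical descriptor: computes the attribute value on every access."""

    def __init__(self, text):
        self.text = text

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed through the class: return the descriptor itself.
            return self
        return f"{self.text.upper()} from {owner.__name__}"


class Greeter:
    hello = Shout("hello")


g = Greeter()
value = g.hello                      # goes through Shout.__get__
via_getattr = getattr(g, "hello")    # same mechanism
raw = Greeter.__dict__["hello"]      # bypasses the protocol: the descriptor object

print(value)
```

Reading the class __dict__ directly is the usual way to look at the descriptor object itself, since normal attribute access always triggers __get__.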
Typically, the descriptor protocol is most visible when using the built-in @property, which is itself a shortcut for turning a method into a descriptor object that calls that method.
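As a small illustration that a property really is a descriptor, the sketch below (the class Circle is a made-up example) retrieves the property object from the class and calls its __get__ by hand, which gives the same value as ordinary attribute access.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def diameter(self):
        return 2 * self.radius


c = Circle(2)
prop = Circle.__dict__["diameter"]       # the property object stored in the class
via_attribute = c.diameter               # normal access, goes through prop.__get__
via_protocol = prop.__get__(c, Circle)   # the same call, made explicitly

print(via_attribute, via_protocol)
```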
However, every function in Python 3 has the method __get__, and what a function's __get__ does is simply turn it into an instance method! And a method is actually a very simple object: __get__ receives as a parameter the instance on which the attribute is being accessed - the method object stores this reference as an attribute, and, when called (in Python, any object that has the method __call__ can be called), it calls the original function, passing the instance as the first parameter. That’s where the argument self is injected into a method call. (That is, the "self" that "seems magical", used as a parameter in all methods, is added by a well-documented language mechanism that is open to "use and modification".)
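The binding step can also be triggered by hand. In this short sketch the function is fetched from the class __dict__ (so no binding happens yet) and then its __get__ is called explicitly, which is what attribute access on an instance does behind the scenes.

```python
class A:
    def b(self):
        return self


a = A()
func = A.__dict__["b"]        # the plain function stored in the class
bound = func.__get__(a, A)    # explicit version of what a.b does

has_instance = bound.__self__ is a
has_function = bound.__func__ is func
result = bound()              # a is injected as self

print(bound)
```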
In [48]: class A:
    ...:     def b(self):
    ...:         pass
    ...:
In [49]: A().b
Out[49]: <bound method A.b of <__main__.A object at 0x7f7e9ee2dd30>>
In [50]: A.b
Out[50]: <function __main__.A.b(self)>
In [51]: print(dir(A().b))
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__func__', '__ge__', '__get__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
In [52]: A().b.__self__, A().b.__func__
Out[52]: (<__main__.A at 0x7f7e9ee076d8>, <function __main__.A.b(self)>)
In [53]: A().b.__func__ is A.b
Out[53]: True
That is, at the moment Python evaluates the expression a.b, the method __get__ of the function A.b is called, receiving a as a parameter. That __get__ then creates the method object, with the attributes __func__ and __self__ set. When this method object is called, Python enters its __call__ method, and what it does is equivalent to:
def __call__(self, *args, **kw):
    return self.__func__(self.__self__, *args, **kw)
(The Python code of a method object would be exactly that; it just isn't written that way because the method type itself is implemented in C in CPython.)
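Putting the pieces together, the whole mechanism can be imitated in pure Python. This is only a sketch under stated assumptions: BoundMethodSketch and FunctionSketch are hypothetical names, and the real function and method types in CPython are written in C and handle more cases (comparison, repr, pickling and so on).

```python
class BoundMethodSketch:
    """Imitates a method object: remembers the function and the instance."""

    def __init__(self, func, instance):
        self.__func__ = func
        self.__self__ = instance

    def __call__(self, *args, **kw):
        # Inject the stored instance as the first argument.
        return self.__func__(self.__self__, *args, **kw)


class FunctionSketch:
    """Wraps a plain function and imitates what a function's __get__ does."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed through the class: hand back the plain function.
            return self.func
        # Accessed through an instance: build a fresh method object.
        return BoundMethodSketch(self.func, instance)


class A:
    @FunctionSketch
    def b(self, x):
        return (self, x)


a = A()
result = a.b(3)
```

Each access to a.b builds a new BoundMethodSketch, which mirrors the "a.b is a.b" result shown earlier.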
@classmethod and @staticmethod
These two built-ins are implemented in native code, but now you can understand how they work: you can write a "classmethod" that makes a method receive the class instead of the instance as its first parameter by creating an object that keeps a reference to the original function and has appropriate __get__ and __call__ methods - see how it looks in a few lines:
In [29]: class MyClassMethod:
    ...:     def __init__(self, func, owner=None):
    ...:         self.func = func
    ...:         self.owner = owner
    ...:
    ...:     def __get__(self, instance, owner):
    ...:         # Create a new object each time the
    ...:         # attribute is retrieved - this avoids potential
    ...:         # problems in multithreaded programs
    ...:         # with class inheritance:
    ...:         return MyClassMethod(self.func, owner)
    ...:         # Without worrying about multithreading,
    ...:         # this method could simply do:
    ...:         # self.owner = owner
    ...:         # return self
    ...:
    ...:     def __call__(self, *args, **kw):
    ...:         print("Class method called")
    ...:         return self.func(self.owner, *args, **kw)
    ...:
    ...:
    ...: class A:
    ...:     @MyClassMethod
    ...:     def b(cls):
    ...:         print(f"I am in class {cls!r}")
    ...:
    ...:
In [30]: A().b()
Class method called
I am in class <class '__main__.A'>
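A shorter variant of the same idea, as a sketch: instead of writing our own __call__, __get__ can return a ready-made method object built with types.MethodType, binding the function to the class. The names SimpleClassMethod and SimpleStaticMethod are hypothetical; this is not how the built-ins are actually implemented, only an imitation of their visible behavior.

```python
import types


class SimpleClassMethod:
    """Hypothetical classmethod look-alike: binds the function to the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        return types.MethodType(self.func, owner)


class SimpleStaticMethod:
    """Hypothetical staticmethod look-alike: never binds anything."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        return self.func


class A:
    @SimpleClassMethod
    def who(cls):
        return cls

    @SimpleStaticMethod
    def echo(value):
        return value


class B(A):
    pass


print(A.who(), B().who(), A().echo(5))
```

Because the owner is passed in on each access, the subclass B receives itself as cls, just as with the built-in @classmethod.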
Memory cost of an instance
Going back a little to your initial concern: we have seen that each instance of a class does not create copies of the methods - so what is in memory for each object?
An instance creates in memory a generic Python object, which holds a reference to its class (in Python, the class is an object like any other) in its attribute __class__, and creates a new dictionary in its attribute __dict__ and a structure to track the "weakrefs" in __weakref__. In addition, it holds a reference to every attribute that is set in __init__.
An empty dictionary takes about 250 bytes, the empty __weakref__ about 80 - and the "PyObject" itself about 60 bytes (Python 3.7, 64-bit - on 32-bit these values may be smaller) - that is, a "new" instance of an ordinary class will use about 390 bytes.
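These figures depend on the interpreter version and platform, so it is worth measuring on your own setup. A minimal sketch using sys.getsizeof (the class Plain is just a placeholder); no expected numbers are given here because they vary between Python releases.

```python
import sys


class Plain:
    pass


obj = Plain()

instance_size = sys.getsizeof(obj)        # the instance object itself
dict_size = sys.getsizeof(obj.__dict__)   # its per-instance attribute dictionary

print(f"instance: {instance_size} bytes, __dict__: {dict_size} bytes")
```

Note that sys.getsizeof reports only the object it is given, not the objects it refers to, which is why the dictionary has to be measured separately.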
If it is a class with well-defined attributes that will be instantiated many times (let’s say a class Point, which will only store the coordinates "x" and "y" and have methods to operate on them), it is possible to suppress the creation of the instance's internal dictionary (and of the __weakref__) - in this case, each instance will use only its 60 bytes plus the space for the attributes, without the 250 bytes of the __dict__. To do this, just define the attribute __slots__ in the class body - Python creates a class with a special layout, with space reserved directly for the predefined attributes, and without the __dict__:
In [51]: class BadPoint:
    ...:     def __init__(self, x, y):
    ...:         self.x = x
    ...:         self.y = y
    ...:
In [52]: class Point:
    ...:     __slots__ = "x", "y"
    ...:     def __init__(self, x, y):
    ...:         self.x = x
    ...:         self.y = y
    ...:     def distance(self, other):
    ...:         return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5
    ...:
    ...:     def __repr__(self):
    ...:         return f"P<{self.x}, {self.y}>"
    ...:
In [53]: a = [Point(i, i) for i in range(1000)]
In [54]: get_size(a)
Out[54]: 65024
In [55]: b = [BadPoint(i, i) for i in range(1000)]
In [56]: get_size(b)
Out[56]: 205120
(get_size is a function that recursively calls sys.getsizeof on an object - I used the implementation from this recipe: https://goshippo.com/blog/measure-real-size-any-python-object/)
As you can see, the extra methods make no difference in size - there is only one copy of their "original" (as function objects) in the class. On the other hand, eliminating the internal dictionary makes quite a difference for simple objects.
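The trade-off of __slots__ can be seen directly in a small sketch. The classes WithDict and WithSlots and the helper shallow_cost are hypothetical and much cruder than the get_size recipe mentioned above: the helper only adds the size of the instance and of its __dict__, if it has one, and ignores the attribute values.

```python
import sys


class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def shallow_cost(obj):
    """Size of the object plus its __dict__, if any (values are not counted)."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


plain = WithDict(1, 2)
slotted = WithSlots(1, 2)

has_dict = hasattr(slotted, "__dict__")   # False: no per-instance dictionary

try:
    slotted.z = 3                         # z is not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True

plain_cost = shallow_cost(plain)
slotted_cost = shallow_cost(slotted)
print(plain_cost, slotted_cost, has_dict, rejected)
```

The price of the smaller layout is that a slotted instance cannot receive attributes that were not declared in __slots__.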
Possible optimizations
As written above, it is possible that this creation/destruction of method objects impacts some part of an application - in general, only if some piece of code calls the same method of the same instance many times (that is, inside a for or while loop).
And in such cases, all you need to do to avoid wasting resources is to keep a reference to the method that stays alive for the duration of the loop.
I mean, instead of:
for character in big_text:
    myobject.transmogrify(character)
just write:
transmogrify = myobject.transmogrify
for character in big_text:
    transmogrify(character)
Note that this same idea actually holds for any attribute access - since every time we write instance.attribute, the language has to check several things, including whether the attribute is a descriptor, before retrieving it. Simply placing the attribute in a local variable before the for makes this mechanism run only once instead of once per iteration.
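Whether this pays off is something to measure rather than assume. Below is a minimal timing sketch with timeit; the class Accumulator and both helper functions are hypothetical. Recent CPython versions already optimize the direct call form obj.method(...), so the difference may turn out to be small, and no timing figures are claimed here.

```python
import timeit


class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value


def lookup_every_time(acc, n):
    for i in range(n):
        acc.add(i)        # attribute lookup on every iteration
    return acc.total


def lookup_once(acc, n):
    add = acc.add         # a single lookup, a single method object
    for i in range(n):
        add(i)
    return acc.total


t_every = timeit.timeit(lambda: lookup_every_time(Accumulator(), 10_000), number=20)
t_once = timeit.timeit(lambda: lookup_once(Accumulator(), 10_000), number=20)
print(f"lookup every time: {t_every:.4f}s, lookup once: {t_once:.4f}s")
```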
Just one addendum: in Python we do not use getters and setters; in this case you should use properties (and even then, only in very specific cases).
– Luis Eduardo
I’m on mobile and can’t write an answer, but the idea is that the class itself holds the functions, and when the class is instantiated Python creates a descriptor that defines the method. That is, the getFoo of the instance will be a descriptor that references the original function of the class. It is this descriptor that is responsible for defining the value of self that will be passed as the first parameter. That is, the function itself is not recreated; what happens is that each instance has its own descriptor, all referring to the same function. – Woss