What is the function of Python descriptors?

Python has the descriptor protocol, which basically consists of defining a class that controls access to the attributes of another class. My question is: is that the real purpose of a descriptor, merely to relieve the class of the work of defining the access logic for its own attributes?

Here follows a script:

class Descriptor(object):

    def __init__(self, name = None):
        self.name = name

    def __get__(self, instance = None, owner = None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if len(value) < 3:
            raise ValueError
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Person(object):

    name = Descriptor('name')

1 answer


The following answer is based on the Descriptor HowTo Guide, written by Raymond Hettinger¹, from the official Python documentation.

Definition and introduction

In general, a descriptor is an object attribute with binding behavior: one whose attribute access is overridden by methods of the descriptor protocol. These methods are __get__(), __set__() and __delete__(). If an object defines any of them, it is called a descriptor.

By default, Python gets, sets, or deletes an attribute by reading, writing, or removing an entry in the object's dictionary. For example, a.x is looked up first in a.__dict__['x'], then in type(a).__dict__['x'], and so on through the base classes of type(a) up to object, excluding metaclasses. If the value found for x along the way is an object that implements the descriptor protocol (that is, a descriptor), Python replaces the default behavior with a call to the corresponding descriptor method.
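The default lookup chain, before any descriptor is involved, can be sketched as follows. The classes Base and Child are hypothetical names used only for illustration.

```python
class Base:
    x = "from Base"
    y = "from Base"


class Child(Base):
    y = "from Child"


a = Child()

# Nothing is stored on the instance yet, so the class chain is searched.
assert "x" not in a.__dict__
found_in_class = a.y   # found in type(a).__dict__
found_in_base = a.x    # found in a base class of type(a)

# For plain (non-descriptor) values, an instance dictionary entry is found first.
a.x = "from instance"
found_in_instance = a.x
```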

Descriptors are a powerful, general-purpose protocol. They are the mechanism behind properties, methods, static methods, class methods and super(), and Python itself has used them since version 2.2 to implement new-style classes.

Descriptor protocol

As mentioned above, the protocol consists of three methods, with the following signatures:

  • descr.__get__(self, obj, type=None) -> value
  • descr.__set__(self, obj, value) -> None
  • descr.__delete__(self, obj) -> None

And that’s all there is to it. An object that defines any of these methods will be considered a descriptor and may be used to override the standard Python behavior described above.

An object that defines both __get__() and __set__() is considered a data descriptor (strictly speaking, defining __set__() or __delete__() is what makes it one). Descriptors that define only __get__() are called non-data descriptors²; they are typically used for methods, but other uses are possible.

Data and non-data descriptors differ in how overrides are resolved with respect to entries in an instance's dictionary. If the instance dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If it has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.
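A minimal sketch of this precedence rule, assuming two hypothetical descriptor classes (DataDesc and NonDataDesc) and writing directly into the instance dictionary to create the name clash:

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        pass  # writes are ignored in this sketch


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    d = DataDesc()
    n = NonDataDesc()


obj = Demo()
# Put same-named entries straight into the instance dictionary.
obj.__dict__["d"] = "instance value"
obj.__dict__["n"] = "instance value"

result_d = obj.d  # the data descriptor wins over the instance entry
result_n = obj.n  # the instance entry wins over the non-data descriptor
```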

To make a read-only data descriptor, define both __get__() and __set__(), and have __set__() raise AttributeError.
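As an illustration, a read-only data descriptor might look like the sketch below. ReadOnly and Circle are hypothetical names, and the value is stored on the descriptor itself only to keep the example short.

```python
class ReadOnly:
    """Data descriptor whose __set__ always refuses the assignment."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        raise AttributeError("attribute is read-only")


class Circle:
    pi = ReadOnly(3.14159)


c = Circle()
error = None
try:
    c.pi = 3  # goes through ReadOnly.__set__, which raises
except AttributeError as exc:
    error = str(exc)
```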

Invoking a descriptor

A descriptor can be called directly by its method name. For example, d.__get__(obj), where d is the descriptor and obj is the object it is attached to.

It is far more common, however, for a descriptor to be invoked automatically on attribute access. For example, obj.d looks up d in the dictionary of obj; if d defines the method __get__(), then Python calls d.__get__(obj), subject to the precedence rules described above.

Invocation details depend on whether obj is an instance or a class.

For instances, the logic lives in object.__getattribute__(), which transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). The implementation follows a precedence chain in which data descriptors take priority over instance variables, and instance variables take priority over non-data descriptors. For more details, see the source code of PyObject_GenericGetAttr() in Objects/object.c.

For classes, the logic lives in type.__getattribute__(), which transforms B.x into B.__dict__['x'].__get__(None, B). In pure Python, it looks like this:

def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
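Both transformations can be checked with a small sketch. Recorder is a hypothetical non-data descriptor that simply returns the arguments its __get__() received, so the two call shapes become visible.

```python
class Recorder:
    """Non-data descriptor that reports the arguments passed to __get__."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class B:
    x = Recorder()


b = B()

# Instance access: b.x behaves like type(b).__dict__['x'].__get__(b, type(b)).
instance_access = b.x
instance_manual = type(b).__dict__["x"].__get__(b, type(b))

# Class access: B.x behaves like B.__dict__['x'].__get__(None, B).
class_access = B.x
class_manual = B.__dict__["x"].__get__(None, B)
```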

The important points to remember are:

  • Descriptors are invoked by the method __getattribute__();
  • Overriding __getattribute__() prevents automatic descriptor calls;
  • object.__getattribute__() and type.__getattribute__() make different calls to __get__();
  • Data descriptors always override the instance dictionary;
  • Non-data descriptors can be overridden by the instance dictionary.
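The second point can be illustrated with a sketch. The override in Bypassed is deliberately naive and hypothetical: it reads the class dictionary directly, so the descriptor protocol is never applied.

```python
class Ten:
    """Non-data descriptor that always produces 10."""

    def __get__(self, obj, objtype=None):
        return 10


class Normal:
    x = Ten()


class Bypassed:
    x = Ten()

    def __getattribute__(self, name):
        # Hypothetical override: return the raw class dictionary entry
        # without ever calling __get__ on it.
        return type(self).__dict__[name]


normal_value = Normal().x   # __get__ is called automatically
raw_value = Bypassed().x    # the Ten instance itself comes back
```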

The object returned by super() has a custom __getattribute__() for invoking descriptors. The lookup super(B, obj).m searches obj.__class__.__mro__ for the base class A that immediately follows B and then returns A.__dict__['m'].__get__(obj, B). If m is not a descriptor, it is returned unchanged; if it is not in the dictionary, the lookup falls back to a search using object.__getattribute__().
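A short sketch of that equivalence, using hypothetical classes A and B that both define a method m:

```python
class A:
    def m(self):
        return "A.m"


class B(A):
    def m(self):
        return "B.m"


obj = B()

# super(B, obj).m finds m in A, the class that follows B in the MRO.
via_super = super(B, obj).m

# The same bound method, built by hand through the descriptor protocol.
via_protocol = A.__dict__["m"].__get__(obj, B)
```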

Example

The following code defines a class whose instances are data descriptors that print a message whenever their value is read or set. Overriding __getattribute__() would be an alternative way to apply the same logic to every attribute; with a descriptor, however, you can choose which attributes are monitored.

class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val

class MyClass(object):
    x = RevealAccess(10, 'var "x"')
    y = 5

Used in an interactive session:

>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5

Properties

The easiest way to create a data descriptor in Python is to call property():

property(fget=None, fset=None, fdel=None, doc=None) -> property attribute

The documentation itself gives an example of how to use it:

class C(object):
    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")
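To see the managed attribute in action, here is a sketch that repeats the class from the documentation so the block is self-contained, then exercises the getter, setter and deleter.

```python
class C:
    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")


c = C()
c.x = 5          # routed to C.setx(c, 5)
value = c.x      # routed to C.getx(c)
del c.x          # routed to C.delx(c)

# Accessed on the class, the property object itself is returned.
doc = C.x.__doc__
is_data_descriptor = hasattr(C.x, "__get__") and hasattr(C.x, "__set__")
```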

In fact, the pure Python equivalent of property is itself an implementation of a data descriptor:

class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

Functions and methods

Class dictionaries store methods as functions. In a class definition, methods are written with def or lambda, the usual tools for creating functions. Methods differ from regular functions only in that their first argument is reserved for the object instance.

To support method calls, functions define __get__() so that the function is bound to the instance when it is accessed as an attribute. This means that all functions are non-data descriptors that return a bound method when accessed through an object.

In pure Python, the function would be something like:

import types

class Function(object):
    ...  # remaining members elided in the original
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return types.MethodType(self, obj)
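The binding can also be observed on an ordinary function. In this sketch, Greeter is a hypothetical class; the function is fetched from the class dictionary and bound by hand with __get__().

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return "hello, " + self.name


g = Greeter("Ana")

# The class dictionary holds a plain function, not a method.
func = Greeter.__dict__["hello"]

# Calling __get__ by hand produces the bound method that g.hello yields.
bound = func.__get__(g, Greeter)
greeting = bound()
```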

Static and class methods

Non-data descriptors provide a simple mechanism for variations on the usual pattern of binding functions into methods. To recap, a function's __get__() transforms the call obj.f(*args) into f(obj, *args) and the call klass.f(*args) into f(*args).

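The table below shows the transformations for each case:

Transformation   Called from an object   Called from a class
function         f(obj, *args)           f(*args)
staticmethod     f(*args)                f(*args)
classmethod      f(type(obj), *args)     f(klass, *args)

Table: the transformations applied to a function, a static method, and a class method when invoked from an object and from a class.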

Static methods return the underlying function unchanged. Calling either c.f or C.f is equivalent to a direct lookup with object.__getattribute__(c, 'f') or object.__getattribute__(C, 'f'). As a result, it makes no difference whether a static method is invoked from an object or from a class.

The pure Python equivalent implementation would be:

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f
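The built-in staticmethod behaves the same way, as this sketch with a hypothetical class E suggests: the function comes back unchanged from both the class and an instance.

```python
class E:
    @staticmethod
    def f(x):
        return x * 2


from_class = E.f(3)
from_instance = E().f(3)

# The class dictionary holds the staticmethod wrapper; its __get__
# hands back the wrapped function without binding anything.
wrapper = E.__dict__["f"]
unwrapped = wrapper.__get__(None, E)
```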

A class method differs in that it receives the class itself as its first argument; here, too, it makes no difference whether it is invoked from an object or from a class. The pure Python equivalent implementation would be:

class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc
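With the built-in classmethod, the class that receives the call is passed in automatically. Shape and Square below are hypothetical names for this sketch.

```python
class Shape:
    @classmethod
    def create(cls, label):
        # cls is whichever class (or instance's class) the call came through.
        return (cls.__name__, label)


class Square(Shape):
    pass


from_class = Shape.create("a")
from_instance = Square().create("b")

# Invoking the descriptor by hand: the second argument becomes cls.
wrapper = Shape.__dict__["create"]
manual = wrapper.__get__(None, Square)("c")
```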

Final considerations

"The descriptor protocol basically consists of defining a class to control access to the attributes of another, but my question is, would that be the actual purpose of a descriptor?"

Yes. With the descriptor protocol, you hand the business rules for a field over to another class. This way, the original class stays focused on its own responsibility and the details about its fields live in a dedicated class.

Note, however, that because it is a general-purpose protocol, its uses are not limited to this. Python itself implements functions as descriptors; methods, static methods, and class methods are descriptors; properties are descriptors; and the list goes on.

One application, for example, is an age field, for which negative values make no sense, so a descriptor can be defined for it:

# Pessoa = Person, idade = age
class Pessoa:
    @property
    def idade(self):
        return self._idade

    @idade.setter
    def idade(self, value):
        if value < 0:
            raise ValueError('Age cannot be negative')
        self._idade = value

Basically, we create a data descriptor that validates the value before assigning it to the field. It is roughly equivalent to writing the descriptor class by hand:

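class Idade(object):
    def __init__(self, default=None):
        # Value returned until an instance sets its own age.
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Read from the instance so each object keeps its own age;
        # storing the value on the descriptor would share it across instances.
        return obj.__dict__.get('_idade', self.default)

    def __set__(self, obj, val):
        if val < 0:
            raise ValueError('Age cannot be negative')
        obj.__dict__['_idade'] = val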

class Pessoa(object):
    idade = Idade(30)

But what is the advantage of implementing a class rather than just defining a property? Code reuse. Instead of the descriptor Idade, you could define a descriptor Positive that rejects negative numbers and reuse it, for example, for the day, month, and year of a date, which are also not negative (usually).
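One possible shape for such a reusable descriptor is sketched below. Positive and Date are hypothetical classes, and the sketch assumes Python 3.6 or later, where __set_name__() tells the descriptor which attribute name it was assigned to.

```python
class Positive:
    """Hypothetical reusable data descriptor that rejects negative numbers."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.public_name = name
        self.storage_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage_name)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError(self.public_name + " cannot be negative")
        # Stored on the instance, so each object keeps its own value.
        setattr(obj, self.storage_name, value)


class Date:
    day = Positive()
    month = Positive()
    year = Positive()

    def __init__(self, day, month, year):
        self.day = day
        self.month = month
        self.year = year


d = Date(31, 12, 1999)
```

The same validation rule now covers three fields without being written three times, which is the reuse a single property cannot offer.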


1: If there is a person you need to follow (not physically) to learn Python, Raymond Hettinger is that person.

2: I did not find a faithful translation for non-data, so I kept the original English term.
