Nested data class built from a dictionary

Question

Asked 5 years, 8 months ago

Viewed 245 times

4

Consider the following implementation:

from abc import ABC

# Base
class DictModel(ABC):
    def __init__(self, model=None):
        if model:
            for k, v in model.items():
                if isinstance(v,dict):
                    setattr(self, k, DictModel(v))
                elif isinstance(v,list):
                    setattr(self, k, [DictModel(i) if isinstance(i,dict) else i for i in v])
                else:
                    setattr(self, k, v)

# Derived
class MyDataClass(DictModel):
    def __init__(self,model):
        DictModel.__init__(self,model)

Such an implementation allows me to build data classes with nested structures from an arbitrary dictionary, for example:

modelo = {
    "foo": 123,
    "bar": "xpto",
    "numbers": {
        "primes": [2, 3, 5, 7],
        "odd": [{"n" : 1}, {"n" : 3}, {"n" : 5}, {"n" : 7}],
        "even": {"a" : 2, "b" : 4, "c" : 6, "d" : 8}
    },
    "constants" : {
        "pi": 3.1415,
        "e": 2.7182,
        "golden": 1.6180,
        "sqrt2": 1.4142
    }
}

obj = MyDataClass(modelo)

print(obj.foo)                 # 123
print(obj.bar)                 # xpto
print(obj.numbers.primes)      # [2, 3, 5, 7]
print(obj.numbers.odd[0].n)    # 1
print(obj.numbers.odd[1].n)    # 3
print(obj.numbers.odd[2].n)    # 5
print(obj.numbers.odd[3].n)    # 7
print(obj.numbers.even.a)      # 2
print(obj.numbers.even.b)      # 4
print(obj.numbers.even.c)      # 6
print(obj.numbers.even.d)      # 8
print(obj.constants.pi)        # 3.1415
print(obj.constants.e)         # 2.7182
print(obj.constants.golden)    # 1.618
print(obj.constants.sqrt2)     # 1.4142

For a brief moment, while reviewing the code above, I felt I was reinventing the wheel.

Is there any other way to build data classes with nested attributes from an arbitrary dictionary? Is there a standard way to do such a thing?

2 answers

5

There is no standard way to do this in the language - and of course it can be useful in many cases.

Update: a cool, modern, production-ready project that does this kind of thing (though it requires defining the data schema first) is Pydantic.

I have a dead project that does this sort of thing, and, coincidence or not, someone just sent an email on the Python-dev list with a project for about the same thing - asking about inclusion in the language.

Guido replied, although he does not have the final word, that a data structure mapping keys to attributes would hardly be considered for inclusion - mainly because of the problem of key collisions with method names. (I include this information here to illustrate that there is in fact no such thing in the language, nor any widely used library that does this, despite its usefulness.) Draft submitted: https://git.cinege.com/thesaurus/ ; Guido’s answer: https://mail.python.org/archives/list/[email protected]/message/UHO7UEJUKFXFYFOBIEAX6AI4DOSGYARQ/ ; my project (which does this, though it would be only a small part of its scope, had it gone forward): https://github.com/jsbueno/singularity .
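
To make the key-collision problem concrete, here is a minimal sketch. The class AttrDict is a hypothetical name made up for this illustration (it is not taken from any of the projects linked above); it simply exposes dictionary keys as attributes.

```python
class AttrDict(dict):
    """Hypothetical dict subclass exposing keys as attributes (illustration only)."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real dict methods always win over keys with the same name.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = AttrDict({"foo": 123, "items": [1, 2, 3]})

print(d.foo)              # 123
print(d["items"])         # [1, 2, 3]
print(callable(d.items))  # True: d.items is the dict method, not the key
```

A key named "items" is still reachable with d["items"], but attribute access silently returns the method instead, which is the kind of ambiguity mentioned in the reply.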

Having illustrated the question with the points above, let's go to your proposal: in my view, you are approaching this construction in a somewhat "pedestrian" way - manually creating the attributes passed in a dictionary, and recursively creating instances of the same class when the content of an attribute is another dictionary.

It is not necessarily bad to do so - it depends on the size of the project, on the level of the developers who will interact with it, and also on the need for performance in terms of CPU and memory (which would hardly be a concern for a first version implemented in Python). That is: it is a simple, direct way of doing it; anyone, even a developer who knows other languages better and is only a casual Python user, can glance at it and understand what is being done.

That said, Python has far more interesting ways of doing something like this! Mainly because it is possible to customize access to the attributes of an object - so, for example, instead of manually setting attribute by attribute on your object every time it is created, you can simply store the dictionary itself in an internal attribute, and customize the method __getattribute__ to perform recursive lookups in these dictionaries.

If you only consider reading, a nice implementation is short enough to write here:

from collections.abc import Sequence, Mapping

# Base
class DictModel:
    def __init__(self, model=None):
        if not model:
            model = {}
        self._data = model

    def _wrap(self, element):
        if isinstance(element, (Sequence, Mapping)) and not isinstance(element, str):
            return DictModel(element)
        return element

    def _innerget(self, current_element, path, depth=0):
        if not path:
            return self._wrap(current_element)
        if path[0].startswith("["):
            if not isinstance(current_element, Sequence):
                raise ValueError(f"Element at position {depth - 1} of the path is not a sequence")
            index = int(path[0].strip("[]"))
            element = current_element[index]
            path = path[1:]
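        elif "[" in path[0]:
            # component such as "odd[1]": resolve the key first, then keep
            # the "[1]" part at the head of the remaining path
            bracket_position = path[0].find("[")
            element = current_element[path[0][:bracket_position]]
            path = [path[0][bracket_position:]] + path[1:]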
        else:
            element = current_element[path[0]]
            path = path[1:]
        return self._innerget(element, path, depth + 1)

    def __setattr__(self, attr, item):
        # unhandled case: the last element of the path sitting directly inside a list.
        # just extend the "if"s here to check whether the last element has "[]"
        if attr.startswith("_"):
            return super().__setattr__(attr, item)
        if "." in attr:
            path, attr = attr.rsplit(".", 1)
            parent = getattr(self, path)
            # the wrapper shares the underlying dict, so write through to it
            parent._data[attr] = item
            return
        self._data[attr] = item

    def __getattribute__(self, attr):
        if attr.startswith("_"):
            return super().__getattribute__(attr)
        path = attr.split(".")

        return self._innerget(self._data, path)

    def __getitem__(self, index):
        if not isinstance(self._data, Sequence):
            raise ValueError("For non-sequence components, please use attribute notation")
        return self._wrap(self._data[index])

    def __repr__(self):
        return f"DictModel <{self._data}>"

And playing a little bit with that in the interactive interpreter:

In [25]: d = DictModel(modelo)                                                                                                                      

In [26]: d.numbers                                                                                                                                  
Out[26]: DictModel <{'primes': [2, 3, 5, 7], 'odd': [{'n': 1}, {'n': 3}, {'n': 5}, {'n': 7}], 'even': {'a': 2, 'b': 4, 'c': 6, 'd': 8}}>

In [27]: d.numbers.primes                                                                                                                           
Out[27]: DictModel <[2, 3, 5, 7]>

In [28]: d.numbers.primes[0]                                                                                                                        
Out[28]: 2

In [29]: d.numbers.odd[1]                                                                                                                           
Out[29]: DictModel <{'n': 3}>

In [30]: d.numbers.odd[1].n = 11                                                                                                                    

In [31]: d                                                                                                                                          
Out[31]: DictModel <{'foo': 123, 'bar': 'xpto', 'numbers': {'primes': [2, 3, 5, 7], 'odd': [{'n': 1}, {'n': 11}, {'n': 5}, {'n': 7}], 'even': {'a': 2, 'b': 4, 'c': 6, 'd': 8}}, 'constants': {'pi': 3.1415, 'e': 2.7182, 'golden': 1.618, 'sqrt2': 1.4142}}>

This implementation uses some characteristics of the language in its favor - for example, the fact that each dictionary is a single instance, shared between the root object and the other "DictModel" objects created dynamically - so a change made through a derived object is automatically reflected in the root object.

Recursion is used, but without academic purism: I pass auxiliary data such as "depth" to the recursive function so that it can "find itself", and there is a "public" entry method - __getattribute__: whoever uses the class need not worry about preparing the data for the recursive function.

Another result of this model is that even for a large model, I will only have one instance of this class in memory - the other instances are created on demand when accessing a "branch" of the data tree.
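
The sharing and on-demand behavior described above can be checked with a reduced sketch. The View class below is a hypothetical, read-only simplification written for this illustration, not the DictModel class from this answer; it assumes the nested containers are plain dicts.

```python
class View:
    """Hypothetical read-only wrapper that keeps a reference to the original dict."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the data keys.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts on demand; nothing is copied.
        return View(value) if isinstance(value, dict) else value


model = {"numbers": {"even": {"a": 2}}}
root = View(model)
branch = root.numbers.even           # wrapper created only now

print(branch._data is model["numbers"]["even"])  # True: same dict object
branch._data["a"] = 20               # change made through the derived wrapper
print(root.numbers.even.a)           # 20: visible from the root
print(root.numbers is root.numbers)  # False: a fresh wrapper per access
```

The last line shows the trade-off: wrappers are cheap and short-lived, but each access to a branch builds a new one unless they are cached.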

Something else of interest there is how "super" is called to reach the original implementations of the language's special methods - without it the object would not work. In this case, I leave any attribute that starts with "_" out of the managed data structure and let Python take care of it - so I just have to keep the attributes the class itself needs within this naming pattern.

The __repr__, sure, can be redone to make it cooler - but it’s functional.

(by the way, I ended up implementing the writing part of attributes too - it was simple)

1

Instantiation is certainly much lighter and simpler, but won't this weigh on performance at every access made on the object?

– Isac

2019/11/19 at 11:38
It can - but it won’t weigh much more than normal accesses in a deep structure of dictionaries. If you’re in a loop that needs to access attributes in such a structure, it’s best to keep a reference to the level being accessed in a local variable. That is, instead of accessing mundo.regiao.objeto.peca.posicao.x every time, you store posicao in a local variable and access posicao.x. Then, if it is a case of "use in production" and the time still makes a difference, you can add a special case in __getitem__ to implement a cache layer.

– jsbueno

2019/11/19 at 13:20
I just realized that this algorithm is "overkill" - it would be good for accessing the object directly with mundo["regiao.objeto.peca.posicao.x"] (and would have to be moved to __getitem__), picking up the "x" in a single pass. With __getattribute__, it is called once for each component and creates a new object for each component (but it could also create the DictModel wrapper at that moment and persist it, instead of creating wrappers when the attributes are set, so that only the attributes actually accessed would have their own DictModel instance).

– jsbueno

2019/11/19 at 13:24
I added a reference to "pydantic" in the reply - a production-quality design that brings some of the facilities you may be wanting with this.

– jsbueno

2019/11/19 at 13:31
@jsbueno Accepted! - Again, thanks for the references and ideas, a real lesson!

– Lacobus

2019/11/23 at 14:14

by Lacobus • **13,510** points · Answer 1 · 2020-08-28T02:49:01+00:00

Idly strolling through the documentation of the Python standard library, I discovered that since version 3.3 the types module provides a utility class called SimpleNamespace, which can build a namespace from an arbitrary list of named parameters.

Bang! That made me think of applying ** (the unpacking operator) to a dictionary in order to pass it as parameters to the constructor of that class! I quickly implemented a basic program as a proof of concept:

from types import SimpleNamespace

modelo = {
    "foo": 123,
    "bar": "xpto",
    "numbers": {
        "primes": [2, 3, 5, 7],
        "odd": [{"n" : 1}, {"n" : 3}, {"n" : 5}, {"n" : 7}],
        "even": {"a" : 2, "b" : 4, "c" : 6, "d" : 8}
    },
    "constants" : {
        "pi": 3.1415,
        "e": 2.7182,
        "golden": 1.6180,
        "sqrt2": 1.4142
    }
}

print(SimpleNamespace(**modelo))

Output:

namespace(bar='xpto', constants={'pi': 3.1415, 'e': 2.7182, 'golden': 1.618,
'sqrt2': 1.4142}, foo=123, numbers={'primes': [2, 3, 5, 7], 'odd': [{'n': 1},
{'n': 3}, {'n': 5}, {'n': 7}], 'even': {'a': 2, 'b': 4, 'c': 6, 'd': 8}})

Analyzing the output, I quickly realized that something was still missing in this approach: it only worked partially, converting just the first level of the input dictionary to a namespace... I needed something recursive to handle cases where the input dictionary is "nested".

After some time getting slapped by exceptions along the way, I arrived at the following code:

from types import SimpleNamespace

model = {
    "foo": 123,
    "bar": "xpto",
    "numbers": {
        "primes": [2, 3, 5, 7],
        "odd": [{"n" : 1}, {"n" : 3}, {"n" : 5}, {"n" : 7}],
        "even": {"a" : 2, "b" : 4, "c" : 6, "d" : 8}
    },
    "constants" : {
        "pi": 3.1415,
        "e": 2.7182,
        "golden": 1.6180,
        "sqrt2": 1.4142
    }
}

def DictModel(obj):
    if isinstance(obj, dict):
        ns = SimpleNamespace(**obj)
        for k, v in ns.__dict__.items():
            ns.__dict__[k] = DictModel(v)
        return ns
    elif isinstance(obj, list):
        return [DictModel(i) for i in obj]
    elif isinstance(obj, tuple):
        return tuple(DictModel(i) for i in obj)
    return obj


print(DictModel(model))

Output:

namespace(bar='xpto', constants=namespace(e=2.7182, golden=1.618, pi=3.1415,
sqrt2=1.4142), foo=123, numbers=namespace(even=namespace(a=2, b=4, c=6, d=8),
odd=[namespace(n=1), namespace(n=3), namespace(n=5), namespace(n=7)], 
primes=[2, 3, 5, 7]))

Eureka! That’s just what I needed: something able to create namespaces recursively! With a little more effort, I managed to recall a technique for iterating over dictionaries using the json library, which went more or less like this:

import json

def callback(d):
    print(d)
    return d

def iterator(dic, clbk):
    return json.loads(json.dumps(dic), object_hook=clbk)

modelo = {
    "foo": 123,
    "bar": "xpto",
    "numbers": {
        "primes": [2, 3, 5, 7],
        "odd": [{"n" : 1}, {"n" : 3}, {"n" : 5}, {"n" : 7}],
        "even": {"a" : 2, "b" : 4, "c" : 6, "d" : 8}
    },
    "constants" : {
        "pi": 3.1415,
        "e": 2.7182,
        "golden": 1.6180,
        "sqrt2": 1.4142
    }
}

iterator(modelo, callback)

Output:

{'n': 1}
{'n': 3}
{'n': 5}
{'n': 7}
{'a': 2, 'b': 4, 'c': 6, 'd': 8}
{'primes': [2, 3, 5, 7], 'odd': [{'n': 1}, {'n': 3}, {'n': 5}, {'n': 7}], 'even': {'a': 2, 'b': 4, 'c': 6, 'd': 8}}
{'pi': 3.1415, 'e': 2.7182, 'golden': 1.618, 'sqrt2': 1.4142}
{'foo': 123, 'bar': 'xpto', 'numbers': {'primes': [2, 3, 5, 7], 'odd': [{'n': 1}, {'n': 3}, {'n': 5}, {'n': 7}], 'even': {'a': 2, 'b': 4, 'c': 6, 'd': 8}}, 'constants': {'pi': 3.1415, 'e': 2.7182, 'golden': 1.618, 'sqrt2': 1.4142}}

Combining all this, I arrived at the following code:

from types import SimpleNamespace
import json

def DictModel(**kwargs):
    return json.loads(json.dumps(kwargs),
        object_hook=lambda o: SimpleNamespace(**o))

modelo = {
    "foo": 123,
    "bar": "xpto",
    "numbers": {
        "primes": [2, 3, 5, 7],
        "odd": [{"n" : 1}, {"n" : 3}, {"n" : 5}, {"n" : 7}],
        "even": {"a" : 2, "b" : 4, "c" : 6, "d" : 8}
    },
    "constants" : {
        "pi": 3.1415,
        "e": 2.7182,
        "golden": 1.6180,
        "sqrt2": 1.4142
    }
}

obj = DictModel(**modelo)

print(obj.foo)                 # 123
print(obj.bar)                 # xpto
print(obj.numbers.primes)      # [2, 3, 5, 7]
print(obj.numbers.odd[0].n)    # 1
print(obj.numbers.odd[1].n)    # 3
print(obj.numbers.odd[2].n)    # 5
print(obj.numbers.odd[3].n)    # 7
print(obj.numbers.even.a)      # 2
print(obj.numbers.even.b)      # 4
print(obj.numbers.even.c)      # 6
print(obj.numbers.even.d)      # 8
print(obj.constants.pi)        # 3.1415
print(obj.constants.e)         # 2.7182
print(obj.constants.golden)    # 1.618
print(obj.constants.sqrt2)     # 1.4142

Output:

123
xpto
[2, 3, 5, 7]
1
3
5
7
2
4
6
8
3.1415
2.7182
1.618
1.4142

And surprisingly, with only 3 lines, the thing solved the problem in a standard and elegant way, putting to rest a doubt that had been nagging me for months and further increasing my passion for the language.

Although apparently identical, the two solutions presented have differences, advantages and disadvantages.

The first solution uses recursion to iterate over the keys and values of the input dictionary. In some circles recursive solutions are the devil's work and should be avoided as much as possible. This type of implementation limits how deep the function can reach into the dictionary tree, and it may raise a RecursionError exception. In my case, the intention is to work with relatively small dictionaries, with few levels of depth, which makes this scenario very remote.
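
As a rough illustration of that limit, the sketch below feeds a recursive converter a dictionary nested deeper than the interpreter's recursion limit. The function to_namespace is a simplified stand-in written for this example (it handles dicts only), and the depth is chosen artificially.

```python
import sys
from types import SimpleNamespace


def to_namespace(obj):
    """Recursive dict-to-namespace conversion (simplified: dicts only)."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    return obj


# Build a dictionary nested deeper than the recursion limit, iteratively,
# so that building it does not itself recurse.
depth = sys.getrecursionlimit() + 100
deep = {}
for _ in range(depth):
    deep = {"child": deep}

try:
    to_namespace(deep)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)  # True
```

With the default limit this takes on the order of a thousand nesting levels, far beyond what hand-written configuration-like dictionaries usually have.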

The great advantage of this first implementation is the possibility of treating independently any type of data contained in the structure of the input dictionary, where the data type can be identified with the function isinstance() and treated in a customized way as required.
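
A minimal sketch of that kind of per-type handling follows. The rule for sets is an arbitrary choice made up for this illustration, and the sketch assumes every dictionary key is a string that is a valid identifier.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursive conversion that treats each container type separately."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(i) for i in obj]
    if isinstance(obj, tuple):
        # Tuples stay tuples
        return tuple(to_namespace(i) for i in obj)
    if isinstance(obj, (set, frozenset)):
        # Illustrative custom rule: expose sets as sorted lists
        return sorted(obj)
    return obj


ns = to_namespace({"point": (1, {"z": 3}), "tags": {"b", "a"}})
print(ns.point[1].z)            # 3
print(type(ns.point).__name__)  # tuple
print(ns.tags)                  # ['a', 'b']
```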

The second solution uses the json module to encode the input dictionary as JSON and immediately decode it again, but in a customized way, taking advantage of the object_hook parameter of the decoding function json.loads().

The main disadvantages of this technique are: 1) Performance. All this manipulation of data in JSON format is not at all efficient compared to the recursive version of the function; 2) Integrity. Although they have a very similar structure, a dictionary is not the same as a JSON object. For example, when encoding a dictionary to JSON with the function json.dumps(), both lists and tuples are interpreted in the same way and converted to an array, which makes it impossible to faithfully reconstruct the original dictionary.
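
The integrity issue is easy to reproduce with a plain round trip, without any object_hook. The sample data below is made up for the illustration; besides the tuple case mentioned above, it also shows that non-string keys come back as strings.

```python
import json

original = {"pair": (1, 2), "by_id": {1: "one"}}
restored = json.loads(json.dumps(original))

print(restored)              # {'pair': [1, 2], 'by_id': {'1': 'one'}}
print(restored == original)  # False
```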