Is a Python dictionary as efficient as a tree or a hash?

Question

Is a Python dictionary as efficient as a tree or a hash?

Asked 6 years, 2 months ago

Viewed 200 times

6

I use a lot of Python dictionaries:

dic = dict() #ou
dic = {}

dic['113654'] = {'nome': 'João', 'idade': 21}
dic['853872'] = {'nome': 'Maria', 'idade': 27}
dic['427845'] = {'nome': 'Fernando', 'idade': 33}
dic['720720'] = {'nome': 'Lucas', 'idade': 16}

They make it very easy to check if a key exists and take its value:

cod = '853872'
if cod in dic:
    return dic[cod]

My question is: is using dictionaries that way efficient? Python dictionaries implement an efficient way to find keys (hash, tree, etc.)? Or when my dictionary gets too big you should use a more traditionally efficient data structure like a tree or a hash for example?

1 answer

Browser other questions tagged python python-3.x dictionary

You are not signed in. Login or sign up in order to post.

by Maniero • **444,682** points · Answer 1 · 2019-05-29T22:23:43+00:00

7

The internal implementation of the Python dictionary is using tables hash, then has complexity O(1) to find keys, this can be confirmed in official Python wiki (thanks to Alexciuffa).

It is, in theory, and in almost all situations, faster than a tree that has O(log n) complexity, but it cannot have the keys naturally classified as a tree can, and cannot have repeated keys. A dictionary was not "ordained". According to the jsbueno below in comment it can now be ordered using an auxiliary structure, so it keeps O(1) for access to the element and manages to traverse all elements in the order they were inserted, at the cost of taking up more space that could, Which for Python is not a big deal because I’ve done this before. It is still unclassified and could only be so efficiently with a tree or something like that.

Sources that support this answer (and a reference to those who want to go deeper in this subject) are: https://wiki.python.org/moin/TimeComplexity (dictionary search complexity and other structures) and https://wiki.python.org/moin/DictionaryKeys (shows how the dictionary works and how it uses hash).

– AlexCiuffa

2019/05/29 at 22:33
If it is basically a hash already solves my problem. Thank you very much!

– Jônatas Trabuco Belotti

2019/05/29 at 22:42
@Alexciuffa excellent links!

– Jônatas Trabuco Belotti

2019/05/29 at 22:43
1

About not being ordered - the Python dictionaries code was refactored in version 3.6, and from 3.7 it was made official that dictionaries respect the key insertion order - (like Collections.Ordereddict, which now goes to the museum)

– jsbueno

2019/05/31 at 15:26
@jsbueno so that means he is no longer a hash and yes a tree or something similar, and therefore no longer has to be O(1), can be O(logN) or O(N), which would be terrible and I doubt that it is, or even to be O(1) would have two structures, one to ensure order and one to give to O(1), which would occupy a lot of space.

– Maniero

2019/05/31 at 15:29
is a hash, but auxiliary structures, parallel to hash records, allow the insertion order to be retrieved. I didn’t see the code in C as it was, but it should be very interesting to have a look. The insertion and lookup remain O(1). There’s some extra logic in removing keys - so I don’t know if you’ve changed the time.

– jsbueno

2019/05/31 at 15:32
@jsbueno So you’re confirming that there’s an extra structure to maintain order and take up a lot more space? You have references to all this?

– Maniero

2019/05/31 at 15:34
"Too much space" in the second structure can be as compact as a link to the next hash in each input - 8 bytes per record more - is not so much, compared to the typical size of a Python object that will be in "value" (from the order of tens of bytes anyway).

– jsbueno

2019/05/31 at 15:34
It’s really hard to find the reference - the interesting thing is that even with this extra data, as it was in a complete refactoring that created a new layout in memory for the data , even so the new dictionaries are 20% smaller than the old ones, on average - I think the Issue presenting the new code centralizes the discussions: https://bugs.python.org/issue27350 here is the source at "high level" https://morepy.blogspot.com/2015/01/faster-more-memory-efficient-and-morehtml.

– jsbueno

2019/05/31 at 15:39
Yeah, Python already has a big space waste of course so it won’t make so much difference and if there was already waste for another reason that now solved it can get comparatively interesting, although it’s not as efficient as possible if you disregard the specific comparison. Must be what in C# is KeydCollection https://answall.com/q/56001/101.

– Maniero

2019/05/31 at 15:42

Show 5 more comments