How to optimize my access to the Python dictionary?

Question

How to optimize my access to the Python dictionary?

Asked 7 years, 5 months ago

Viewed 367 times

5

I have a dictionary of the form sequinte:

dicionario = {'1':'Banana','7':'Maçã','3':'Pera','2':'Melancia'}

If I check the key '2' in the dictionary: if '2' in dicionario:. My programmer intuition tells me that the complexity of this command is O(n) in the worst case, because there are n elements within that dictionary. But this is a dictionary and not a list and I don’t know exactly how the "magic gears" of the python dictionary work.

The complexity of performing a search on a given dictionary item whether by looking for the key as I did or accessing the direct element: dicionario['2'] is complex O(n)? Either there is a Hash within these "gears" or some structure that can optimize this search, such as binary search?

Speaking of binary search, is there any way to access the element of a dictionary as I did and activate a binary search in the dictionary’s "gears" considering, of course, that it is ordered?

1

The dictionary is implemented as hash table and the direct access to complexity is O(1). See https://docs.python.org/2/faq/design.html#how-are-Dictionaries-implemented

– bfavaretto

2018/01/24 at 17:30
put the link to the Python 3 document. Google is the last place in the world that gets these links to the Python 2 documentation yet - only it will only get better if the links we put on the internet go to version 3.

– jsbueno

2018/01/24 at 20:19
https://docs.python.org/3/faq/design.html#how-are-Dictionaries-implemented

– jsbueno

2018/01/24 at 20:19

2 answers

7

As far as I know the two ways to access a key in the dictionary are basically equivalent (of course one returns a boolean of existence and the other returns the value, but both get time O(1), in most cases (has extreme cases that can be O(n)but that do not even come close to occur)).

Dictionaries are scattering tables, they have almost the performance of array for random access, only loses a little because you need to calculate the code hash before accessing the element. There are cases of key collision where the search needs to be done in another way, and can even reach complexity O(n).

A dictionary is almost the opposite of a structure that allows binary search. A dictionary cannot be either ordered (with specific rule), much less ranked (with specific criterion).

In the example I have doubts whether a array wouldn’t be the best choice.

Both the array,how complex the dictionary will be O(n) to look for values.

I have already demonstrated that the dictionary is different from array.

May be useful: Objects are similar to arrays?.

Browser other questions tagged python dictionary

You are not signed in. Login or sign up in order to post.

by jsbueno • **30,668** points · Answer 1 · 2018-01-24T22:49:02+00:00

As in the comments and the other answer, Python dictionaries are a hash table, with search time O(1) - and the information about it is here: https://docs.python.org/3/faq/design.html#how-are-Dictionaries-implemented

That said, about the code design you ask - chave in dict vs dict[chave]: access time is the same - only if the key does not exist, the second form raises an exception.

When you go back a value that is not sure if it exists in the dictionary the recommended is to use the method .get:

meu_valor = dicionario.get("chave")

In this case, if "key" does not exist, the .get returns None and life continues. The .get also accepts a second parameter which is returned when the key does not exist, instead of None.

This saves lines of code - like

if chave in dicionario:
    valor = dicionario[chave]
else: 
    valor = None

or

try:
   valor = dicionario[chave]
except KeyError:
   valor = None

Of these, this second case would be the only one that would have some difference in performance, due to the time of creation of the context of Try/except . See also the method .setdefault of dicts: they not only retrieve a default value if there is no - but in this case, assign the new value. For example, a loop to create a dictionary with all the words of a text and the position of each word within it, in a list, could be:

palavras = texto.split()
posicoes = {}
for indice, palavra in enumerate(palavras):
    posicoes.setdefault(palavra, []).append(indice)

In this case, if the word has not yet been found, an empty list is created which is at the same time assigned there and returned by the method. If posicoes[palavra] already exists, the existing list is returned.

Now, since you’re interested in this and want to know about binary search, here’s the interesting fact: Python has a well-defined interface for objects like "Mapping" - and it’s nice to implement an object that behaves exactly like a dictionary, but with the internal algorithms you decide. If you want to create a dictionary that internally is a binary tree, it’s pretty easy - the recommended way to do this is to inherit the class collections.abc.MutableMapping. There are several classes in collections.abc which are the basis for implementing very useful and varied data structures. A binary tree would have search O(log(n)), but with the advantage that you can sort the keys and use Slices in it, for example. Check in: https://docs.python.org/3/library/collections.abc.html

I have several "superdictionaries" implemented in the module extradict - some are over toy, others I even use in production:

https://github.com/jsbueno/extradict