As seen in question you quoted, what hash tables do is calculate the hash value of each key, so you know which table position the element will be in.
In the case of Python, objects can implement the method __hash__
to return your respective hash code. E according to the documentation, an object is hashable if "the value of your hash does not change during your lifetime, and can be compared with other objects" (in short, in addition to the method __hash__
, it also needs to implement the __eq__
).
The documentation also states that objects that are considered equal must return the same hash value. So usually what immutable types do is they return the hash based on its value. And since they are immutable, the value does not change, and therefore the hash also does not.
This is important because if the hash keeps changing, the object will no longer serve as dictionary keys, since they use the hash value internally, to do the lookup. Take this example:
class Teste:
def __init__(self, value):
self.value = value
def __hash__(self):
# implementação simples, retorna o hash do valor
h = hash(self.value)
print(f'calculando hash de {self.value}={h}')
return h
def __eq__(self, other):
return self.value == other.value
def __repr__(self):
return f'Test({self.value})'
t1 = Teste(1)
t2 = Teste('xyz')
d = {}
print('adicionar 1')
d[t1] = 'abc'
print('adicionar xyz')
d[t2] = 2
print(d) # {Test(1): 'abc', Test(xyz): 2}
print('acessando uma chave')
print(d[t1]) # abc
# mesmo sendo outro objeto, se o hash é o mesmo, ele encontra
print(d[Teste(1)]) # abc
I created a class Teste
that is hashable (implements __hash__
and __eq__
, and equal instances always return the same hash). I used instances of this class as keys to a dictionary, and see how the method __hash__
is called, both to set a value and to recover it. Also note the last line, that even though it is another instance, it finds the same value, since the value of the returned hash is the same.
The exit is:
adicionar 1
calculando hash de 1=1
adicionar xyz
calculando hash de xyz=2899992705165252900
{Test(1): 'abc', Test(xyz): 2}
acessando uma chave
calculando hash de 1=1
abc
calculando hash de 1=1
abc
In the case of the "Xyz" string, the hash can vary with each Python execution, as explained here - but throughout the same process, it remains the same (here also explains this).
But if I change the value
object, consequently its hash code will also change (since the method __hash__
is based on the value of value
):
t1 = Teste(1)
d = {}
print('adicionar 1')
d[t1] = 'abc'
print(d) # {Test(1): 'abc', Test(xyz): 2}
print('acessando uma chave')
print(d[t1]) # abc
# mudando o value, o hash code também muda
t1.value = 2
print(d[t1]) # KeyError
By changing the value
, the hash code value also changes, and when trying to use the object as a key, it gives error:
adicionar 1
calculando hash de 1=1
{Test(1): 'abc'}
acessando uma chave
calculando hash de 1=1
abc
calculando hash de 2=2
KeyError: Test(2)
This is why mutable objects are not good candidate keys, because the hash is usually based on their value (following the rule that equal objects return equal hashes). And if the value can change, therefore the hash also changes, and then it will no longer be possible to locate the respective value in the dictionary, since this uses the hash value for such.
Some languages allow mutable objects to be used as keys (and then you turn around to make sure they don’t change), but Python has chosen to let only objects hashable can be used as keys.
Remember that a tuple is only hashable if its elements are also:
d = {}
# tupla só contém elementos hashable
t = (1, 'abc')
d[t] = 1
# tupla contém uma lista, que não é hashable
t = (1, [2, 3])
d[t] = 2 # TypeError: unhashable type: 'list'
This is common to all languages, since hash table is a more general concept (Data Structures), and what changes is the implementation. In Python, objects that are not hashable They already give error as soon as you try to use them as a key of a dictionary. But Java, for example, is different: all objects have a standard implementation of hashCode
(but which must be overwritten in order to be useful). But the general idea is always the same: you calculate a hash value to find the position of the element in the hash table (and if that value can change, the object is not a good candidate for the key).
More details about hash tables can be seen in this question.
I think it’s worth mentioning the beginning of this answer: "to say that hashes are faster than chained lists is to compare oranges with bananas" - Each data structure has pros and cons, and cases where one is more suitable than the other. Taking advantage, I updated my answer (the previous example with
random
wasn’t very good) :-)– hkotsubo