What is the relationship between unhashable and mutable?

7

I always use strings as keys in dictionaries, but while studying the topic I noticed that it is possible to use any immutable object. So, in addition to strings, dictionaries accept objects such as integers, floats and tuples as keys. All the examples below work:

d = {"name": "Lucas", 1: '01', (9, 10): 3}
type(d)  # dict

However, if we define a dictionary using lists as keys, we get the error TypeError: unhashable type: 'list'. I don't know exactly what hashable means, but I read in this question that hash tables allow efficient lookups and, it seems to me, dictionaries are an example of this kind of implementation (which would explain why dictionaries are much more efficient for searching than lists). But what is the relationship between an object being hashable and its mutability? Is the restriction on using mutable objects as dictionary keys Python-specific or common to all languages? Why does this restriction exist?

  • 1

    I think it's worth mentioning the beginning of this answer: "to say that hashes are faster than linked lists is to compare oranges with bananas" - each data structure has pros and cons, and cases where one is more suitable than the other. While I was at it, I updated my answer (the previous example with random wasn't very good) :-)

2 answers

4


Imagine you have a list:

lista = [1,2,3,4,5,6,7,8,9,10]

If you want to check whether an element is present in this list, Python needs to go through it element by element, comparing each one:

if 11 in lista:  # compares 11 with 1, then 11 with 2, then 11 with 3...
    print('found')

Only after comparing all the elements in the list can it be sure that the element you want is not there. A real-life example would be looking for a music CD in a CD binder that is not organized: you need to go page by page until you find it (or not).

The more elements your list has, the longer the search takes.

A dictionary, in turn, takes on average the same time to access the contents of a key, no matter how many items it contains. A real-life parallel would be looking up a word in a dictionary: since it is arranged alphabetically, you know where to find the definition (the same would apply if the CDs from the previous example had been organized in some way).
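To make the difference concrete, here is a small sketch that counts equality comparisons instead of measuring time. The class Counted and its counter are hypothetical helpers invented just for this illustration, and the exact number of comparisons inside the dictionary is an implementation detail, so only the list count is annotated.

```python
class Counted:
    """Hypothetical wrapper that counts how many times == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


items = [Counted(i) for i in range(1000)]
missing = Counted(-5)

# List: every element has to be compared with the one we are looking for.
Counted.comparisons = 0
found_in_list = missing in items
list_comparisons = Counted.comparisons

# Dict: the hash narrows the search down before any comparison happens.
table = dict.fromkeys(items)
Counted.comparisons = 0
found_in_dict = missing in table
dict_comparisons = Counted.comparisons

print(found_in_list, list_comparisons)  # False 1000
print(found_in_dict, dict_comparisons)
```

The list needs one comparison per element, while the dictionary needs far fewer, and that gap grows with the size of the collection.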

How does it do that? The keys of a dictionary need to be hashable. I will not go into detail about the implementation, as the other answer linked in the question already does that well. But basically what a hash function does is:

Something -> | Hash function | -> Integer (example: 7036520087640895475)

That is, if you give something to this function, it computes an integer based on the content provided. The same content always yields the same integer. Good hash functions promise (but do not guarantee) that different contents generate different integers, among other properties. This is not always possible: when two different contents generate the same value, we say there was a collision, and it is precisely in this case that access can take longer.
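A quick way to see this in Python is the built-in hash(). The sketch below assumes CPython's documented rule that the hash of a non-negative integer is its value modulo sys.hash_info.modulus, which makes it easy to build a collision on purpose. The keys "Blue" and "Red" are only illustrative.

```python
import sys

# Equal content always produces the same integer within one run.
h1 = hash("Blue")
h2 = hash("Blue")
print(h1 == h2)  # True

# Different content usually (not always) produces different integers.
print(hash("Blue") == hash("Red"))

# A deliberate collision. Assumption: integer hashes are reduced modulo
# sys.hash_info.modulus, so that modulus itself hashes like 0.
big = sys.hash_info.modulus
print(big != 0, hash(big) == hash(0))  # True True
```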

When you add a key-value pair to the dictionary, basically what it does is:

  • Calculates the hash of the key you provided. Example: the key "Blue" generates the hash 123.
  • Stores the value at a memory position linked to this hash 123.

(If it helps, you can think of each dictionary as having a list, with the value from the example above saved at position 123 of that list. The real implementation is not quite like that, because it would leave many "empty spaces".)

If later you want to access the content of a key, there is no need to go through all the elements of the dictionary. It simply recalculates the hash, goes to that memory position and checks the item that is there. Hence accessing a key (and checking whether it is present) takes, on average, constant time regardless of the size of the dictionary: it goes straight to the memory position where the object is.
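The store and lookup steps above can be sketched as a toy hash table. This is only a teaching model: ToyHashTable, its fixed size and its set/get methods are made up for this example, and CPython's real dict works differently (it uses open addressing and resizes itself).

```python
class ToyHashTable:
    """Teaching sketch: fixed number of buckets, collisions kept in small lists."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # The hash decides the position, so the whole table is never scanned.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))

    def get(self, key):
        # Only the keys that landed in the same bucket are compared.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.set("Blue", "#0000FF")
table.set("Red", "#FF0000")
table.set("Blue", "#0000EE")
print(table.get("Blue"))  # #0000EE
```

Taking the hash modulo the table size avoids the "empty spaces" problem, at the price of more collisions, which is why each bucket may hold more than one pair.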

But what is the relationship between an object being hashable and its mutability?

What happens if the contents of the object change? Its hash would change too, and it would be impossible to locate it in the dictionary again. So only immutable objects can guarantee you a hash that stays the same.
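In practice, when the data you want to use as a key lives in a list, a common workaround is to store an immutable snapshot of it, such as a tuple. A minimal sketch, with made-up data:

```python
scores = {}
player = ["Lucas", 7]  # hypothetical data: a mutable list

try:
    scores[player] = 10
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # the message mentions an unhashable type

# An immutable snapshot of the same content works as a key.
scores[tuple(player)] = 10
player.append("changed")  # mutating the list does not affect the stored key
print(scores[("Lucas", 7)])  # 10
```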

Is the restriction on using mutable objects as dictionary keys Python-specific or common to all languages?

By definition, it is common to all languages. What some languages do is use some ID of the object in question as the input for the hash, for example a pointer to the specific object or some number that uniquely identifies it. Many languages also allow you to define your own hash function.
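Python itself does something like this for classes that define neither __eq__ nor __hash__: the default hash is derived from the object's identity rather than from its contents. The class Account below is hypothetical and exists only to show the effect.

```python
class Account:
    """Hypothetical class that defines neither __eq__ nor __hash__."""

    def __init__(self, owner):
        self.owner = owner


acc = Account("Lucas")
balances = {acc: 100}

hash_before = hash(acc)
acc.owner = "Maria"  # mutate the object
hash_after = hash(acc)

print(hash_before == hash_after)  # True: the hash never looked at the contents
print(balances[acc])  # 100

# The price: another object with the same contents is a different key.
print(Account("Maria") in balances)  # False
```

So an identity-based hash survives mutation, but lookups only work with the very same object, not with an equal-looking one.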

For another explanation, see this question (in English).

  • 2

    Good answer! Just to be pedantic, the only detail I would change is "it always takes the same time to access the contents of a key", since no hash function is perfect (i.e. prevents every kind of collision). I think that, on average, the time tends to be constant, although this is more probabilistic than certain.

  • 1

    "Good Hash functions promise that different contents generate different integers" - No, a hash function must ensure that equal content always generates the same value; different content may or may not return the same value. Of course, to be useful, ideally the values are different and collisions are infrequent, but they can happen (example). And precisely because there can be collisions, it does not always take the same time to access the contents of a key, because the same slot may end up holding more than one value, as explained here.

  • That's why I used the word "promise". A promise may or may not be kept. Anyway, I will edit the answer to make it more didactic. As this was not the main point of the question, I did not focus much on this part. I will also add that constant access time holds only in the average case. Thanks for the suggestions!

3

As seen in the question you linked, what hash tables do is calculate the hash value of each key to know which position of the table the element will be in.

In the case of Python, objects can implement the method __hash__ to return their hash code. And according to the documentation, an object is hashable if "its hash value never changes during its lifetime, and it can be compared to other objects" (in short, in addition to the method __hash__, it also needs to implement __eq__).

The documentation also states that objects that are considered equal must return the same hash value. So what immutable types usually do is return a hash based on their value. And since they are immutable, the value does not change, and therefore neither does the hash.
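The rule that equal objects must have equal hashes is easy to observe with the built-in numeric types. A short sketch:

```python
print(1 == 1.0, hash(1) == hash(1.0))  # True True

d = {}
d[1] = "int"
d[1.0] = "float"  # equal to 1 with the same hash, so this overwrites the entry
print(len(d))  # 1
print(d[1])  # float
print(d[True])  # float, because True == 1 and hash(True) == hash(1)
```

If equal objects were allowed to return different hashes, the second assignment could land in another slot and the dictionary would end up with two entries for what is supposed to be the same key.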

This is important because if the hash keeps changing, the object can no longer serve as a dictionary key, since dictionaries use the hash value internally to do the lookup. Take this example:

class Teste:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # simple implementation: return the hash of the value
        h = hash(self.value)
        print(f'computing hash of {self.value}={h}')
        return h

    def __eq__(self, other):
        # only instances of Teste can be equal to each other
        if not isinstance(other, Teste):
            return NotImplemented
        return self.value == other.value

    def __repr__(self):
        return f'Test({self.value})'

t1 = Teste(1)
t2 = Teste('xyz')
d = {}
print('adding 1')
d[t1] = 'abc'

print('adding xyz')
d[t2] = 2

print(d) # {Test(1): 'abc', Test(xyz): 2}
print('accessing a key')
print(d[t1]) # abc
# even though it is another object, it is found: same hash and equal by __eq__
print(d[Teste(1)]) # abc

I created a class Teste that is hashable (it implements __hash__ and __eq__, and equal instances always return the same hash). I used instances of this class as keys of a dictionary; see how the method __hash__ is called both to set a value and to retrieve it. Also note the last line: even though it is another instance, it finds the same value, since the returned hash is the same and __eq__ considers the two instances equal.

The output is:

adding 1
computing hash of 1=1
adding xyz
computing hash of xyz=2899992705165252900
{Test(1): 'abc', Test(xyz): 2}
accessing a key
computing hash of 1=1
abc
computing hash of 1=1
abc

In the case of the string 'xyz', the hash can vary with each Python execution, as explained here, but within the same process it remains the same (this is also explained here).

But if I change the object's value, its hash code will consequently change as well (since the method __hash__ is based on value):

t1 = Teste(1)
d = {}
print('adding 1')
d[t1] = 'abc'

print(d) # {Test(1): 'abc'}
print('accessing a key')
print(d[t1]) # abc

# changing value also changes the hash code
t1.value = 2
print(d[t1]) # KeyError

After changing value, the hash code also changes, and trying to use the object as a key raises an error:

adding 1
computing hash of 1=1
{Test(1): 'abc'}
accessing a key
computing hash of 1=1
abc
computing hash of 2=2

KeyError: Test(2)

This is why mutable objects are not good candidates for keys: the hash is usually based on their value (following the rule that equal objects return equal hashes). If the value can change, the hash changes too, and then it is no longer possible to locate the corresponding entry in the dictionary, since the dictionary uses the hash value for that.

Some languages allow mutable objects to be used as keys (and then it is up to you to make sure they don't change), but Python chose to allow only hashable objects as keys.
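If you want your own class to be a safe key, one option is to make it immutable. A minimal sketch using the standard library's dataclasses with frozen=True, which blocks attribute assignment and generates a __hash__ based on the fields. The class Point is a hypothetical example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    """Hypothetical value type; frozen=True blocks attribute assignment."""
    x: int
    y: int


p = Point(1, 2)
labels = {p: "start"}

try:
    p.x = 99  # attempt to mutate the key
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

print(mutation_blocked)  # True
print(labels[Point(1, 2)])  # start
```

Since the fields cannot be reassigned, the hash computed from them stays valid for as long as the object is in the dictionary.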

Remember that a tuple is only hashable if all of its elements are too:

d = {}

# the tuple contains only hashable elements
t = (1, 'abc')
d[t] = 1

# the tuple contains a list, which is not hashable
t = (1, [2, 3])
d[t] = 2 # TypeError: unhashable type: 'list'

This is common to all languages, since the hash table is a more general concept (from Data Structures), and what changes is the implementation. In Python, objects that are not hashable raise an error as soon as you try to use them as a dictionary key. Java, for example, is different: all objects have a default implementation of hashCode (which must be overridden in order to be useful). But the general idea is always the same: you calculate a hash value to find the position of the element in the hash table (and if that value can change, the object is not a good candidate for a key).

More details about hash tables can be seen in this question.
