How to convert the hash() value of a string back to the original string?

1

I’m learning about hashing and discovered that Python has a function called hash() that returns the hash value of an object. The question is: how can I convert this value back to my original object, in this case a string?

def encode(obj):
    return hash(obj)

def decode(value):
    return # How would I decode it here?

password_hash = encode("pato345")
password = decode(password_hash)

2 answers

3


There is no way to get the original string from the hash. To understand the reason, let’s see how the hash works.

According to the documentation, hash takes an object (any hashable object, not just strings) and returns an integer.

Since hash values have a fixed size, there are more possible objects than hash values, so there will always be several objects (strings or anything else) whose hash values are equal.
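To put rough numbers on that counting argument, here is a small sketch. It only counts ASCII strings (an assumption made to keep the arithmetic simple) and reads the hash width of the running interpreter from sys.hash_info.

```python
import sys

# Width in bits of the values hash() can return on this build
# (64 on a typical 64-bit CPython).
hash_bits = sys.hash_info.width
possible_hashes = 2 ** hash_bits

# Count only ASCII strings (128 possible characters) of exactly `length`
# characters, and find the shortest length at which those strings alone
# already outnumber the possible hash values.
length = 1
while 128 ** length <= possible_hashes:
    length += 1

print(f"hash() returns {hash_bits}-bit values: {possible_hashes} possibilities")
print(f"ASCII strings of length {length}: {128 ** length} possibilities")
```

Once the strings outnumber the hash values, at least two of them must share a hash, whatever the algorithm is.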

Example (running the Python interpreter on my machine):

>>> hash('a')
1844645535655954614
>>> hash(1844645535655954614)
1844645535655954614

Note: the numbers will be different if you run this on your machine; more on that at the end of the answer.

In this case, we have a string and a number that produce the same hash value (I cheated a little by passing a number in the second call because I was too lazy to look for two different strings that generate the same hash).

You could also have a class that returns that same value:

class Test:
  def __hash__(self):
    return 1844645535655954614

print(hash(Test())) # 1844645535655954614

The code above is not a useful implementation; it only shows that there can be a class that returns the same hash as a string.

Thus, given only the hash value, it is not possible to know whether it came from a string, a number, or an instance of any other class. A hash is not reversible. Even two different strings can produce the same hash (how hard it is to find them depends on the algorithm, of course).
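Colliding strings are hard to find by hand, but colliding integers are easy, so here is a sketch that uses them instead. It relies on the documented rule that the hash of a non-negative int is the value modulo sys.hash_info.modulus; the number 345 is an arbitrary choice.

```python
import sys

# For non-negative ints, hash(n) is n modulo this prime
# (2**61 - 1 on a typical 64-bit CPython build).
P = sys.hash_info.modulus

a = 345
b = 345 + P  # a different integer that reduces to the same value modulo P

print(a == b)              # False: the objects are different
print(hash(a) == hash(b))  # True: but their hashes collide

# Dictionaries compare keys with == after comparing hashes,
# so a collision does not make them confuse the two keys.
lookup = {a: "first", b: "second"}
print(len(lookup))  # 2
```

Given only the shared hash value, nothing tells you whether it came from a or from b.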


Don’t use this for encryption

According to the documentation, calling hash(objeto) internally calls objeto.__hash__(). For strings, this excerpt is worth highlighting:

By default, the __hash__() values of str and bytes objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

In other words:

By default, a random salt is mixed into the value of __hash__() for str and bytes objects. Although the salt remains constant throughout a Python process, the values are not predictable between repeated invocations of Python.

This is done on purpose, to protect against denial-of-service attacks caused by carefully chosen inputs that exploit the worst-case performance of dictionary insertion, which is O(n^2). See http://www.ocert.org/advisories/ocert-2011-003.html for more details.

On the concept of a salt, I suggest this reading.

That’s why I said that the value of hash('a') you get will not necessarily be the same as the one I got above. On each Python invocation, a random salt is generated and used in the hash calculation, so the results will be different. This is not a problem for the purpose of the hash function, because the documentation says that "They are used to quickly compare dictionary keys during a dictionary lookup". In this case, collisions (different objects that have the same hash value) are not necessarily a problem (as long as they are not too frequent) - read more on the subject here.
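To see the salt at work, here is a minimal sketch that starts fresh Python processes and asks each one for hash('a'). The helper name hash_in_fresh_process is made up for this illustration. PYTHONHASHSEED is the environment variable Python reads to fix the salt; it is set here only so that the effect is repeatable, since normally the salt is random on every start.

```python
import os
import subprocess
import sys

def hash_in_fresh_process(text, seed):
    """Return hash(text) as computed by a new Python process
    started with the given PYTHONHASHSEED (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)

# Same salt: the value is stable from one process to the next.
print(hash_in_fresh_process("a", 1) == hash_in_fresh_process("a", 1))

# Different salt, as after a restart: the value changes.
print(hash_in_fresh_process("a", 1) == hash_in_fresh_process("a", 2))
```

The first comparison should print True and the second False, which is exactly what happens to a stored hash() value when the server restarts with a new salt.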

Anyway, since the value of hash(objeto) can vary with each Python execution, it is not suitable for storing the hash of a password, as suggested in another answer.

Let’s assume that the user first registered the password and you saved the hash:

hash_senha = hash(senha_que_usuario_cadastrou)
# store hash_senha

A while later the user enters the password and you compare its hash to what was previously recorded:

# retrieve the password hash stored in the previous step
hash_senha = ...
# compare it with the hash of the password the user typed
if hash_senha == hash(senha_digitada):
    pass  # password ok

But since the value of hash may vary (as explained above), the if above does not guarantee that the hashes will be equal, even if the password is correct (a simple restart of the server, for example, will cause a different salt to be used, and the user will no longer be able to log in). Or you may have the immense bad luck that a different password generates a hash equal to the one stored in the first step (unlikely, but not impossible).

If you want to store password hashes, I suggest using the hashlib module, which, besides ensuring that the same string always generates the same hash, has algorithms better suited to this purpose (I won’t go into more detail because I think it is outside the scope of the question). Finally, if you want something reversible, then you don’t want hash algorithms, but encryption.
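As a rough idea of what that could look like, here is a sketch using hashlib.pbkdf2_hmac from the standard library. The function names hash_password and verify_password, the 16-byte salt, and the iteration count are all assumptions chosen for illustration, not a recommendation from the documentation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (assumption); tune it for your hardware

def hash_password(password, salt=None):
    """Return (salt, derived_key) for the password; both must be stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each password
    # hashlib works on bytes, so the string is encoded first
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)

salt, key = hash_password("pato345")
print(verify_password("pato345", salt, key))  # True
print(verify_password("pato346", salt, key))  # False
```

Unlike the salt used by hash(), this salt is stored next to the derived key, so the same password gives the same result after a restart.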

  • If I use hashlib, will I have the same problem of equal hash values for different objects?

  • @Jeanextreme002 The hashlib module only works with bytes, that is, any object passed to the functions of this module has to be converted to bytes, including strings. You will still have collisions; there is no way around that, because the whole idea of a hash is that it is not reversible (this is not a problem in itself; the best algorithms just make these collisions hard to find).

0

You can’t. Hashing works only one way and cannot be reversed. If you want something reversible, you must use encryption.

If you only want to validate passwords, you can compare hashes:

hashed_orig_password = hash(password)

(...)

possible_password = 'abc123'
if hash(possible_password) == hashed_orig_password:
  print('Password matches')
else:
  print('Password does not match')
