Can you store a hash code in a database?

I was reading some questions here, and one of them raised a question of my own.

Suppose my application needs to store strings in a database so it can compare them with something later. I don't need the text itself, only to know whether the strings are equal. Wouldn't it be better to store the hash code, which is short, instead of the text? From what I have read (correct me if I am wrong), the switch uses the hash code to select among strings, so I wanted to do the same thing.

  • I believe so, but you have to take the possibility of collisions into account. I don't have enough information to write an answer; this is only my opinion =]

  • Did the answer resolve your question? Does anything in it need to be improved? Do you think you can accept it now?

1 answer

3


You cannot. The hash code is not stable: the formula can differ depending on the .NET version in use. Moreover, even within a single run of the application the algorithm can differ if you have more than one AppDomain, since each one can run a different version of the CLR, or even of your application's code. Even if you don't use AppDomains, the problem remains (multiple AppDomains are not even supported in .NET Core anymore).

It is also possible for two different strings to have the same hash code, so you cannot rely on it alone. The switch uses the hash code only to speed up the selection; it then confirms the match against the string itself, which is still much faster than comparing against every string. To do the same with the database you would have to store the string anyway. And since this is a database, the performance gain would be negligible, not worth the effort.
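To make that two-step check concrete, here is a minimal sketch. It is written in Python rather than C# only because the idea does not depend on the language. `weak_hash` and `HashIndex` are made-up names for illustration, and the hash is deliberately poor so that a collision is easy to produce; this is not how the C# compiler implements switch.

```python
def weak_hash(text: str) -> int:
    """Deliberately weak, hypothetical hash: sum of the UTF-8 bytes, modulo 256."""
    return sum(text.encode("utf-8")) % 256


class HashIndex:
    """Narrows the search by hash code, then confirms with the full string."""

    def __init__(self) -> None:
        # Maps a hash code to every stored string that produced that code.
        self._buckets = {}

    def add(self, text: str) -> None:
        bucket = self._buckets.setdefault(weak_hash(text), [])
        if text not in bucket:
            bucket.append(text)

    def contains(self, text: str) -> bool:
        # Step 1: the hash code only selects the candidates.
        candidates = self._buckets.get(weak_hash(text), [])
        # Step 2: the real string comparison decides.
        return text in candidates


index = HashIndex()
index.add("ab")

# "ab" and "ba" have the same byte sum, so their hash codes collide.
print(weak_hash("ab") == weak_hash("ba"))  # True
print(index.contains("ab"))                # True
print(index.contains("ba"))                # False: the string comparison catches the collision
```

Note that step 2 needs the original strings, which is exactly why storing only the hash code cannot give a reliable answer.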

If you are willing to risk collisions, that is, if they don't matter in your case, the ideal would be to write your own hash function, so that you don't depend on the implementation in .NET or in another CLR implementation, which may use an algorithm quite different from any other version of .NET, since the exact algorithm is not specified. That way you could at least guarantee that the algorithm is stable.
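As one possible shape for such a self-owned hash, here is a sketch using 64-bit FNV-1a over the UTF-8 bytes of the string. FNV-1a is my choice for the example, not something the answer prescribes, and `stable_hash` is a hypothetical name. The code is in Python, but the algorithm is short enough to port to C# line by line, and because you own it, the result does not change between runtime versions.

```python
# Published constants for 64-bit FNV-1a.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def stable_hash(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 bytes of the text (unsigned result)."""
    value = FNV_OFFSET_BASIS_64
    for byte in text.encode("utf-8"):
        value ^= byte
        # Keep the result within 64 bits, as fixed-width arithmetic would.
        value = (value * FNV_PRIME_64) & MASK_64
    return value


print(hex(stable_hash("")))                          # 0xcbf29ce484222325
print(stable_hash("hello") == stable_hash("hello"))  # True
```

The result is an unsigned 64-bit value, so it would need converting before going into a signed 64-bit column. Collisions remain possible; this only fixes the stability problem.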

You could store the first characters of the string alongside the hash code, so that a collision between two strings that also share the same beginning becomes very unlikely, though not impossible. There is still a risk, but it is quite low, "almost" the same as using a GUID.
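A rough sketch of that combined key, again in Python for brevity. `comparison_key` and the prefix length of 8 are assumptions made for the example, and CRC-32 from the standard `zlib` module stands in for whatever stable hash you pick.

```python
import zlib
from typing import Tuple


def comparison_key(text: str, prefix_length: int = 8) -> Tuple[str, int]:
    """Return (first characters, CRC-32 of the UTF-8 bytes) for the text."""
    checksum = zlib.crc32(text.encode("utf-8"))
    return text[:prefix_length], checksum


stored = comparison_key("hello world")
print(stored[0])                                # hello wo
print(stored == comparison_key("hello world"))  # True
print(stored == comparison_key("hello there"))  # False: the prefixes already differ
```

Two strings are treated as equal only if both parts match, so a false match needs the same prefix and the same hash at once.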
