Using hashcode as id is good practice?

Asked

Viewed 938 times

10

I have a short list of strings (where the strings will never repeat themselves) and would like to use hashcode as id, is a good practice?

  • 3

    Hashcode can collide. What would be the advantage of not using strings? If they are unique, you can number them in sequence if the problem is space.

  • 1

    Good practice in what sense?

2 answers

9

Depends.

Unique value

It’s not good practice if the idea is to create an ID single.

Hashes are a simplification of content and every simplification loses information because there’s no way you can represent the whole with the part.

This means that it is difficult, but not impossible, for two different strings or objects to generate the same hash value. When that happens, we say there was a collision. A serious implementation should treat this type of situation, otherwise it will be compromising the integrity of the data or, as in the case of the use of hashes in security, up to the privacy of its users.

The most you could do to decrease the size of a String without losing information is compress it, but this would not be very practical.

If you need to create keys from Strings you have basically two options:

  1. Use the Strings themselves. Example:
mapa.put("string1", valor1);
mapa.put("string2", valor2);
  1. Assign a number to Strings, as if it were an id in a table:
mapeamento.put("String1", 1);
mapeamento.put("string2", 2);
...
mapa.put(1, valor1);
mapa.put(2, valor2);

Classification

On the other hand, hashes are good at sorting content, which means you can quickly find what you’re looking for, but not uniquely.

I’ll use the example of HashMap to illustrate this.

Imagine you have a bucket full of colored and numbered polka dots and someone asks you to pick up a 1 blue ball. You can spend a lot of time looking for it and at the end you might not find it. This is the equivalent of searching for items in a list, for example.

On the other hand, if you have several buckets, each with one-colored polka dots, just go to the bucket with the blue polka dot label and look inside. You may still have to look a little for the right number, but it will be much more efficient.

That’s kind of how it works with HashMap. If you look at the implementation of HashMap.put() will see that it uses the hash to access the index of the vector table. In this case, the positions of the vectors are like the buckets in the above example. Each item of table is the guy Entry, whose implementation acts as a chained list (see attribute next). When you add an item, if there is already another item in the same bucket (not necessarily the same hash), it is added to the end of the linked list.

So when you make one HashMap.get() it first uses the hash to go in the right element of the vector table, then it traverses the linked list using the equals to check if he really found the key you used.

Completion

Hashing is an excellent mechanism to classify content efficiently or even to check data integrity relatively safely (considering that the likelihood of collision is small), but nay to uniquely identify.

6

Note that there is no guarantee of no collision when using hash functions. This means that different strings can produce the same integer number (this is a collision). An example is the strings "[email protected]" and "[email protected]". Both produce the same integer in the hashcode implementation of the String class in Java.

So it all depends on the context for which it is being applied. If it is to solve a specific case, I would first try the approach described here, because it’s very simple to implement. Checking if there was a collision is also trivial to do. If it doesn’t meet, implementing others is simple, and even simpler, it’s using ready-made hash algorithms.

See the C++ implementation of a decent hash generating algorithm for Strings: (easily adapted to Java): https://stackoverflow.com/a/107657/2236741

Related answers in Soen:

https://stackoverflow.com/questions/2624192/good-hash-function-for-strings

https://stackoverflow.com/questions/6120657/how-to-generate-a-unique-hash-code-for-string-input-in-android

Browser other questions tagged

You are not signed in. Login or sign up in order to post.