Why does String hashcode() in Java use 31 as a multiplier?

Asked

Viewed 339 times

7

In Java, the code hash for an object String is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using integer arithmetic, where s[i] is the i-nth character of the string, n is the string length, and ^ indicates exponentiation.

Why 31 is used as multiplier?

I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?

Source

  • A few years ago, I have seen criticism regarding the choice of this number, which claim to have been a bad choice. I will try to look for the link.

  • @Victorstafusa Included the question link in Soen. Indeed a another question linked there says that Java used for unnecessary performance reasons today and suggests larger numbers for future implementations of hashCode().

  • https://dzone.com/articles/what-is-wrong-with-hashcode-in-javalangstring

  • There are criticisms of the article in the respective comments. You have to digest their content well to conclude something. Criticize the claim that using strings as keys in Hashmaps leaves the application subject to Dos attacks.

2 answers

5


Usually the code hash is used with key for scattering tables, the so-called dictionaries. It is common for the maximum possible code value to be stored in 32 bits, so it makes sense to use the maximum multiple of 32 and the immediate lower prime is 31. Not that you need to use all the codes, but from that number you can derive the most appropriate index according to the amount of Buckets possible in that specific scattering thus giving a good distribution.

According to comments, today it is considered that there are better numbers (larger), but as far as I know the reason for the initial choice was this. A smaller number could generate code collisions much more easily. A bigger one is really better, but the gain difference is not so big, already a smaller one is much worse.

On some platforms an operation of shift of certain numbers is cheap, in others is not, in some cases there is optimization for some numbers, as is the case of the 31 that can be used (is a shift and a simple subtraction).

You could say it’s not a well-thought-out number, there was no deep evaluation, something that has a sensational justification :)

A comparison was made on Soen. It seems that certain numbers give in the same, but note that other observations need to be made, one cannot take the analysis in isolation. Ali shows no other problems of each number.

  • 1

    I don’t think it has anything to do with the number of bits, I believe this is just a coincidence. I’ll see if I can find any link that explains this.

  • I remember the question of primality, but I don’t remember the relationship with the whole

  • @Victorstafusa sees there, I answered the motivation that I know, until I understand that it is not a good number, but for what I know it was naively thought because of this, but I really do not have a reliable source about it.

0

Joshua Bloch, One of the developers of the Java platform, gave a good explanation in his book "Effective Java" (2nd Edition):

The value 31 was chosen because it is an odd prime. If it were even and multiplication resulted in overflow, information would be lost, since multiplication by 2 is equivalent to shifting. The advantage of using a cousin is less clear, but it’s traditional. A good property of 31 is that multiplication can be replaced by a shift and a subtraction to get better performance:

 31 * i = (i << 5) - i 

Modern VM’s do this kind of optimization automatically.

The book is from 2008 and since then may have come up with new ideas on how to make better hashes, but at least this is a response from one of the authors of the platform.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.