What would be a good hash calculation algorithm to use in a scattering table?

Asked

Viewed 356 times

14

In the question Why String hashcode() in Java uses 31 as multiplier? There is talk of the use of the number 31 as a multiplier. There has been a controversy about the motivation of this number. In fact it is not considered as good. What would be a good number? Why?

Further, what a calculation function hash should be considered good for use in a scattering table? I know it should generate few code collisions hash, I wanted to know what should be observed in her to be considered good. I have seen many people claiming that their function hash is very good, but are very different from each other, I wonder if this is subjective, but should not be, seems to me something well mathematical.

  • 1

    Considering that few collisions is a good thing, you need to ask if you need reproducibility or a guarantee of no collision. If the hash does not need reproducibility, do it in a function with random factor (which changes with each arbitrary event), you prevent a degeneration-based attack of the hash table.

  • But how to ensure no collision? Then it would no longer be a hash.

1 answer

1

Well you need to know what values you are working with first, my hash will get a word or a number?.

Let’s say your hash gets a word.

You need to turn this word into a number.

To do this you need to calculate the value of each character.

Some examples:

  • Calculate values using the ASCII table;
  • Calular values using a similar custom pre-table type an ASCII override;
  • Calculate values using Hex from UTF-8;
  • Calular values using Base64;

Some things you can do while returning the values of each carectere:

Imagine that in the lercharacter(char c) function returns an integer.

  • Common operations (Split, Multiply, Subtract or Sum by k)
  • Advanced operations like Potency, Integer Factorization.
  • Base conversion transaction ex: x10 for x16.
  • You can also raise the n and take a piece of the result.

Important to record the position of each character of a word.

Imagine the following case:

If you take the value of each character using for example ASCII and just add them.

  • el = 8 + 10 = 18
  • le = 10 + 8 = 18

| le | el | => will have the same result.

a solution can be :

  • Calc = (C1)2 * (C1 / C2) * k;
  • k = some number first as 27,31.. 7243 (in example use 31);
  • C1 = Character position 1;
  • C2 = Character position 2

  • le = (10)2 * (10 / 8) * k = 3875
  • el = (8)2 * (8 / 10) * k = 1587

Integer Values Generating Indices

Good at this point your string should return a result in the above example:

Imagine that we are working with the word el whose value is 1587, so now let’s send this value to our hash function.

We can use some methods:

  • Rest of Division -> (( el Bitwise AND 0x7FFFF) % Size));

  • Multplication -> ( (el * Random(1)) * Table size));

  • Exponential

  • Factorization of Integers

    Here you can change the base of Numbers once more maybe. XD


Collision Treatment

To treat collisions you can run a function that reserves a piece of the hash avoiding collision... example:

  • AB45Z
  • AB45Y

These two hash did not collide for a house, in this case the parsing of the algorithm is bad because the generated hash are very similar, but what if it had as reserve a piece of the hash imagine in the same example that we reserved AB4.

Before generating the hash then we will make a reading checking reserved spaces and if all were occupied we could for example double the size of the table that would give us another value and possibly another hash.

Java does something similar in the hash map but it doubles when all table boxes are filled.


Collision Treatment 2.

Another way to avoid collision is to use more parameters when generating the example hash:

  • Current date and time with nano seconds.
  • Some Data as user name + password.
  • Pi, E, or PHI * Random(1);

Remarks :

  • Random(1) = random number between 0 and 1;
  • 0x7FFFF = to binary 1111111111111111111
  • Bitwise AND = function to return a new value by two values.

Nice to take a look at:

  • Beginning of the pigeon house.
  • Congruence Linear.
  • Chinese theorem of the rest
  • Pseudorandomness

  • 1

    The last paragraph has potential, but only what was posted is a hint on the subject not an answer that actually answers what was asked. The rest is wind.

  • I rephrased my answer to be more concrete. I’m no expert in the field so I may have said something silly, but I hope I’ve helped.

  • 1

    I think it’s cool the effort, but it seemed like a compendium of sparse ideas, more like "solution attempts" (which end up getting worse than taking the hash and simply an ID) than actually an answer to the question. But I will reread it calmly and willingly afterwards to try to see the foundation of each thing, before arriving at a definitive conclusion. The part of turning into number really didn’t make sense to me, because everything "is number" at the end, or everything is byte, it’s the same. Representations are only for humans.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.