What would be a good hash calculation algorithm to use in a scattering table?

Question

What would be a good hash calculation algorithm to use in a scattering table?

Asked 6 years, 6 months ago

Viewed 356 times

14

In the question Why String hashcode() in Java uses 31 as multiplier? There is talk of the use of the number 31 as a multiplier. There has been a controversy about the motivation of this number. In fact it is not considered as good. What would be a good number? Why?

Further, what a calculation function hash should be considered good for use in a scattering table? I know it should generate few code collisions hash, I wanted to know what should be observed in her to be considered good. I have seen many people claiming that their function hash is very good, but are very different from each other, I wonder if this is subjective, but should not be, seems to me something well mathematical.

1

Considering that few collisions is a good thing, you need to ask if you need reproducibility or a guarantee of no collision. If the hash does not need reproducibility, do it in a function with random factor (which changes with each arbitrary event), you prevent a degeneration-based attack of the hash table.

– Jefferson Quesado

2019/01/16 at 11:42
But how to ensure no collision? Then it would no longer be a hash.

– Maniero

2019/01/16 at 20:07

1 answer

Browser other questions tagged algorithm hash hashcode

You are not signed in. Login or sign up in order to post.

by Lucas Palomo • **518** points · Answer 1 · 2019-01-18T04:14:36+00:00

Well you need to know what values you are working with first, my hash will get a word or a number?.

Let’s say your hash gets a word.

You need to turn this word into a number.

To do this you need to calculate the value of each character.

Some examples:

Calculate values using the ASCII table;
Calular values using a similar custom pre-table type an ASCII override;
Calculate values using Hex from UTF-8;
Calular values using Base64;

Some things you can do while returning the values of each carectere:

Imagine that in the lercharacter(char c) function returns an integer.

Common operations (Split, Multiply, Subtract or Sum by k)
Advanced operations like Potency, Integer Factorization.
Base conversion transaction ex: x₁₀ for x₁₆.
You can also raise the n and take a piece of the result.

Important to record the position of each character of a word.

Imagine the following case:

If you take the value of each character using for example ASCII and just add them.

el = 8 + 10 = 18
le = 10 + 8 = 18

| le | el | => will have the same result.

a solution can be :

Calc = (C1)² * (C1 / C2) * k;
k = some number first as 27,31.. 7243 (in example use 31);
C1 = Character position 1;
C2 = Character position 2

le = (10)² * (10 / 8) * k = 3875
el = (8)² * (8 / 10) * k = 1587

Integer Values Generating Indices

Good at this point your string should return a result in the above example:

Imagine that we are working with the word el whose value is 1587, so now let’s send this value to our hash function.

We can use some methods:

Rest of Division -> (( el Bitwise AND 0x7FFFF) % Size));
Multplication -> ( (el * Random(1)) * Table size));
Exponential
Factorization of Integers

Here you can change the base of Numbers once more maybe. XD

Collision Treatment

To treat collisions you can run a function that reserves a piece of the hash avoiding collision... example:

AB45Z
AB45Y

These two hash did not collide for a house, in this case the parsing of the algorithm is bad because the generated hash are very similar, but what if it had as reserve a piece of the hash imagine in the same example that we reserved AB4.

Before generating the hash then we will make a reading checking reserved spaces and if all were occupied we could for example double the size of the table that would give us another value and possibly another hash.

Java does something similar in the hash map but it doubles when all table boxes are filled.

Collision Treatment 2.

Another way to avoid collision is to use more parameters when generating the example hash:

Current date and time with nano seconds.
Some Data as user name + password.
Pi, E, or PHI * Random(1);

Remarks :

Random(1) = random number between 0 and 1;
0x7FFFF = to binary 1111111111111111111
Bitwise AND = function to return a new value by two values.

Nice to take a look at:

Beginning of the pigeon house.
Congruence Linear.
Chinese theorem of the rest
Pseudorandomness