By default, PHP uses an md5 or sha1 hash of some values obtained at the time of ID generation:
- Client IP;
- Current time;
- A random number (can be provided by an OS PRNG, such as /dev/urandom).
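As a rough illustration of the recipe above, here is a minimal sketch in Python. It is not PHP's actual source: the function name make_session_id, the field order, and the "|" separator are assumptions made only for this example.

```python
import hashlib
import os
import time


def make_session_id(client_ip: str, algorithm: str = "md5") -> str:
    """Hash client IP, current time and OS randomness into a hex ID.

    Hypothetical helper: field order and separators are illustrative only.
    """
    random_bytes = os.urandom(16)  # randomness supplied by the operating system
    now = repr(time.time()).encode("ascii")  # current time as text
    material = client_ip.encode("utf-8") + b"|" + now + b"|" + random_bytes
    return hashlib.new(algorithm, material).hexdigest()
```

Calling make_session_id("203.0.113.7") twice should give two different 32-character hex strings, because the random bytes (and usually the time) differ between calls.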
Is a collision possible? Yes! As is well known, both MD5 and SHA1 are considered insecure. But the goal here is simply to make brute-force attacks computationally expensive, so much so that PHP does no collision handling at all.
If you are implementing a data structure meant for fast value lookup (such as a hash table), this is not the best approach. The hash should be computed only from the data itself. Otherwise, you will not be able to retrieve your data.
Collision handling is mandatory in most cases, and it is up to you to decide which strategy to use. A given collision-handling algorithm may or may not be suitable, depending on the nature of the operations to be carried out.
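To make the hash-table case concrete, here is a minimal sketch in Python of one common strategy, separate chaining. The class name ChainedHashTable and the use of MD5 as the bucket function are assumptions chosen for the example; other strategies (such as open addressing) may fit other workloads better.

```python
import hashlib


class ChainedHashTable:
    """Tiny hash table with separate chaining (illustrative, not optimized)."""

    def __init__(self, bucket_count: int = 8) -> None:
        self._buckets = [[] for _ in range(bucket_count)]

    def _index(self, key: str) -> int:
        # Only the key itself is hashed (no time, no randomness),
        # so the same key always maps to the same bucket.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self._buckets)

    def put(self, key: str, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # colliding keys share the same bucket

    def get(self, key: str, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default
```

Because the bucket index depends only on the key, a later get with the same key finds the same bucket, and keys that collide are told apart by comparing the stored keys.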
Possible Collision Handling in PHP
In the case of PHP, I believe that simply generating a new hash is enough. This works because even though the client’s IP is still the same, the random number will surely change (well... hopefully, right?), and the time will probably change too. So simply repeating the procedure until no collision occurs should be sufficient.
I will not discuss collisions in the data-structure case because I believe that is outside the scope of the question.
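A minimal sketch of that retry idea in Python follows. It assumes you keep some record of the IDs already in use (here a plain set named existing_ids); the function name new_unique_id, the generate parameter, and the attempt limit are all hypothetical and are not something PHP does on its own.

```python
import secrets
from typing import Callable, Set


def new_unique_id(
    existing_ids: Set[str],
    generate: Callable[[], str] = lambda: secrets.token_hex(16),
    max_attempts: int = 5,
) -> str:
    """Generate IDs until one is not already in use, then register it."""
    for _ in range(max_attempts):
        candidate = generate()
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate
    # Repeated collisions almost certainly mean the generator is broken.
    raise RuntimeError("could not generate a unique ID")
```

The attempt limit is there only as a safety net: with a healthy random source, the first candidate is expected to be unique in practice.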
Actually, I have a numeric sequence based on TS along with 10 random numbers, but using md5 ends up creating the possibility of collision. That’s why I wanted to know about SESSION. But thanks for the answer.
– Papa Charlie
@Papacharlie Accidental MD5 collisions are very unlikely (while intentional, i.e. malicious, collisions are relatively simple to produce), as long as, of course, the hashed data are distinct (that is the difficult part).
– mgibsonbr
@mgibsonbr Murphy’s Law... "very unlikely" means it will happen (lol). If there is a collision, I will not be able to get around the problem...
– Papa Charlie
A random UUID has 122 random bits (just under 16 bytes, slightly fewer than the 128 bits of an MD5 digest). If you generate a few tens of trillions of these in a year, the chance of a collision is roughly the same as being hit by a meteorite in that same period... I won’t speculate further, because I don’t know enough to say anything with certainty, but in general I’d say that hashes of distinct data will also be distinct, even for a "broken" algorithm like MD5 (again, unless the collision was crafted on purpose).
– mgibsonbr
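For anyone who wants to check the order of magnitude in the last comment, the usual birthday approximation is p ≈ 1 - exp(-n² / (2 · 2^b)) for n random values of b bits each. Here is a minimal sketch in Python; the function name collision_probability is chosen for this example, and the formula assumes the values are uniformly random.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one collision among n random values.

    Uses the birthday approximation 1 - exp(-n^2 / (2 * 2^bits)).
    """
    exponent = (n * n) / (2.0 * 2.0 ** bits)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)
```

With n = 3 × 10^13 (a few tens of trillions) and bits = 122, this gives a probability on the order of 10^-10.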