What can be said about the code presented is that the chance of collision grows with every century, because it cuts off the first two digits of the year.
The function date() formats the date and time (2016-12-11 08:54:20) as 20161211085420
The code then removes the first two digits, leaving 161211085420
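As a rough illustration of that truncation, here is a minimal sketch. The helper name shortStamp() and the two sample variables are hypothetical, and it uses gmdate()/gmmktime() instead of date() only so the output does not depend on the server timezone. It also assumes a 64-bit PHP build, since the year 2116 does not fit in a 32-bit timestamp.

```php
<?php
// Hypothetical helper: reproduces the 12-digit value the question's code starts from.
function shortStamp(int $timestamp): string
{
    // 'YmdHis' yields 14 digits, e.g. 20161211085420
    $full = gmdate('YmdHis', $timestamp);
    // Dropping the first two digits discards the century
    return substr($full, 2);
}

// Same instant of the year, one century apart (assumes 64-bit integers)
$in2016 = gmmktime(8, 54, 20, 12, 11, 2016);
$in2116 = gmmktime(8, 54, 20, 12, 11, 2116);

echo shortStamp($in2016), "\n"; // 161211085420
echo shortStamp($in2116), "\n"; // 161211085420
```

Both calls print the same 12 digits, which is the once-per-century overlap described above.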
Here we have a small problem: when the year 2116 arrives, 100 years from now, collisions become more likely, even with the random shuffling of letters from A to G (ABCDEFG), which, incidentally, is too small a set to add many combinations. Sure, it’s ridiculous to worry about this, but you did ask where there might be a problem. Basically, you should think about the obvious chances of collision.
In your example, you generated this: e1d6e1c2e1g1g0f8f5f4a2a0
and the underlying sequence is very easy to recover just by removing the letters: 161211085420
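To show how little effort that takes, here is a small sketch. It only assumes that the ID alternates letters and digits as in the example above; the variable names are illustrative.

```php
<?php
// The ID shown in the question
$id = 'e1d6e1c2e1g1g0f8f5f4a2a0';

// Remove every character that is not a digit
$digits = preg_replace('/\D/', '', $id);
echo $digits, "\n"; // 161211085420

// Read the digits back as a date with a two-digit year
$date = DateTime::createFromFormat('ymdHis', $digits);
echo $date->format('Y-m-d H:i:s'), "\n"; // 2016-12-11 08:54:20
```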
You can argue that 100 years is a long time, that no one here will be alive, that the system you are creating will probably no longer be in use, and that it will be used for user registration rather than for generating large numbers of IDs at the same time. But even with this justification, you are still creating something that does not mask the sequence very well. Anyone, even without prior knowledge of the scheme, can recover the sequence and see that it is a date.
If we are to point out a "problem", and since the intention seems to be to mask the sequence, I believe this point is worth raising.
In this context, I will focus on masking the sequence. So the question I ask is: why not use the timestamp value itself?
And if you still want to "shuffle" it, you can simply convert it to letters.
I usually use this:
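// microtime() returns a string such as "0.70350700 1481537805"
$id = microtime();
// Split it into the fractional part and the Unix timestamp
$a = explode(' ', $id);
// Drop the dot, then convert each part from base 10 to base 36
$a[0] = str_replace('.', '', $a[0]);
$a[0] = base_convert($a[0], 10, 36);
$a[1] = base_convert($a[1], 10, 36);
echo $id.' -> '.$a[0].$a[1];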
It returns something like this: 0.70350700 1481537805 -> 15vuy4oi2hvx
The first part, 0.70350700 1481537805, is the value returned by the function microtime().
I separate the two numbers, remove the dot character (.) from the first one, and convert each with base_convert():
070350700 -> 15vuy4
1481537805 -> oi2hvx
In the end, the converted result is concatenated, resulting in this ID: 15vuy4oi2hvx
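Keep in mind that this conversion masks the value but does not hide it: base_convert() works in both directions. The sketch below reverses the sample ID. It assumes the seconds part is always the last 6 characters, which holds only while the timestamp is below 36^6 = 2176782336; the variable names are illustrative.

```php
<?php
$id = '15vuy4oi2hvx';

// Assumption: the last 6 characters are the Unix timestamp in base 36
$seconds = base_convert(substr($id, -6), 36, 10);
// Whatever precedes them is the fractional part (the leading zero is lost)
$micro = base_convert(substr($id, 0, -6), 36, 10);

echo $seconds, "\n"; // 1481537805
echo $micro, "\n";   // 70350700
echo gmdate('Y-m-d H:i:s', (int) $seconds), "\n"; // 2016-12-12 10:16:45
```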
It takes half the space of what was presented in the question: e1d6e1c2e1g1g0f8f5f4a2a0
It is also a little more "efficient" with regard to collisions, because the microsecond part is concatenated with the timestamp, and there is no need to worry about the century. The exception is an environment whose clock is wrongly set to a past date for which codes have already been generated. These are ridiculous possibilities, but they can happen.
Comparing the running time,
Script of the question: 0.0000128746032714844
Example script above: 0.0000109672546386719
The difference is not significant, so the biggest advantage is the final string, which takes up half the space.
In general, there is no problem with what you put forward in the question, given the conditions. But I thought it would be useful to demonstrate another approach that I believe is a little better, because it generates a more economical string.
Basically you’re wasting entropy. 24 bytes can store up to 192 bits of information. Your algorithm generates approximately 60 bits of information, which is little for the size of the string. For comparison, even if you needed something readable by a human, using 16 bytes encoded in base 64 to produce the 24 characters would give you at least 128 bits, which means an absurdly lower chance of collision than your method. Granted, 60 bits is still a fairly large set of possibilities, but as you accumulate more and more records, the chance of collision increases.
– Bacco
Actually, it’s even better. 24 base 64 characters hold 18 bytes, not just 16, which gives 144 bits, not 128 as I mentioned earlier.
– Bacco
@Bacco (haha, the chat is gone) Actually, I didn’t understand what you meant by "...even better". Are you suggesting using TO_BASE64?
– viana
I flagged it as obsolete for moderator cleanup, because there was already too much information here that had been resolved in the post. I meant that if you use base 64, you can use up to 18 completely random bytes, that is, a reasonable variety of combinations. If you build a sequence of 24 characters with Rand(0, 63), mapping to the 64 characters of base 64 (or the ones you choose in their place), it gives 144 bits of information (about 2.23 × 10^43 combinations).
– Bacco
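A minimal sketch of the idea in these comments, under a few assumptions: the helper name randomId() is hypothetical, random_bytes() requires PHP 7 or later, and the final strtr() call is an optional extra that swaps in URL-safe characters. Since 18 bytes is 18 × 8 = 144 bits, and each base 64 character carries 6 bits, the result is exactly 24 characters with no padding.

```php
<?php
// Hypothetical helper, not taken from the question or the answer above.
function randomId(): string
{
    // 18 random bytes (144 bits) from the system CSPRNG; PHP 7+
    $bytes = random_bytes(18);
    // 18 bytes encode to exactly 24 base64 characters, with no '=' padding
    $id = base64_encode($bytes);
    // Optional: replace '+' and '/' so the ID is safe to use in URLs
    return strtr($id, '+/', '-_');
}

echo randomId(), "\n";
```

Unlike the timestamp-based versions, an ID built this way carries no date and is not sortable by creation time, so it fits only when that ordering is not needed.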