Generating a unique identifier using time()


I created an algorithm to generate an identifier consisting of a random value with 24 non-sequential characters. The idea is basically to take the current time from time() and interleave it with musical notes (the letters 'a' to 'g'). haha

Algorithm:

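<?php
    // Current time as YmdHis, minus the two century digits: yymmddHHMMSS (12 digits)
    $reduce = substr(date('YmdHis', time()), 2);
    $arr_date = str_split($reduce);
    // The "musical notes" used as random filler characters
    $arr1 = str_split("cdefgab");
    $uid = "";
    // Put one random note before each of the 12 date digits (24 characters in total)
    for ($i = 0; $i < 12; $i++) {
        $uid .= $arr1[rand(0, 6)];
        $uid .= $arr_date[$i];
    }
    echo $uid;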

See it working on Ideone.

Output:

e1d6e1c2e1g1g0f8f5f4a2a0

I intend to use this value as the unique ID (primary key) of a user table in my MySQL database. Basically, the goal is to have a unique, non-sequential ID instead of a sequential, auto-incrementing integer ID.

Could this algorithm cause problems in the future, or is there something that makes this solution unfeasible?

  • Basically, you're wasting entropy. 24 bytes can store up to 192 bits of information. Your algorithm generates approximately 60 bits of information, which is little for a string of that size. For comparison, even if you needed something human-readable, using 16 bytes encoded in base 64 to produce the 24 characters would give you at least 128 bits, which means an absurdly lower chance of collision than your method. Granted, 60 bits is still a fairly large set of possibilities, but the chance of collision grows as you accumulate more and more records.

  • Actually, it's even better: 24 base64 characters fit 18 bytes, not just 16, which gives 144 bits, not the 128 I mentioned earlier.

  • @Bacco (lol, this turned into a chat) Actually, I didn't understand what you meant by "...it's even better". Are you suggesting using TO_BASE64?

  • I flagged my earlier comments as obsolete so moderation can clean them up, since they were piling up information already settled in the post. What I meant is that with base 64 you can encode up to 18 completely random bytes in 24 characters, that is, a reasonable variety of combinations. If you build a sequence of 24 characters with rand(0, 63), picking from the 64 base64 characters (or whichever characters you choose in their place), you get 144 bits of information (64^24 = 2^144 ≈ 2.23 × 10^43).
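A minimal PHP sketch of Bacco's suggestion, which also works out how much randomness the question's scheme actually has. It assumes PHP 7+ (for random_bytes()); randomId24() is a hypothetical helper name, and swapping '+' and '/' for '-' and '_' (a URL-safe alphabet) is an assumption, not something from the comments.

```php
<?php
// Randomness in the question's scheme: 12 "notes", each one of 7 letters.
// (Only the letters are random; the digits come from the clock.)
$noteBits = 12 * log(7, 2);   // about 33.7 bits

// Each base64 character carries 6 bits: 24 * 6 = 144 bits, i.e. 18 bytes.
$base64Bits = 24 * 6;

// Hypothetical helper: a 24-character ID built from 18 bytes of CSPRNG output.
function randomId24(): string
{
    $raw = random_bytes(18);          // cryptographically secure bytes (PHP 7+)
    $b64 = base64_encode($raw);       // 18 bytes -> exactly 24 chars, no '=' padding
    return strtr($b64, '+/', '-_');   // URL-safe alphabet (assumption)
}

printf("notes: %.1f bits, base64: %d bits\n", $noteBits, $base64Bits);
echo randomId24(), "\n";
```

Because 18 is a multiple of 3, base64_encode() always yields exactly 24 characters here, which also suits a fixed-width column.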

1 answer


What can be said about the code presented is that the chance of collision increases every century, because it cuts off the first two digits of the year.

The date() function formats the date and time (2016-12-11 08:54:20) as 20161211085420.

Then substr() removes the first two digits, giving: 161211085420

Here we have a small problem: when the year 2116 arrives, 100 years from now, the chance of collisions will increase, even with the random letters from A to G (ABCDEFG) mixed in, which, incidentally, are too few to provide many combinations. Sure, it's silly to worry about that, but you asked where a problem could arise. Basically, you should think about the obvious sources of collision.
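To make the century problem concrete, this small PHP sketch reproduces the question's date step with a fixed DateTimeImmutable instead of time(), so the output is predictable. reducedDate() is a hypothetical helper, and UTC is an assumption chosen only to keep the example deterministic.

```php
<?php
// Hypothetical helper reproducing the question's date part: YmdHis minus the first two digits.
function reducedDate(DateTimeImmutable $t): string
{
    return substr($t->format('YmdHis'), 2);
}

$utc   = new DateTimeZone('UTC');                        // fixed zone so the example is deterministic
$y2016 = new DateTimeImmutable('2016-12-11 08:54:20', $utc);
$y2116 = $y2016->modify('+100 years');                   // same moment, one century later

echo reducedDate($y2016), "\n";   // 161211085420
echo reducedDate($y2116), "\n";   // 161211085420 (identical)

// Only the 12 random notes can tell these two IDs apart: 7^12 combinations.
echo 7 ** 12, "\n";               // 13841287201
```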

In your example, you generated e1d6e1c2e1g1g0f8f5f4a2a0, and the sequence is very easy to recover just by removing the letters: 161211085420.
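A short PHP sketch of how little effort it takes to decode the question's ID back into a date. It assumes the original time was in UTC; the zone only changes how the recovered time is labeled, not the digits.

```php
<?php
$uid = 'e1d6e1c2e1g1g0f8f5f4a2a0';   // ID from the question

// Strip the "notes" (letters a-g); the remaining digits are yymmddHHMMSS.
$digits = preg_replace('/[a-g]/', '', $uid);

// Parse the digits back into a date ('y' = two-digit year).
$date = DateTimeImmutable::createFromFormat('ymdHis', $digits, new DateTimeZone('UTC'));

echo $digits, "\n";                        // 161211085420
echo $date->format('Y-m-d H:i:s'), "\n";   // 2016-12-11 08:54:20
```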

You can argue that 100 years is a long time, that none of us will be alive by then, that the system you are building will probably no longer be in use, and that it will only be used for user registration rather than for generating large numbers of IDs at the same time. Even so, you are still creating something that does not mask the sequence very well: anyone with no prior knowledge of the scheme can work out the sequence and see that it is a date.

If we are to point out a "problem", and since the intention seems to be masking the sequence, I believe this is worth raising.

In this context, I will focus on masking the sequence. So the question I ask is: why not use the timestamp value itself?

And if you still want to "scramble" it, you can just convert it to letters.

I usually use this:

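<?php
// Without arguments, microtime() returns "msec sec", e.g. "0.70350700 1481537805"
$id = microtime();
$a = explode(' ', $id);
// Drop the dot from the fractional part: "0.70350700" -> "070350700"
$a[0] = str_replace('.', '', $a[0]);
// Convert both parts from base 10 to base 36 (digits plus a-z)
$a[0] = base_convert($a[0], 10, 36);
$a[1] = base_convert($a[1], 10, 36);
echo $id.' -> '.$a[0].$a[1];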

Returns something like this: 0.70350700 1481537805 -> 15vuy4oi2hvx

The first part, 0.70350700 1481537805, is the value returned by microtime().

I separate the two numbers, remove the dot (.) and convert each one to base 36 with base_convert():

070350700 -> 15vuy4
1481537805 -> oi2hvx

In the end, the converted result is concatenated, resulting in this ID: 15vuy4oi2hvx
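To check the steps above with a fixed input instead of the live clock, the same logic can be wrapped in a function. This is a sketch: microtimeToId() is a hypothetical name, and the [$frac, $sec] destructuring assumes PHP 7.1+.

```php
<?php
// Hypothetical helper: the answer's steps applied to a given microtime() string.
function microtimeToId(string $mt): string
{
    [$frac, $sec] = explode(' ', $mt);       // "0.70350700", "1481537805"
    $frac = str_replace('.', '', $frac);     // "070350700"
    return base_convert($frac, 10, 36) . base_convert($sec, 10, 36);
}

echo microtimeToId('0.70350700 1481537805'), "\n";   // 15vuy4oi2hvx
echo microtimeToId(microtime()), "\n";               // varies with the clock
```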

It takes half the space of the ID presented in the question (e1d6e1c2e1g1g0f8f5f4a2a0), and it is also a little more "efficient" with respect to collisions, because it concatenates the microseconds with the timestamp and does not suffer from the century issue. The exception is an environment whose clock is set wrong, to a moment in the past for which codes have already been generated. That is an unlikely scenario, but it is possible.

Comparing the running times:

Script from the question: 0.0000128746032714844
Example script above: 0.0000109672546386719
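Timings this small are easily dominated by noise, so averaging over many calls gives a steadier comparison. A minimal PHP sketch, assuming PHP 7.3+ for hrtime(); averageSeconds() is a hypothetical helper, and the two closures are condensed copies of the snippets from the question and the answer.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn over $runs calls.
function averageSeconds(callable $fn, int $runs = 10000): float
{
    $start = hrtime(true);                  // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $runs;
}

// Condensed copy of the question's generator.
$question = function (): string {
    $digits = str_split(substr(date('YmdHis', time()), 2));
    $notes = str_split('cdefgab');
    $uid = '';
    for ($i = 0; $i < 12; $i++) {
        $uid .= $notes[rand(0, 6)] . $digits[$i];
    }
    return $uid;
};

// Condensed copy of the answer's microtime() + base_convert() generator.
$answer = function (): string {
    $a = explode(' ', microtime());
    return base_convert(str_replace('.', '', $a[0]), 10, 36) . base_convert($a[1], 10, 36);
};

printf("question: %.10f s per call\n", averageSeconds($question));
printf("answer:   %.10f s per call\n", averageSeconds($answer));
```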

The difference is not very relevant, so the biggest advantage is that the final string takes up half the space.

In general, there is no problem with what you proposed in the question, given the conditions. But I thought it would be useful to demonstrate another approach that I believe is a little better for generating a more compact string.

  • I was aware that "...every century the chances of collision will be greater...", and that is exactly why I would argue "...that 100 years is a long time...". As for the sequence, I don't see a problem, because this ID is not some kind of access code that someone could break, but your format interested me. You may be wondering how I came to this question: looking at the JSON that Trello exposes, I noticed they use a 192-bit sequence as their identifier, got curious, and started asking about it.

  • Will your suggested solution ALWAYS return a 96-bit value?

  • Yes, that is expected, because the timestamp always has 10 digits and the microtime part 9, provided you use base 36.

  • I'll run some more tests, do some research, and then accept your answer, which was very useful to me. Thank you.

  • Daniel, I ran some tests to validate your idea and ran into a situation, though I don't know whether it will actually cause me a problem. Your algorithm sometimes returns a 96-bit value and sometimes an 88-bit one: about 65% of the results had 11 characters and the rest had 12. Something is not right here! hehe In my database, I'd prefer a column with a fixed size that is always completely filled, e.g. VARCHAR(12) always containing exactly 12 characters.

  • Normally I don't define a fixed size; I define it as CHAR(20). But if you want a fixed size, you cannot rely on the method presented here; you would have to consider something like MD5, SHA, etc. MD5 alone, though, already takes up a "monster" 32 characters for this purpose. But after all, why bother having a fixed size? And could you show examples of the cases where you found a different size?

  • True, maybe I don't need to worry about a fixed size. But do you think I can use a TEXT column? Just this, to close the question.

  • A TEXT column is too big, a waste of space. I use CHAR(20) to leave a safe margin. Note that I don't use VARCHAR, because it allocates more space; I use plain CHAR. (Talking about MySQL.)

  • In my case it would be MySQL synced with SQLite.
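The 11-versus-12-character results reported in the comments come from base_convert() dropping leading zeros: any fractional part below 36^5 = 60,466,176 (roughly 60% of the possible values) converts to 5 characters or fewer. One possible workaround, sketched here under the assumption that microtime() prints 8 decimal places as in the answer's example, is to left-pad each part to 6 characters. microtimeToFixedId() is a hypothetical name. Note that Unix timestamps reach 36^6 = 2,176,782,336 in late 2038, after which the seconds part needs 7 characters.

```php
<?php
// Why the length varies: leading zeros disappear after conversion.
echo base_convert('005000000', 10, 36), "\n";   // 2z60w (only 5 characters)

// Hypothetical variant: pad each base-36 part to 6 characters for a fixed 12-character ID.
function microtimeToFixedId(string $mt): string
{
    [$frac, $sec] = explode(' ', $mt);
    $frac = str_replace('.', '', $frac);
    // Fractions up to 99999999 always fit in 6 base-36 digits (36^6 = 2176782336).
    $fracPart = str_pad(base_convert($frac, 10, 36), 6, '0', STR_PAD_LEFT);
    // Timestamps fit in 6 base-36 digits until late 2038 (assumption: fine for this use).
    $secPart = str_pad(base_convert($sec, 10, 36), 6, '0', STR_PAD_LEFT);
    return $fracPart . $secPart;
}

echo microtimeToFixedId('0.05000000 1481537805'), "\n";   // 02z60woi2hvx
echo microtimeToFixedId('0.70350700 1481537805'), "\n";   // 15vuy4oi2hvx
```

Zero-padding also keeps IDs generated in the same second sortable by their fractional part, although a later second does not always sort after an earlier one, since the fraction comes first.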
