How is the Global Unique Identifier (GUID) generated?

Asked

Viewed 8,073 times

29

The Global Unique Identifier is generated so that no other will be generated equal, or will almost never have the same number.

 var unique = Guid.NewGuid().ToString();

Resultado: 440cdeee-5b8a-462a-96fd-20b24bd82f55

  • How is that possible?
  • Is there a mathematical formula behind it? If so, how does it work?
  • I wait for an answer that explains how this is possible when we are offline too.

  • 1

    @Marconciliosouza I believe it is a very well worked mathematical formula! Waiting for an answer here :)

  • 1

    Oxente, -1 ? Why ? was +7 now . 6 ?

  • @Magichat I noticed here too! You will know :)

  • 1

    @Marconi I think the criticisms strengthen, but when pointed the point, so in a valley of nothing...

  • 1

    @Magichat vdd, even to improve!

Show 1 more comment

3 answers

20


How is that possible?

Large numbers (2 to 128) have nearly infinite capacity.

There is no guarantee that the GUID will always be unique. There are even versions that guarantee, but you can use the version that does not guarantee. Even though it does not guarantee, possibility of collision is very small, negligible.

Is there any mathematical formula behind it, how it works?

According to Wikipedia the UUID format (the same as GUID used by Microsoft) is:

123e4567-e89b-12d3-a456-426655440000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
  • 4-byte (8 digits in hexadecimal representation) least significant part of UUID generation time.

  • 2-byte (4 digits in hexadecimal representation) significant average time share of UUID generation.

  • 2-byte (4-digit hexadecimal representation) UUID version followed by most significant part of UUID generation time.

  • 2-byte separated into 2 fields (total 4 digits in hexadecimal representation) resolution sequence of the clock, high in the first field and low in the second, with multiplexing in the first bits to complement the version.

  • 6-bytes (12 digits in hexadecimal representation) node adopted.

Note that this is a visual representation for humans. You do not need to store or transmit these 36 characters. Just generate a number of 128 bits, so on the computer it only needs to occupy 16 bytes.

Versions

These data are filled in different ways depending on the version. The UUID implementation of each library can do as it suits, including varying according to usage. Do not need to fill in exactly as described above, just need to follow the generation standard according to the chosen version specification.

  • One of the most used versions is 1, where the fill in is the time and the MAC address the unity of that machine is guaranteed.

    There are some ways to ensure that two equal times are not generated if the machine’s clock does not have much resolution. The time has to be unique on the machine. And total uniqueness is to identify in which machine it was generated, assuming that clone MAC address is not done, since this is not a normal operation. I will omit the details.

  • Version 2 is not usually used because it involves undisclosed security protocol.

  • Versions 3 and 5 use a hashing of a namespace. Uniqueness is obtained based on a name that is already unique. This name can be a URL (when the inexperienced programmer with this sees a URL in a file thinks it is accessing that address, but it is only a namespace single) or one object identifier which has a logic of creating uniqueness. This form is advantageous because it can be repeatable, ie the same name generates the same UUID.

    Version 3 uses MD5 and version 5 uses SHA1, much better but "heavier".

    Example.

  • Version 4 uses a randomly generated number. This form can be used on any machine, even in the absence of a MAC address or a clock reasonable. It may have collisions, but by having a high resolution is improbable that occurs. Ideally it should be a truly random number, but a pseudorandom one is accepted.

    Example.

Obviously each version has specified how these data are distributed in those fields indicated above and the algorithm will be different according to the version, but basically it is to get the required numbers according to the specification. This is usually a simple call to the operating system API and mount in the specified format.

How is this possible when we are offline?

He was created to be offline same. It doesn’t have to be networked to work. It doesn’t search for it elsewhere.

Perhaps the confusion comes because it is used as a primary key in the database, I imagine there is the idea that the client asks the server for a unique global number, but in fact the client generates on its own. It even has the advantage of be able to generate the whole record without consulting the database. Usually when we use a ID sequential has to ask the server which is it after the insert (example).

Generates alone because it is used in several things that only concerns the same machine.

If you pay attention to Internals much of the operating system that you access with a cute name, this name is only descriptive, your identity name is a GUID. If using the name could have name collisions and a translation of the name would make the object another, the GUID is canonical.

  • This "train" as it says here in mines is more complex than I imagine. :)

  • 1

    @Marconi depends on the point of view, I think even too simple :)

  • The people from Microsoft that already leaves chewed for us, develop something of the type I believe that it would give work @bigown!

  • 1

    @Marconi put examples in Lua that the algorithm is actually very simple and small. You will see that it is more complicated to explain than to do :D

  • Very interesting. I had never researched about how they are generated.

10

1 - "I know that the Global Unique Identifier is generated in a way that neither other will be equal."

In the very link you put says: "Despite each generated GUID have no guarantee of being unique, the total number of single keys (2 128 or -3.4 10 38) is so large that the probability of the same number being generated twice is very small. For example, whereas the observable universe contains 5 10 22 stars, each star could have ~6.8 10 15 of its own Guids."

2 - "There is some mathematical formula behind, how it works?"

Yes, there is a pattern and some versions being:

In its canonical textual representation, the sixteen octets of a UUID are represented by 32 hexadecimal lowercase digits (base 16), displayed in five groups separated by hyphens, in the form 8-4-4-4-12 for a total of 36 characters (32 alphanumeric characters and four hyphens). For example:

123e4567-e89b-12d3-a456-426655440000

The 8-4-4-4-12 canonical format string is based on "registration layout" for the 16 bytes of UUID:

  • An integer of 4 bytes (8 hex digits) "time_low" giving the low 32 bits of time.

  • A 2 byte "time_mid" integer (4 hex digits) giving the half 16 bits of time.

  • A 2-byte (4-digit hexadecimal) "time_hi_and_version" field, with the 4-bit "version" in the most significant bits, followed by the highest 12-bits of time.

  • Two 1-byte fields (totaling 4 hexadecimal digits) called "clock_seq_hi_res" and "clock_seq_lo", with the "variant" multiplexed in the most significant 1-3 bits of clock_seq_hi_res.

  • Six bytes (12 hex digits) with 48 bit "node".

Source

Basic algorithm

The following algorithm is simple, correct and inefficient:

  • Get a global system lock.

  • From a stable shared system storage (for example, a file), read the status of the UUID generator: the timestamp values, clock sequence, and node ID used to generate the latest UUID.

  • Get the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, October 15, 1582.

  • Get the current node ID.

  • If the state is not available (e.g., nonexistent or corrupted), or the saved node ID is different from the current node ID, generate a random clock sequence value.

  • If status is available, but the saved timestamp is later than the current date/time stamp, increment the value of the clock sequence.

  • Save the state (current timestamp, clock sequence and node ID) back to the stored stable value.

  • Release the global lock.

  • Format a UUID from the current timestamp, clock sequence and ID node according to the steps of Section 4.2.2.

This is the basic pattern in the specific case of the question: "GUID" is a variant of UUID implemented by Microsoft using version 4 of the UUID.

Precisely because it is implemented in the operating system it is possible to be generated off-line as questioned in the comments.

Other information: RFC4122

  • Liked it here too @Magichat :)

  • 1

    Show... A good question...

9

There is an article describing how Guids are generated (https://blogs.msdn.microsoft.com/oldnewthing/20080627-00/? p=21823) and also why the substring of a Guid is not guaranteed to be unique.

Basically a GUID is generated using the combination of:

MAC Adrress of the machine used to generate the GUID - Guids generated on different machines are unique, unless the MAC is reused.

Timestamp - then the Guids generated at different times on the same machine are different.

Extra "emergency unifying bits" - are used to ensure that Guids generated at virtually the same time and on the same machine are unique.

Identifier for the algorithm - Guids generated by different algorithms are unique.

However, this is only a particular algorithm used to generate GUID.

Source: https://stackoverflow.com/questions/1888254/how-does-c-sharp-generate-guids

Browser other questions tagged

You are not signed in. Login or sign up in order to post.