What is the probability of generating a repeated Guid?

16

What is the probability of generating a repeated Guid with Guid.NewGuid()?

My system uploads a large number of images.

It will be multi-tenant, and all tenants will share the same deployment and the same folders.

I name each image with a Guid plus its extension (jpg, png or gif).

Is there any chance of generating a Guid that already exists?

  • 3

    Rod, wouldn't it be better to use a sequential number, or the seconds?

  • @Joannis If the name is built from the seconds, the chance of collision is real. Two images created in the same second would get the same name.

  • When I do this kind of upload I use the DateTime; if even a thousandth of a second has passed, an equal number would be impossible. Example: 14/11/2014 16:28:31, keeping only the digits, gives 14112014162831. As I said, it is impossible for this number to repeat.

  • @Diegozanardo In one thousandth of a second, a 2 GHz computer goes through 2 GHz / 1000 = 2 million clock cycles. That is too many cycles to say there is no risk. I do not know the exact calculations, and there is I/O involved, as well as time taken up by other operating system processes, but I would not be so categorical in saying there is no risk of repetition.
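The concern raised in these comments can be illustrated with a small sketch. It assumes the digits-only format from the example above (`ddMMyyyyHHmmss`); the class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TimestampNames
{
    // Builds a file name from a timestamp, keeping only the digits
    // (day, month, year, hour, minute, second), as in the example above.
    public static string FromTimestamp(DateTime t) =>
        t.ToString("ddMMyyyyHHmmss", CultureInfo.InvariantCulture);

    // Counts how many names in a batch repeat a name seen earlier in the batch.
    public static int CountDuplicates(IEnumerable<string> names)
    {
        var seen = new HashSet<string>();
        int duplicates = 0;
        foreach (string name in names)
        {
            // HashSet.Add returns false when the name is already present.
            if (!seen.Add(name)) duplicates++;
        }
        return duplicates;
    }
}
```

For instance, `FromTimestamp(new DateTime(2014, 11, 14, 16, 28, 31, 100))` and the same call with 500 milliseconds both return `14112014162831`, so two uploads 400 ms apart would collide. Adding milliseconds to the format narrows the window but does not close it.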

3 answers

22


Although a generated GUID is not guaranteed to be unique, the total number of unique keys (2^128, or about 3.4 × 10^38) is so large that the probability of the same number being generated twice is very small. For example, the observable universe contains about 5 × 10^22 stars, so each star could have about 6.8 × 10^15 GUIDs of its own.

Source: Wikipedia

According to this answer on Stack Overflow, there is a 1% chance of a collision after generating 2,600,000,000,000,000,000 GUIDs.
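A rough way to check a figure like this is the birthday approximation. The sketch below assumes the GUIDs are drawn uniformly at random from N possible values, which is an idealization: a version 4 GUID has 122 random bits rather than 128, so both cases are shown.

```latex
p(n) \approx 1 - e^{-n(n-1)/(2N)}
\quad\Longrightarrow\quad
n \approx \sqrt{2N \ln\frac{1}{1-p}}

N = 2^{128} \approx 3.4 \times 10^{38},\; p = 0.01:
\quad n \approx \sqrt{2 \cdot 3.4 \times 10^{38} \cdot 0.01005} \approx 2.6 \times 10^{18}

N = 2^{122} \approx 5.3 \times 10^{36},\; p = 0.01:
\quad n \approx \sqrt{2 \cdot 5.3 \times 10^{36} \cdot 0.01005} \approx 3.3 \times 10^{17}
```

Either way, an image store would need hundreds of quadrillions of files before a collision became even 1% likely.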

And this other one seems to show how to create a collision. It is quite difficult and takes an absurd amount of time.

There are relevant articles by Raymond Chen, with links to another excellent series by Eric Lippert.

Some people may not realize that a GUID is built from so much different information, and tries so hard to be unique, that most alternatives one might think of will be worse, including those mentioned on this page. I am not saying that a GUID is the best solution to the OP's real problem, just reinforcing the idea that it is even more unique than the raw number suggests.

If Microsoft considers GUIDs good enough to guarantee uniqueness in many of its own applications, who am I to say they are not good enough?

I discuss this further in this answer.

  • Thanks for answering. Can you tell me whether it would take too long if I did something like while (System.IO.File.Exists(path)) to check against a large number of files?

  • 1

    @Rod This is an O(n) algorithm. How long it takes, however, depends on the speed (and demands) of the disk, and how many files you already have. The ideal would be to store these images in a database and give a numerical identity to each one.
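If the images stay on disk, one alternative to looping on `File.Exists` is to let the file system do the check at creation time. The following is a minimal sketch, not the approach used by anyone in this thread: `ImageStore` and `SaveWithGuidName` are hypothetical names, and it relies on `FileMode.CreateNew`, which throws an `IOException` when the file already exists.

```csharp
using System;
using System.IO;

static class ImageStore
{
    // Saves the content under a new GUID-based name and returns the full path.
    // FileMode.CreateNew fails if the file already exists, so the existence
    // check and the creation happen in a single step.
    public static string SaveWithGuidName(string folder, string extension, byte[] content)
    {
        while (true)
        {
            string path = Path.Combine(folder, Guid.NewGuid().ToString("N") + extension);
            FileStream stream;
            try
            {
                stream = new FileStream(path, FileMode.CreateNew, FileAccess.Write);
            }
            catch (IOException) when (File.Exists(path))
            {
                // The name is already taken (practically never happens): try another GUID.
                continue;
            }

            using (stream)
            {
                stream.Write(content, 0, content.Length);
            }
            return path;
        }
    }
}
```

A call such as `ImageStore.SaveWithGuidName(uploadFolder, ".jpg", bytes)` costs one file creation per image, regardless of how many files the folder already holds. Other I/O errors, such as a missing folder, are not swallowed by the filter and still reach the caller.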

7

Although GUID generation is pseudo-random, I think we can treat it as completely random for a back-of-the-envelope calculation.

There are 32 variable characters, each with 16 possible values.

This comes to (2^4)^(2^5), that is, 16^32 = 2^128. I had completely forgotten the high school formulas, so I ran Math.pow(16, 32) in the browser console and got something close to 3.4 × 10^38.

Written out in full, that is roughly 340,000,000,000,000,000,000,000,000,000,000,000,000 possible combinations.
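The browser console only gives a floating-point approximation. For the exact count, a short sketch with `System.Numerics.BigInteger` works; the class and method names are hypothetical, and the second method assumes the GUID is a version 4 (random) one, in which 6 of the 128 bits are fixed.

```csharp
using System;
using System.Numerics;

static class GuidSpace
{
    // 32 hexadecimal characters with 16 possible values each: 16^32 = 2^128.
    public static BigInteger TotalCombinations() => BigInteger.Pow(16, 32);

    // Assumption: a version 4 GUID, where 4 version bits and 2 variant bits
    // are fixed, leaving 128 - 6 = 122 bits chosen at random.
    public static BigInteger RandomV4Combinations() => BigInteger.Pow(2, 122);
}
```

`GuidSpace.TotalCombinations()` evaluates to 340282366920938463463374607431768211456, and the version 4 count is that value divided by 64, which does not change the order-of-magnitude argument.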

If you ever generate two equal GUIDs, make the most of your luck and play the Mega-Sena a few times (50,063,860 possible six-number combinations, just for comparison).

  • 1

    Curiously, we are used to using int (Int32) as a PK and rarely worry about its limit, which is absurdly lower than a GUID's. An Int32 offers 2,147,483,647 usable values: (2^32 / 2) - 1. We divide by 2 because we use only the positive ones, and subtract 1 because we do not use zero. Yet when we talk about a GUID (2^128), we automatically think about the possibility of collision.

2

As discussed in the chat, and complementing the answers by Maniero and Renan.

It may be hard to produce a collision with a GUID, but a number of this size (39 decimal digits) can bring problems with storage space, memory and search performance, among others. Depending on the application, of course, these problems may not exist.

My solution: use a library that generates pseudorandom numbers and take the result modulo 100000 (5 decimal digits), which is enough to keep the name from being predictable, then append it to the right of a sequential number. That way the problems described above are less likely to occur, and because the sequential part never repeats, you also have the guarantee that the name will never repeat.

Example

0001 sequential number
58976 randomly generated number

Your file would be 000158976.jpeg. Even though the first part is predictable, the second part is not.

See my working example:

http://ideone.com/BuLnud

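using System;

int x = 1; // sequential number, e.g. the next ID from a counter or database
string g = Guid.NewGuid().ToString();
// 4-digit sequential part followed by the first 5 hex characters of the GUID
string nome = x.ToString("D4") + g.Substring(0, 5);
Console.WriteLine(nome);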
