Can the MD5 hash repeat for different passwords causing collision?

Asked

Viewed 3,146 times

7

A co-worker made a System in Java that encounters collisions in a series of MD5 hash passwords. But he did not stop to analyze the results, he made only a statement that they exist. Only that I would like to prove this, someone happens to know the existence of two different passwords that could have the same MD5 hash?

The Wikipedia talks more about collisions and vulnerabilities.

To clarify the problem, something that seems simple to understand, but that seems not to have been very clear, what I want is the two distinct values that caused the same hash MD5, example, typing in the Workbench:

set @valor1 := 'senha_distinta1????';
set @valor2 := 'senha_distinta2????';
SELECT MD5(@valor1) AS hashA, MD5(@valor2) AS hashB;

where: Hasha = hashB

Obs: I tried to put the following input values to test the suggested examples and both returned at different hash values:

set @a:= "d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70";

set @b:= "d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70";

set @a:=Replace(@a," ","");
set @a:=Replace(@a,"
","");
set @b:=Replace(@b,"
","");
set @b:=Replace(@b," ","");

SELECT  MD5(@a) as hashA, MD5(@b) as hashB, @a as valorA, @b as valorB;
  • 2

    Possible duplicate of How the MD5 hash algorithm works?

  • @Sanction, you are roundly wrong, the question is not a duplicate and also does not answer my question, it just informs the functioning of the algorithm, what I need is real examples that there are collisions. So far I’ve only seen claims about it, but nothing is conclusive.

  • 3

    Look I think the question does not answer this directly, but states and provide the links on the subject, it is not really a duplicate, but I tell you one thing, the sha1 that also presented the same problem later, now being recommended the sha-2, that "maybe" She may have the same problem, but I don’t know how often. I voted because I left it open because although the other question is practically the same as your answer there speaks little about the security problem. Let’s see what the others think.

  • Actually after reading some links I noticed that the issue of passwords and files with md5 may vary, taking into account this your question is quite different really from the other. It should be kept open anyway, I just think the title was not good (but it is my opinion). If no one responds later I try to formulate a response. See you later.

  • Why did the salt ?

  • @Edilson O salt exists to avoid Rainbow Tables (break multiple passwords for the "price" of one), has nothing to do with collisions.

  • 2

    @mgibsonbr, really, when quoting salt I have become oblivious to conversation, when the problem at stake is another.

  • I don’t understand why I was denied, but okay. I believe that this is a way for people to show that they could not understand the question, therefore negative. It’s normal not to understand something you don’t know.

  • 2

    The answer to the question is simple. YES. Mathematically it is proven that qq hash has collisions. Now having passwords with the same output hash is very difficult. More common than a long bit string generates a collision for a short string.

  • 2

    in your example using Workbench, try to make an UNHEX of your hexdecimal string: SELECT MD5(UNHEX(@a)) as hashA, MD5(UNHEX(@b)) as hashB, UNHEX(@a) as valorA, UNHEX(@b) as valorB;

  • @Tobymosque That! That’s what I was trying to explain to AP but I wasn’t able to make myself understand (by the way, I updated my answer, I hope this point is a little more clear).

  • It worked @Tobymosque, I believe your answer cleared my question, thank you. I would put that as a response.

  • My comment is not an answer, I just pointed out that your code adapted from the example given by @mgibsonbr was not working as expected. In my view the answer of Gibson is complete and I do not need to add anything the same.

Show 8 more comments

1 answer

13


There are not only examples but also a website that creates collisions for you:

the two blocks

d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70

and

d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70

produce an MD5 collision.

Each of these blocks has the MD5 hash 79054025255fb1a26e4bc422aef54eb4.

Source

Example in the ideone

If you refer to two strings short with the same MD5, I find it quite unlikely to exist. MD5 output is 128 bits. This means that if you generate 2128 + 1 distinct strings then guaranteed you will have a collision (beginning of pigeon house). But the set of Unicode strings in BMP of size up to 7, together, give less than 2128 - 1, so that it is possible that there is not a single collision.

Anyway, the problem of using MD5 to protect passwords is not its vulnerability to collisions (because for passwords what matters is the resistance to the second pre-image, and in this the MD5 has not yet been broken), but the fact that it is very quick (one good hash to protect passwords must necessarily be slow).


Updating: a point that seems to be causing confusion is that the input that an MD5 hash expects is a sequence binary, and not a string of characters. So that even if its input is a string, it would first have to be converted into a binary sequence (via an encoding or encoding) before being sent as input to the MD5 algorithm. Many libraries and functions do this automatically for you, so looks like that MD5 accepts strings, when actually it does not accept.

The examples cited are two binary sequences, expressed in hexadecimal. it is not a simple matter of using the above text as input to the hash, it is necessary to interpret it in the correct way:

0xd1 == 209 == 1101 0001

What is different from string "d1":

"d1" == 0x64 0x31 == 0110 0100 0011 0001

Note how the outputs are different. Done this way, the hashes will also be different...

If you want/need to put the above example in a string, the way is to use a hex code for each pair of characters. You didn’t say which platform you’re using, but an example in Python would be: (Edit: Sorry, I didn’t know what this Workbench was... see the comment of the Tobymosque for an example)

a = "\xd1\x31\xdd\x02\xc5...\x70\x80\xa8\x0d...\x70"
print(hashlib.md5(a).hexdigest())

b = "\xd1\x31\xdd\x02\xc5...\x70\x80\x28\x0d...\x70"
print(hashlib.md5(b).hexdigest())

Or, easier, it’s converting the above hexadecimal data to binary and testing straight into binary (as I did in example of ideone previously mentioned).

  • I didn’t understand where the password is that generated these blocks that were collided. And how did you get this hash: 790540255fb1a26e4bc422aef54eb4 ? And what would be the two different passwords generated with 2 to the 128 + any string that gave in a collision?

  • @Ivanferrer The blocks are the input, not the output. Note that they are not equal (the first link I passed shows highlighted the differences). The hash is the MD5 of one or another pair of blocks (binary, represented above as hexadecimal). They’re pretty big, but as I said, there’s no known collision with entrances short, typically what is used in a password (but it is not guaranteed that they do not exist). Testing this is complicated on an ordinary computer, see this post on Crypto.SE for more details.

  • did not understand, because the entry of the first hash converted to md5 was 'f52a2ca211111dc63ea4b50e5854bf0a';

  • What I said about 2 being high to 128 is that if you go out generating distinct strings (e.g..: a, b, c, ..., z, aa, ab, ac, ..., zz, aaa`, ...) and the number of strings is greater than 2 128 so you are guaranteed to encounter a collision as there are only 2 128 separate outputs for MD5.

  • @Ivanferrer How did you make the entry? I’m talking about a file binary, not just copy the text and make his MD5. This collision was discovered by two Chinese in 2005, and since then has been discovered a shorter collision, with 1 block only. But to reiterate: finding collisions is not easy, you can’t do it with an ordinary computer, it takes a lot of processing and/or a lot of time for it, so there’s no way I can give you better examples.

  • Yeah, here’s the thing.

  • We’re back to square one, there’s no evidence of a collision, it’s just assumptions.

  • @Ivanferrer Please read about the beginning of the pigeon house, there is no supposition, there are collisions and they are inevitable! I’ll put together an example in Python for you see the collision, if you don’t already believe. Now, if what you want is something else (e.g., strings with letters and numbers only, etc.) then please edit the question by clarifying that. Because I’m not at square one no, for me it’s all very clear! If you want me to continue helping you, then "help me help you" rsrs.

  • I read, and I re-edited the question to make it clearer, I understand that logically there must be a collision: as in this example: http://crypto.stackexchange.com/questions/1434/are-there-two-known-strings-which-have-the-same-md5-hash-value, but I tested all the options and nothing gave me a valid hash.

  • @Ivanferrer With "normal" strings (alphanumeric, Latin, Unicode...) I don’t know any example, I can only prove that it exists (as I see you have already understood). With binary data, however, the above example works (I also edited the answer by putting a practical example on ideone).

  • The example you gave in ideone is not a binary, it is exadecimal... I wanted a practical example in string+numbers+symbols+etc. I guess I’ll have to figure it out myself.

  • 1

    @mgibsonbr, Workbench is a BDSS management GUI, in this case the MySQL, other examples of tools for this purpose, are the SQL Server Management Studio and the pgAdmin.

  • Thank you! I would also put the complement of the solution presented by @Tobymosque, which is exactly what I needed to capture the value of the two inputs in which they were decoded from the exadecimal.

Show 8 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.