As defined in RFC1321 the MD5 (Message-Digest Algorithm 5) is a message summary algorithm. It receives as input a message of an arbitrary length and produces as output a 128-bit "fingerprint".
Description of algorithm MD5
We start by assuming we have a message from b
bits as input and we want to get your summary. In this case b
is a non-negative arbitrary integer; b
can be zero, it doesn’t need to be a multiple of eight, and it can be arbitrarily large. Imagine the bits of this message written as follows:
m0 m1 ... mb-1
The following five Steps are performed to Compute the message Digest
of the message.
First Step: Add Padding Bits
The message is extended so that its length (in bits) is consonant to 448 modulo 512. That is, the message is extended until only 64 bits are missing so that its length is multiple of 512. The padding operation at all times is performed, even if the message length is already consistent with 448 (mod. 512).
Padding occurs as follows:
A bit 1
is added to the end of the message;
Bits 0
are added until the length (in bits) of the message after the padding is congruent to 448 (mod. 512).
That is to say: Altogether at least one bit will be added and not more than 512 bits will be.
Second Step: Attaching the Length
A 64-bit representation of b
(the length of the message before adding the padding bits) is attached to the result of the previous step. In the (unlikely) case of b
be greater than 264 then only the 64 least significant bits of b
are used. (These bits are attached as two 32-bit words and attached starting at least significantly (little-endian)).
At this point the resulting message has a length that is a 512-bit multiple. That is, the message length is a multiple of 16 words (32-bit). Let M[0 ... N-1]
the words of the resulting message, where N
is a multiple of 16.
Step 3: Initialize Buffer MD
A buffer four-word (A
,B
,C
,D
) is used to compute the summary of the message. In this case each of A
, B
, C
, D
is a 32-bit register. These registers are initialized with the values that follow in hexadecimal. The least significant bytes first:
word A: 01 23 45 67
word B: 89 ab cd Ef
word C: fe dc ba 98
word D: 76 54 32 10
Step Four: Process the 16-Word Block Message
First it is necessary to define four auxiliary functions that receive 3 32-bit words as input and produce a 32-bit word as output.
F(X,Y,Z) = XY v not(X) Z
G(X,Y,Z) = XZ v Y not(Z)
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X v not(Z))
Where
not(X)
is the complement bit-Wise of X
,
X v Y
is the OR bit-Wise of X
and Y
,
X xor Y
is the OR EXCLUSIVE bit-Wise of X
and Y
,
XY
is the And bit-Wise of X
and Y
.
This step uses a 64-element table T[1 ... 64]
constructed from the function seno
. Be it T[i]
equal to the ith element of the table, which is equal to the entire part of 4294967296 times abs(sin(i)), where i is in radians.
Consider that +
denotes the addition of words) and X <<< s
the value obtained by the circular displacement (rotation) of X
the left in s
bit positions.
Do the following:
/* Processa cada bloco de 16 palavras. */
For i = 0 to N/16-1 do
/* Copia o bloco i para X. */
For j = 0 to 15 do
Set X[j] to M[i*16+j].
end /* fim do loop em j */
/* Salva A como AA, B como BB, C como CC e D como DD. */
AA = A
BB = B
CC = C
DD = D
/* Round 1. */
/* Seja [abcd k s i] a operação
a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
/* Faça as 16 operações que seguem. */
[ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
[ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
[ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
[ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]
/* Round 2. */
/* Seja [abcd k s i] a operação
a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
/* Faça as 16 operações que seguem. */
[ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
[ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
[ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
[ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]
/* Round 3. */
/* Seja [abcd k s t] a operação
a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
/* Faça as 16 operações que seguem. */
[ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
[ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
[ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
[ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]
/* Round 4. */
/* Seja [abcd k s t] a operação
a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
/* Faça as 16 operações que seguem. */
[ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
[ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
[ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
[ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]
/* Então faça as seguintes adições.(Adicione a cada um dos quatro registradores o valor que eles tinham antes desse bloco começar.)*. */
A = A + AA
B = B + BB
C = C + CC
D = D + DD
end /* fim do loop em i */
This step can be implemented in a more didactic way as found in Wikipedia article on MD5:
//Processar a mensagem em pedaços sucessivos de 512-bits:
for each 512-bit chunk of message
break chunk into sixteen 32-bit little-endian words w(i), 0 ≤ i ≤ 15
//Inicializar o valor do hash para este pedaço
//Loop principal:
for i from 0 to 63
if 0 ≤ i ≤ 15 then
f := (b and c) or ((not b) and d)
g := i
//há uma palavra referenciada por w(i)
else if 16 ≤ i ≤ 31
f := (d and b) or ((not d) and c)
g := (5×i + 1) mod 16
//não há uma palavra referenciada por w(i), logo fazemos 5i+1 %16
else if 32 ≤ i ≤ 47
f := b xor c xor d
g := (3×i + 5) mod 16
//não há uma palavra referenciada por w(i), logo fazemos 3i+5 %16
else if 48 ≤ i ≤ 63
f := c xor (b or (not d))
g := (7×i) mod 16
//não há uma palavra referenciada por w(i), logo fazemos 7i %16
temp := d
d := c
c := b
b := ((a + f + k[i] + w(g)) leftrotate r[i]) + b
a := temp
//Adicionar este pedaço do hash ao resultado:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
I consider the implementation more didactic because each block of each of the 16 words in the block is processed using a loop (in how much the RFC lists the 16 operations). In addition, it does not use auxiliary functions and makes it clear what will occur at each step. We can also notice the scattering that occurs within the four ranges of values of i
.
Fifth Step: Exit
The summary of the message produced as output is A
, B
, C
, D
, that is, starting with the least significant byte of A
, and ending with the most significant byte of D
.
A reference implementation in Javascript can be seen here.
He is considered safe?
From a look at collisions and Rainbow Tables.
Using MD5 for passwords
Using MD5 for passwords is not a very good idea because the function is very quick, I mean, it’s very easy for an attacker to try billions of possible passwords per second. In addition there are several vulnerabilities (and famous cases in which they were exploited) that are known in MD5.
Without wanting to generalize: nay use MD5 for passwords (not even with salt).
For more information on how to hash passwords securely see that one Question from Information Security.
Using MD5 to check file integrity
Based in that response from Information Security:
Depends on the scenario. Attacks against MD5 are collisional, meaning an attacker can make two files with the same hash if he has control of both files (have fun!), but it cannot create a file that has a hash equal to a file that was not influenced by it.
Note: this answer is largely a translation of RFC1321.
Whether it is safe or not depends on the application. It is vulnerable to collisions (both via birthday attack how much via common prefixes) - making its use impossible in several scenarios - and it is quite fast - making it impossible to use it for protect passwords. However, it is still resistant to pre-image (1st and 2nd), so there are still practical applications for it. Its use alone is limited, but combined with other algorithms (e.g., HMAC-MD5) it still has utility.
– mgibsonbr
Your comment is quite pertinent, but for now I will keep the question this way to cause this kind of discussion in the answers. If you think it can be edited or reworked, feel free to do so. :)
– Avelino
Okay, I only put it as a comment because I don’t have enough knowledge to give an answer, even if partial. If the community thinks it’s too wide, just take it off, but for now I have nothing against keeping it as it is.
– mgibsonbr
Instead of asking about each type of hash, you could ask about them all in one question.
– Antony Alkmim
Related: http://answall.com/questions/2402/como_doir-hash-de-passwordsformsafety?lq=1
– bfavaretto
@Antonyalkmim I had done this before, as shown in my editions made here: http://answall.com/posts/43493/revisions What happens is that I am trying to understand the "magic" that happens underneath the cloths, And because the two algorithms have different implementations, I came to the conclusion, along with mgibsonbr, that it would be better to divide them into two different questions. I believe that the two questions are also valid for future references and to aggregate content with search engines. But if the community disagrees, I can delete and reverse the issue.
– Avelino
I do not understand the reason for the negative votes. Could you give me an answer or even a comment so I can improve the question? If the question falls outside the scope of SOPT, could you give me the reasons? I have already consulted printed content sources and still can not understand the algorithm with the details I quoted in my question. This is a complex algorithm, and I need to understand it in as much detail as possible.
– Avelino
@Avelino I think the algorithm itself is very complex, but it is understandable to those who know the principles of cryptography. Could it be an explanation of what an MD5 hash is and the theory behind it? I do not know if it makes much sense to explain step by step the algorithm itself, since this would be nothing less than to read the thesis of the inventor of this algorithm.
– rodrigogq
@rodrigogq I think the explanation of what an MD5 hash is, and its utilities, is not very necessary. But an answer containing MD5 hash theory would be very welcome. However, for record purposes, you can cite the utilities of MD5.
– Avelino