To answer the question of why the average number of attempts needed to decrypt is half the number of possibilities, suppose you assemble a giant deck of cards, where on one side of each card is written a possible decryption key, and on the other is written "yes" or "no", depending on whether that key is the right one or not.
If you have N cards and you know that the opponent chose the key randomly enough, the winning card is equally likely to be in any of the N positions (i.e. with probability 1/N). If you number the cards from the top to the bottom of the deck from 1 to N, the number of attempts you need to make to find the right key is the number of that card: if the key is the 17th card in the deck, you will make 17 attempts to find it.
The problem is that we don't know which card is the right one; what we want to know is the average number of attempts needed to find it. With probability 1/N this number is 1 (when the right key is the first card in the deck); with probability 1/N this number is 2 (when the right key is the second card in the deck); ...; with probability 1/N this number is N (when the right key is the last card in the deck).
So the average number of attempts you need to make is
1/N * 1 + 1/N * 2 + 1/N * 3 + … + 1/N * (N-1) + 1/N * N =
= 1/N (1 + 2 + 3 + … + (N-1) + N) =
= 1/N * N * (N+1) / 2 =
= (N+1) / 2
For the algorithms we are talking about, N is huge; that +1 is negligible. So it is a very good approximation to say that the average number of attempts is N/2.
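As a quick sanity check, here is a minimal Python sketch (illustrative only, not part of the original argument) that shuffles a "deck" of N candidate keys, searches it from top to bottom, and averages the number of attempts over many trials; the result approaches (N + 1) / 2:

```python
import random

def attempts_to_find_key(n: int) -> int:
    """Shuffle n cards (exactly one is the right key) and return
    how many cards we flip before finding it (1-based)."""
    deck = [False] * (n - 1) + [True]   # True marks the right key
    random.shuffle(deck)
    return deck.index(True) + 1

def average_attempts(n: int, trials: int = 10_000) -> float:
    return sum(attempts_to_find_key(n) for _ in range(trials)) / trials

N = 1000
print(average_attempts(N))   # close to (N + 1) / 2
print((N + 1) / 2)           # 500.5
```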
The whole discussion above assumes that the password is indeed random, but in practice things don't work that way - people choose birthdays, phone numbers, etc. as passwords.
To help protect users in these cases, it is common to use a key stretching function, such as bcrypt, scrypt or PBKDF2: these are hash functions that are complicated (like every hash function) and expensive to compute (unlike MD5 and SHA-*).
The idea is that you generate a random string `s` and use `f(s, x)` to encrypt the file (where `x` is the password the user will memorize); you keep `s` in the clear, alongside the encrypted file. The point is that if you know, for example, that the user chose `x` as an eight-digit number, and you encrypt the file in a naive way, your opponent only needs to do, on average, 5 × 10^7 decryptions to open your file. When you use a key stretching function, you put your opponent against the wall: computing `f(s, x)` is very expensive. He still ends up opening your file after 5 × 10^7 attempts, but each of those attempts is much more costly (in general this cost is adjustable, but for "civilian" applications people usually tune it so that `f(s, x)` takes about one second to compute on a modern computer).
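As a minimal sketch of this idea (using PBKDF2 from Python's standard library; the iteration count below is illustrative, not a recommendation), deriving the key from the password `x` and a public salt `s` could look like this:

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Compute f(s, x): a slow, salted hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# The salt s is generated at random and stored in the clear next to the
# encrypted file; only the password x stays secret (in the user's head).
salt = os.urandom(16)
key = derive_key("12345678", salt)
print(salt.hex(), key.hex())
```

With the cost tuned so that each derivation takes around one second, the 5 × 10^7 guesses above go from a fraction of a second of work to more than a year of computation.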
The other option is to ignore the existence of `f` and try to find the encryption key directly, but this is infeasible if you are using an algorithm with a 128-bit key, for example.
`26! - 4 x 10^26` should be `26! = 4 x 10^26`, right? – rodorgas
I have something to add: these "decryption time" columns don't make the slightest sense out of context - a faster computer (or a cluster) would take less time than the average, a parallelizable algorithm (say, AES-CTR) would take less time than a non-parallelizable one (like AES-CBC), an asymmetric cipher (say RSA or Diffie-Hellman in Z_p) would take less time than a symmetric one for the same key size, etc. I know this is not part of the question, but I thought it was worth mentioning. In the context of the book there must be some justification for these numbers, though.
– mgibsonbr