Theory
Password hashing is always a second line of defense. A server that authenticates users needs some information with which to validate a password.
A very simple system just stores the passwords themselves, and validation is a mere string comparison. But then, anyone who peeks into the database learns far too much.
This kind of leak happens in practice: a backup left in the wrong place, a hard drive replaced and not wiped properly, an SQL injection, and so on. See the detailed discussion on this blog.
Even so, since the content of a server that validates passwords necessarily includes the data needed for that validation, anyone with a mere copy of it can mount an offline dictionary attack,
trying candidate passwords until some of them match. This kind of attack is unavoidable, so the best we can do is make it as difficult as possible. For that, we have these tools:
Cryptographic hash functions: mathematical functions that are efficient to compute, yet that nobody knows how to invert. The server can store the hash of a password;
to validate an attempt, it just hashes the submitted value and checks whether the two hashes match. Yet looking at the hash alone does not reveal the original password.
Salts: one of the attacker's advantages is parallelism. The attacker grabs a pile of hashed passwords and wants to recover as many of them as possible. He can simply
hash one candidate password and compare that single hash against hundreds of different records. He can also use precomputed hash tables, including rainbow tables.
What makes these parallel attacks possible is that every password went through the very same hash function. A salt removes that advantage: instead of one hash function, you get a large family of functions, and the salt selects which member of the family is used for a given password; ideally, each password uses its own. Properly used, a salt kills parallelism entirely (the sketch after this list illustrates the effect).
Slowness: computers get faster and faster, as theorized by Gordon Moore, co-founder of Intel. Human brains do not. Each year attackers can test more and more passwords in the same amount of time, while users do not remember more complex passwords (or refuse to). To compensate, we make the hashing deliberately slow, using functions that require many iterations.
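A minimal Python sketch of the salt effect (illustrative only: a single SHA-256 call is far too fast to use on its own, as discussed next). The same password, hashed with two different salts, yields unrelated digests, so a table precomputed for one salt is useless against the other:

```python
import hashlib
import os

password = b"correct horse battery staple"

salt_a = os.urandom(16)
salt_b = os.urandom(16)

# Same password, different salts -> completely different digests.
digest_a = hashlib.sha256(salt_a + password).hexdigest()
digest_b = hashlib.sha256(salt_b + password).hexdigest()

print(digest_a)
print(digest_b)
```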
We have some very common cryptographic hash functions, such as MD5 and the SHA family. Building a hash function out of elementary operations is not an easy task.
When cryptographers want to do it, they think hard about it, then organize tournaments where the candidate functions "fight each other violently".
After hundreds of them have spent several years turning, prodding and poking a function without finding anything bad to say about it, they begin to admit that, perhaps, that function can be considered more or less secure.
That is what happened in the SHA-3 competition. We have to build hash functions this way because we know of no better method.
Mathematically, we do not know whether secure hash functions even exist; all we have are candidates (that is the difference between "cannot be broken" and "nobody knows how to break it").
A basic hash function, even one that is secure as a hash function, is not suitable for passwords: it has no salt, which allows the parallel attacks described above, and it is far too fast.
So we need something better. What is more, properly combining a hash function, a salt and iterations is no simpler than designing a hash function in the first place -- at least if you want a secure result.
Here again, you should rely on standard constructions that have survived the continuous onslaught of vindictive cryptographers.
Hash functions that are good for passwords
PBKDF2
PBKDF2 comes from PKCS#5. It is parameterized with an iteration count (an integer, at least 1, with no upper limit), an arbitrary salt with no size restriction, the required output size
(PBKDF2 generates output of a configurable size) and a PRF. In practice, PBKDF2 is always used with HMAC, which is in turn built on a hash function. So when we say "PBKDF2 with SHA-1", we actually mean "PBKDF2 with HMAC with SHA-1".
Advantages:
- It has been defined for a long time and is still "unscathed".
- Already available in several frameworks (such as .NET, for example).
- Highly configurable (although some implementations do not let you choose the hash function, such as .NET, which uses SHA-1 only).
- Approved by NIST (note the difference between hashing and key derivation; see below).
- Configurable output size (again, see below).
Disadvantages:
- It is CPU-intensive only, and therefore easier to accelerate with a GPU (the defender's server is usually an ordinary PC doing ordinary tasks, while the attacker can invest in specialized hardware and gain an edge).
- You have to manage the parameters yourself (generate and store the salt, choose the number of iterations, and so on). There is a standard encoding for PBKDF2 parameters, but it uses ASN.1, so people avoid it whenever they can (ASN.1 is hard to handle for the uninitiated).
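A minimal Python sketch of PBKDF2 using the standard library's hashlib.pbkdf2_hmac; the iteration count shown is just an illustrative placeholder, to be tuned as discussed under "How many iterations?" below:

```python
import hashlib
import hmac
import os

def hash_password_pbkdf2(password: str, iterations: int = 600_000):
    salt = os.urandom(16)                       # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, iterations, digest             # store all three with the user record

def verify_password_pbkdf2(password: str, salt: bytes,
                           iterations: int, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison
```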
bcrypt
bcrypt was designed by reusing and expanding elements of a block cipher called Blowfish.
Its iteration count is a power of two, which is somewhat less configurable than PBKDF2's, but sufficient for normal use. It is the core of the password hashing scheme used by OpenBSD.
Advantages:
- Implementations available for many languages (see the links at the bottom of the Wikipedia page).
- More "resistant" to GPUs, thanks to details of its internal design. The bcrypt authors did this deliberately:
they reused Blowfish because it relies on an internal RAM table that is constantly modified during processing. This makes life harder for anyone trying to accelerate bcrypt with GPUs, because GPUs are not good at accessing memory in parallel.
- The standard output includes the salt and the iteration count, which greatly simplifies storage: everything fits in a single string.
Disadvantages:
- Fixed output size: 192 bits.
- Although it thwarts GPUs, it can be optimized with FPGAs: modern FPGA chips have small embedded RAM blocks that are convenient for running many bcrypt instances in parallel on the same chip. This has already been done.
- The input password is limited to 51 characters. To handle longer passwords, one would have to combine bcrypt with a hash function (hash the password first, then feed the result to bcrypt).
Combining cryptographic primitives is risky, so this is not recommended for general use.
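A minimal sketch of bcrypt usage in Python, assuming the widely used third-party bcrypt package; the cost factor of 12 is only an illustrative value:

```python
import bcrypt  # third-party package: pip install bcrypt

def hash_password_bcrypt(password: str) -> bytes:
    # gensalt() picks the salt and encodes the cost; hashpw() returns a single
    # string containing salt, cost and digest, as described above.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt(rounds=12))

def verify_password_bcrypt(password: str, stored: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), stored)
```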
scrypt
scrypt is much newer (designed in 2009), built on PBKDF2 and a stream cipher called Salsa20/8. These, however, are just tools around the core that gives
scrypt its strength: RAM. scrypt was designed to use a lot of RAM (it generates a block of pseudo-random bytes, then reads them repeatedly, in an order that is itself pseudo-random).
"Lots of RAM" is hard to parallelize. A basic PC is good at accessing RAM, and will rarely need to read dozens of seemingly unrelated bytes from RAM simultaneously.
A GPU or FPGA attacker would want to do exactly that, and will struggle.
Advantages:
- A PC, that is, exactly what the defender will use to compute the hash, is one of the best (if not the best) platforms for running scrypt.
The attacker no longer gains an advantage by investing in GPUs or FPGAs.
- One more tuning knob: memory usage.
Disadvantages:
- It is still new (my "personal rule" is to wait at least 5 years of exposure; but of course it is good that other people use scrypt in production, to give it greater visibility).
- Not as many implementations available across languages.
- It is unclear what the ideal CPU/RAM mix is. For each pseudo-random access, scrypt must compute a hash; a cache miss costs about 200 clock cycles, while a SHA-256 call costs about 1000. There is room for improvement here.
- One more knob to tune: how much memory to use (yet another decision left to the implementer).
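A minimal sketch of scrypt in Python via hashlib.scrypt (available when Python is built against OpenSSL 1.1+); the parameters n=2**14, r=8, p=1 (roughly 16 MiB of RAM) are an illustrative choice, not a recommendation from the text:

```python
import hashlib
import hmac
import os

def hash_password_scrypt(password: str):
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1, dklen=32)
    return salt, digest                      # store both

def verify_password_scrypt(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                               n=2**14, r=8, p=1, dklen=32)
    return hmac.compare_digest(candidate, digest)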
OpenPGP's Iterated and Salted S2K
I mention this because you will use it if you protect files with a password using GnuPG. That tool follows the OpenPGP format, which defines its own password hashing functions, called "Simple S2K", "Salted S2K" and "Iterated and Salted S2K".
Only the third can be considered "good" in the context of this answer. It is defined as the hash of a long string (configurable, up to about 65 megabytes) consisting of the 8-byte salt and the password repeated over and over.
Broadly speaking, OpenPGP's "Iterated and Salted S2K" is decent, comparable to PBKDF2 with fewer options. You will rarely find it implemented outside of OpenPGP.
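A rough Python sketch of the idea just described (not the OpenPGP specification, which fixes details such as the hash choice and the exact count encoding): feed the 8-byte salt plus the password into the hash, repeatedly, until a configurable number of bytes has been processed.

```python
import hashlib

def iterated_salted_s2k_sketch(password: bytes, salt: bytes,
                               byte_count: int = 65536) -> bytes:
    # Simplified illustration of "hash the repeated salt+password".
    assert len(salt) == 8
    h = hashlib.sha256()
    block = salt + password
    fed = 0
    while fed < byte_count:
        h.update(block)
        fed += len(block)
    return h.digest()
```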
"crypt" of Unix
Current Unix-like systems (such as Linux) use iterated variants and with salt of function crypt(), which is based on good functions of hash, with thousands of iterations. That’s fairly good. Some systems can also use bcrypt, which is even better.
The "ancient" crypt() function, which was based on block Cipher DES, nay is good:
- It is slow in software, but fast in hardware, and can still be accelerated in software when computing multiple instances in parallel (technique known as SWAR or "bitslicing"), which is advantageous for the attacker.
- It’s still very fast with 25 iterations.
- Its salt is 12 bits, implying frequent reuse.
- Truncates the password to 8 characters and even truncates the high bit, transforming into pure ASCII (7 bits).
The new variants, which are currently in use, are OK.
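For illustration, the modern iterated-and-salted crypt() variants can be exercised from Python's standard library crypt module; this is a sketch rather than a recommendation, and it assumes a Unix system with a Python version that still ships the module (it is deprecated in recent releases):

```python
import crypt  # Unix-only; deprecated since Python 3.11, removed in 3.13

salt = crypt.mksalt(crypt.METHOD_SHA512)   # e.g. "$6$<random salt>"
stored = crypt.crypt("hunter2", salt)

# Verification: the stored string embeds method and salt, so re-running
# crypt() with it must reproduce the same string.
assert crypt.crypt("hunter2", stored) == stored
```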
Hash functions that are bad for passwords
Everything else, especially the homemade solutions people insist on creating, on the assumption that "secure encryption" means "piling up every cryptographic-looking operation one can think of".
Take this question as an example. The underlying principle seems to be that a convoluted mess of instructions will confuse attackers.
In practice, the developer will always be more lost in his own creation than the attacker.
Complexity is bad. Homemade is bad. Novelty is bad. Remember this and you will avoid 99% of the problems with password hashing, and with security in general.
Password hashing in Windows used to be downright ghastly; now it is merely bad (unsalted, non-iterated MD4).
Key derivation
So far we have considered the problem of hashing passwords. A closely related problem is turning a password into a symmetric key that can be used for encryption. This is called key derivation, and it is the first thing you do when you encrypt a file with a password.
It is possible to construct contrived examples of hashes that are safe for storing a validation token but very bad for producing symmetric keys, and vice versa; such examples are, however, artificial. For the practical cases described above:
- The output of a password hash is acceptable as a symmetric key, after truncation to the required size.
- A key derivation function can serve as a password hash, provided the derived key is long enough to avoid "generic preimages" (the attacker getting lucky and finding a password that yields the same output). An output of 100 bits or more is sufficient.
In fact, PBKDF2 and scrypt are key derivation functions, not password hashes -- and NIST approves PBKDF2 as a key derivation function, not explicitly as a password hash
(though with just a little hypocrisy you can read NIST's material as saying that PBKDF2 is fine for password hashing).
bcrypt, on the other hand, is really a block cipher (the password-processing part is its "key schedule") which is then run in CTR mode to produce three 64-bit blocks of pseudo-random output (192 bits in total), making it a kind of hash function. bcrypt could be turned into a key derivation function with a small extra step: keep running the block cipher in CTR mode to generate more blocks.
But, as usual, no homemade patches. Fortunately, 192 bits are already more than enough for most purposes (e.g. symmetric encryption with GCM or EAX only needs a 128-bit key).
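A small sketch of key derivation as described above, asking PBKDF2 directly for a 128-bit key; the iteration count is again an illustrative placeholder:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)            # stored next to the ciphertext; not secret

# Derive a 128-bit symmetric key, e.g. for AES-GCM.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000, dklen=16)
assert len(key) == 16
```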
Additional considerations
How many iterations?
The more the better! This salted-and-slow hashing is an arms race between attacker and defender.
You use many iterations to make hashing a password harder for everyone.
To improve security, set the number as high as your server can tolerate, given the other tasks it has to perform. Higher is better.
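A rough calibration sketch in Python: measure one PBKDF2 run on the actual server and scale the iteration count toward a target delay. The ~100 ms target and the probe size are illustrative assumptions, not figures from the text:

```python
import hashlib
import os
import time

def calibrate_iterations(target_seconds: float = 0.1,
                         probe_iterations: int = 50_000) -> int:
    # Time a probe run, then scale linearly to the desired delay.
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"probe password", salt, probe_iterations)
    elapsed = time.perf_counter() - start
    return max(probe_iterations,
               int(probe_iterations * target_seconds / elapsed))

print(calibrate_iterations())
```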
Collisions and MD5
MD5 is broken: it is very easy to compute many pairs of distinct inputs with the same output value. These are collisions.
However, collisions are not a problem for password hashing. Password hashing needs to resist preimages, not collisions.
Collisions are pairs that produce the same output, with no further restriction, whereas in password hashing the attacker must find a message that produces a specific output he did not get to choose.
That is quite different. As far as we know, MD5 is still about as strong as it ever was with respect to preimages (there is a theoretical attack, but it is still far from practical).
The real problem with MD5, as commonly used for passwords, is that it is very fast and has no salt. PBKDF2 built on MD5, however, would be robust.
You should still use SHA-1 or SHA-256, for "public relations" reasons: people get nervous when they hear "MD5".
Salt generation
The essential property of a salt is to be as unique as possible. Whenever a salt is reused, it can potentially help an attacker. For example, if you use the user name as the salt, an attacker can build rainbow tables for "admin" and/or "root", since those tables would work against the many sites that have users with those names.
Likewise, when a user changes the password, the user name stays the same, leading to salt reuse. Old passwords are valuable targets, because users tend to reuse them in many places (everyone is told it is a bad idea, everyone knows it, and everyone keeps doing it because it makes life easier). Moreover, people tend to generate sequential passwords: if Alaor's old password is "Senhasecreta37", the new one may well be "Senhasecreta38" or "Senhasecreta39".
The "cheap" way to obtain unique salts is randomness. If you generate the salt as random bytes from the secure PRNG your operating system offers (/dev/urandom, CryptGenRandom(), ...), you will get salt values that are "unique enough": at 16 bytes, for example, you will never see a salt collision in your lifetime.
UUIDs are a standard way to obtain "unique" values. Note that UUID "version 4" uses randomness (122 bits), in the spirit described above.
Several frameworks offer simple functions to generate UUIDs on demand, and they can be used as salts.
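For example, in Python both approaches are one-liners (a sketch using the standard secrets and uuid modules):

```python
import secrets
import uuid

# 16 random bytes from the OS CSPRNG: "unique enough" as a salt.
salt = secrets.token_bytes(16)

# A version 4 UUID carries 122 random bits and also works as a salt.
salt_from_uuid = uuid.uuid4().bytes
```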
Does the salt have to be secret?
The salt was never meant to be secret, otherwise it would be called a key. You do not need to disclose the salt, but if you have to (to hash on the client, for example), do not worry.
The salt exists only to be unique. It does nothing more than select one hash function among many.
Pepper
Cryptographers cannot leave a metaphor alone; they have to extend it with more analogies and puns (hence "salt" and "pepper"). If you use a pepper in your hash function, you are actually using a different kind of cryptographic algorithm: you are computing a message authentication code (MAC) over the password, and the MAC key is your pepper.
"Peppering" makes sense if you can keep a secret key that the attacker cannot read. Remember that we hash passwords precisely because we assume an attacker may obtain a copy of the server's database, or even of the entire disk. A common scenario: a server with two disks in RAID 1. One disk fails, its controller board burned out. It happens all the time. The sysadmin swaps the disk, the mirror is rebuilt, and nothing is lost, thanks to the magic of RAID 1. But the old disk no longer works, the sysadmin has no easy way to wipe its contents, and so he simply throws it away. An attacker fishes the disk out of the trash, swaps the controller board, and lo and behold: a complete copy of the system, with database, configuration, executables, operating system...
For the pepper to work in such a scenario, you need something more than a PC with disks: you need a hardware security module (HSM).
HSMs are expensive, in both economic and operational terms, but with an HSM you can simply use the pepper and process passwords with a plain HMAC (with SHA-1 or SHA-256, for example). That is far more efficient than bcrypt/PBKDF2/scrypt and their costly iterations. Besides, using an HSM makes you look very professional in a WebTrust audit.
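A minimal sketch of the pepper idea in Python: an HMAC over the salted password with a secret key. In the scenario above the key would live inside the HSM and the HMAC would be computed there; the local placeholder key below is only for illustration.

```python
import hashlib
import hmac

# Placeholder pepper key; in the scenario described above this secret never
# leaves the HSM.
PEPPER_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")

def peppered_digest(password: str, salt: bytes) -> bytes:
    # MAC over the salted password; the MAC key is the pepper.
    return hmac.new(PEPPER_KEY, salt + password.encode("utf-8"),
                    hashlib.sha256).digest()
```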
Client-side hashing
Since the hashing is deliberately "expensive", it might make sense in a client-server architecture to use the client's CPU: after all, when 100 clients connect to one server, together they have far more processing power. To do this, the server must send the salt to the client, which implies an extra round trip of data and may or may not be easy to add in a given case. In a web context it is harder still, because JavaScript has always been rather "anemic" at CPU-intensive work. In a Secure Remote Password (SRP) context, the hashing necessarily happens on the client side.
Conclusion
Use bcrypt (some disagree). PBKDF2 is not bad either. If you use scrypt you will be "slightly ahead of your time", with the risks that expression implies; but it is good for the progress of science. Being a "crash test dummy" is an honorable profession!
This is an adaptation of the excellent answer given by Thomas Pornin on the Information Security site, which is also part of the Stack Exchange network.
If possible, use Argon2: it won the PHC (Password Hashing Competition) and offers more tuning options than PBKDF2 and bcrypt.
– zixuan
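A minimal sketch of Argon2 usage in Python, assuming the third-party argon2-cffi package (default parameters are used here; time cost, memory cost and parallelism are all tunable):

```python
from argon2 import PasswordHasher          # third-party: pip install argon2-cffi
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()

stored = ph.hash("hunter2")                 # salt is generated and encoded for you

try:
    ph.verify(stored, "hunter2")            # raises on mismatch
    print("password ok")
except VerifyMismatchError:
    print("wrong password")
```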
Passwords are usually short, which makes brute-force attacks easier. A hash function has a fixed-size output, which makes it feasible to generate all likely passwords and look for matching hashes. A common practice is to lengthen the password (with some padding) before computing the hash, so the space a brute-force attack must cover becomes larger. This padding is stored in a separate database (physically and logically separated).
– user178974
Once I did something like this: the user enters the password, the system computes its hash and then applies an encryption based on a Feistel cipher with many rounds (enough to take 2-3 seconds), storing the encrypted hash in the database. A brute-force attack in reverse would take 2-3 seconds per attempt, assuming the "attacker" knew the cipher. The drawback is that password validation also takes 2-3 seconds, but what are 2-3 seconds in exchange for more security?
– user178974