What do I need to do to measure password strength?

Asked

Viewed 1,996 times

35

Often we need to accept the input of passwords by users. In general we cannot accept any password that can be easily attacked, probably by Brut force (gross force) or similar techniques.

It is obvious that the password should have a minimum size, should have variety of characters, can not be similar to the username and who knows other data. I imagine that the strength of the password is not measured by a simple check if there are all these criteria. All together can make the password very secure, but excessively difficult to be determined by a human.

But how to calculate the force? Is there any known standard algorithm that establishes this? If it does not exist, what should be observed?

  • 3

    Some links about passphrases instead of passwords: this, this and this

  • 3

    Check out the zxcvbn.js library created by Dropbox staff: https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation/ ------------------------------------------------------------------ https://github.com/dropbox/zxcvbn

  • 2

    Good question. I don’t understand the subject, but I was going to guess (and partially guess) entropy: http://en.wikipedia.org/wiki/Password_strength (the problem is the "People are notoriously Poor at achieving sufficient Entropy to Produce satisfactory passwords"). :)

3 answers

29


To determine the strength of a password, you need to analyze how attackers proceed when trying to break a password. This excellent article by Bruce Schneier (in English) describes the state of the art in breaking passwords, either "blind" (i.e. when nothing is known about the user in question) or directed. One also considers the scenario where the attacker has access to the victim’s data (e.g., a copy of his hard drive), but this extrapolates the scope of the question.

Forms of attack

The set of possible passwords is virtually infinite, unless there is a size limit. However, the set of passwords probable is much smaller. Consider an 8-letter password, for example. It is much more likely that the user will choose one that contains a word than simply a random set of characters (type yrhxmmpl). An attack that tried all 8-letter combinations (208 billion) would have a success rate lower than one that tried, say, all the words of 11 most widely spoken languages (3.2 million).

Determining which passwords are the most "likely" is not difficult: just do a statistical analysis. You can do this with users of your own service, or use data from one of several "leaks" of passwords that always occur around the world (often involving large companies, with many customers, so sampling is significant). By way of example, some conclusions mentioned in the article:

  • Some passwords are very common, despite their weakness (password, letmein, 123456, etc). It pays to test by a list of, say, 1000 most common passwords before anything else.

    • If combined with the 100 most common affixes (see below), this set of 100,000 candidates break almost 25% of passwords.
  • The most common password format consists of a "root" and an "affix". Suffixes are more common than prefixes. The root is usually a word - perhaps transformed by a few simple rules (e.g..: s -> $, a -> @, t -> 7, etc) or a given name. The affix in general is a number, a special symbol, or both.

    • It is estimated that this test is sufficient to break from 55% to 65% of passwords, although the time spent is on the order of weeks.
  • When using victim’s personal information in conjunction with the above attacks (either by analyzing other known passwords, to identify common patterns, or by taking names, addresses, numbers, etc and adding to root and affix dictionaries before carrying out the attack) the success rate increases slightly, and the time spent on the attack decreases considerably.

  • Finally, when the dictionary attack proves ineffective, it is still preferable to test for things that if appear with words rather than simply random strings. Random sequences yet pronounceable are more likely to be used as passwords than those that are not.

This is not the only form of attack, of course, but it is the most effective. If a "type" of password becomes very popular, adapting the attack to this type would give a better chance than a generic attack. It should also be noted that the standards presented above derive almost directly from the policies that companies impose in the choice of passwords: if everyone asks for "a letter, a number and a special character", users will choose the simplest sequence that meets these minimum criteria, and exchange experiences and "tips" with each other - leading to a standard that technically meets the requirements but does not provide the desired protection.

Strong passwords

A strong password is simply a password that will not be detected by the default used by the attacker, either because it does not belong to the tested set or because it even belongs, but will be tested last according to the rules. It becomes tempting simply to reject the pattern, but then you just fall into another pattern. Especially because, if your service will reject passwords according to some objective criterion, it is very important inform the user of what these criteria are (to avoid for example the drama of cabbage... :Q). And in doing so, you also inform the attacker what type of password it is useless to try...

But an even more important reason for not creating arbitrary rules for passwords is that we need to take into account the human factor safety, not only the "machine" factor. How already satirized in XKCD and well debated in security.SE, a password needs to be difficult for the machine to guess and easy to remember user. Ignoring this second part leads to practices even more damaging to security than a weak password, such as the habit of annotating the password on a post-it and pasting it on the monitor.

Fortunately, there are ways to produce passwords that are relatively simple to remember and quite difficult to guess - even if the pattern used is known. One is the use of the first letter of each word of a phrase [relatively long], as a password produced in this way is almost indistinguishable from a random sequence[Citation needed].

Entropy

Finally answering the question: the formal method for determining the strength of a password is the entropy (as defined in Information Theory; not to be confused with the concept of the same name in Thermodynamics). Because the password consists of a secret that the owner knows but the attacker does not know, its security lies in the fact that an attempt to "guess" this password (i.e. choose the correct possibility in a set of possible values) has a negligible chance of success. However, it is not enough that the set is large, if some candidates occur much more frequently in this set than the average, it is said that the entropy of it is low.

And it’s good to make that clear: entropy is not a feature of the password itself, but yes of the process that originated that password. The question that needs to be asked is: "when using the X process to choose a password, what is the probability of the Y password being chosen?".

Measuring entropy is complicated, and subject to many mistakes. As an example, a common method for estimating the strength of the password is to check which character sets the components of the password belong to, and to assume that a password of size N belongs to the set of possible strings with N characters of that set. This method is flawed because - assuming a password like Password1$ - it is established that the set size is 95 (uppercase letters, lowercase, digits and ASCII symbols) and the password size is 10, and it is estimated that "65 bits of entropy" (log2 9510), which would be a pretty strong password! But if the calculation is done in another way - say a word in the list of the most common 1000 (10 bits), with or without the first letter capitalized (1 bit), followed by a digit (3.3 bits) and a special symbol (5 bits), in one order or another (1 bit), it reaches 20.3 bits of entropy, revealing that it is too weak a password...

(By the way, which of these two calculations is correct? Well, both! The procedure of generating a random password with 10 ASCII characters will reach Password1$ once in 265, and the procedure for choosing a word from a list of 1000 [containing password], capitalize or not, and add a digit and a symbol will reach Password1$ once in 220.3. Remember, entropy is process, not password. If there is such a thing as "password strength", I would say that it is the smallest entropy among all the "reasonable" strategies that a user may have employed to choose his password. But this is still subjective.)

Still, you can set lower limits for entropy. An 8-letter lowercase password, for example, will have a maximum of 37 bits of entropy (log2 268). Probably less, but never again. Because a password with this format sluice only 37 bits of entropy. There is much talk that longer passwords are safer than shorter passwords, but this is not true in all situations (pqzrwj is a more secure password than password1) - what is true is that in a longer password "fits" more entropy than in a shorter password. Choosing a strong and memorable password with 20 characters is easier than doing the same with 8.

Applying entropy analysis to determine the "strength" of the password

If there are N different ways of calculating entropy, which one is the most "correct", or perhaps the most "useful"? I would say that one can divide the analysis into two cases: one that takes into account the average case, and another that takes into account the worst case.

Suppose they do exist E strategies that the attacker could use to try to figure out a password. Each strategy represents a specific password generation (choice) procedure. Associated with each procedure, means of measuring/estimating its entropy can also be associated. Some examples of procedures, and corresponding strategy, would be:

P0 - Gerar uma senha aleatória de 8 caracteres.
E0 - Testar todas as sequências de 8 caracteres uma por uma.
M0 - Calcular o tamanho do menor alfabeto que contém a senha, elevado ao tamanho da senha.

P1 - Escolher uma palavra do dicionário e anexar um sufixo numérico.
E1 - Escolher um dicionário, testar todas as palavras desse dicionário combinadas com todos
     os sufixos até um tamanho X.
M1 - Escolher um dicionário de frequências de palavras, verificar se a senha contém uma
     palavra, qual sua probabilidade de ser escolhida dada sua frequência, e acrescentar
     3.3 bits de entropia para cada dígito no sufixo.

P2 - Escolher quatro palavras do dicionário.
E2 - Escolher um dicionário, testar todas as combinações de quatro palavras.
M2 - Escolher um dicionário, calcular quantas combinações de quatro palavras existem,
     verificar se a senha é composta de quatro palavras nesse dicionário.

P3 - Pensar numa frase, pegar a primeira letra de cada palavra dessa frase.
E3 - Escolher um dicionário e uma gramática, gerar todas as frases possíveis.
M3 - ???

...

In a worst-case analysis, we would assume that the attacker knows which procedure the users most use, and would simply choose the strategy En to the detriment of all others. Thus, the entropy of each process to which the password applies would be estimated, and the lowest value found would be considered as "the strength of the password".

The average case analysis is similar, only that the assumption that the attacker will choose the right strategy every time does not exist. Instead, one can assign a probability to each of the strategies considered, and to make a weighted average of all the estimates obtained instead of choosing the lowest value found. Alternatively (and I believe, more correct, although it is only my opinion), one can add bits of entropy to each estimate representing the chance that the attack strategy was used, combined with the chance that the password was discovered according to the same strategy, and continue considering only the lowest value found.

An analysis of what would be more accurate, considering only the average case, the worst case or something in the middle of the way would be too extensive to be exposed here. I suggest the study of Game theory for a better understanding of the most appropriate way to behave in the presence of an opponent who also takes target choices into account when adapting his strategy.

Completion

There is no consensus on how best to choose passwords, nor on how to estimate the strength of a particular password. If you decide to be liberal in the types of password you accept, my suggestion is to keep a list of common passwords and/or words and reject them immediately, reject known user data as part of the password (as quoted in the question itself)and reject passwords whose size is too small. Even so, I fear you will have little success in ensuring an adequate level of security in accepted passwords.

If you decide to be draconian, on the other hand, it is possible not only to ensure a minimum force, but it can be done without antagonizing the user too much. The very suggestion of XKCD (4 randomly chosen words from a 2000 list), for example, guarantees passwords with little less than 44 bits of entropy, which is not particularly high but is significantly higher than the average of freely chosen passwords by the user (speak on an average of 40 bits, but the reality is probably much smaller). With a little creativity, one could think of an even safer and easier to remember pattern. It would therefore be sufficient indicate to the user the mandatory format, suggest a password automatically, and verify if the final password matches the expected format (maybe leaving some margin).

This would not necessarily be well received by users (some will find it too complicated, others will have their own concept - probably wrong - of what is a "safe password" and will bother not being able to use it), so I am not recommending its use, simply citing as the only way I know to ensure a certain accuracy in the measurement of password strength.

  • Friend, in the area of computing I have often seen them use the word entropy, but the meaning of it I believe is not the intended one. If I’m not mistaken, entropy is the second law of thermodynamics that says that once you turn something into energy (or heat (I don’t remember)) you can’t undo the transformation. We used to try to convey the idea of irretrievable, but I think we should use another word with this meaning. Anyway, only an old (and evil) habit. Hug.

  • 3

    @pmargreff According to Wikipedia, "entropy is the measure of the disorder (chaos) of a system". Its main uses are in thermodynamics, as you pointed out, and in information theory, where it measures the average amount of "information" contained in a message (relative to its size). It is in this context that the password analysis fits. There are also other domains that have their own concept of entropy, such as ecology and medicine, always representing disorder.

12

As mgibsonbr says in his answer, the strength of a password is associated with the difficulty an opponent has to guess it. Therefore, we measure the strength of a password rule based on how many possible passwords we can write obeying the rule, which is proportional to the time an opponent will have to spend to guess the password.

The problem with this definition is that the search space for the bruteforce depends on which rule you are taking into account. For example, if we take as search space passwords that are a combination of any letters, the password "qwertyuiop", which has 10 letters will take a considerable amount of time to guess. However, if the opponent tries to guess the easy passwords first, like the passwords that make a pattern on the keyboard he will guess the "qwertyuiop" in an instant.

Given these difficulties, if you still want to make a password strength meter, I recommend taking a look at zxcvbn, Dropbox. It takes into account patterns such as dictionary words, keyboard patterns, and l337sp3@k and generates an estimate of the strength of the password. There’s an online demo here

Other than that, it also encourages the user to choose a good password1, there are a few things you can do as a developer to increase the security of system passwords:

  • The password form must be served on a secure connection. No use doing a super security on the server if anyone in the same wifi user can read his password in transit.

  • Never store passwords in readable format or using reversible encryption. If an opponent gets access to your database (even read-only, due to an injection of sql or a misplaced backup disk) he will have access to all passwords without having to do anything.

    • This means that you can’t do password recovery by sending the password back by email, instead make the user choose a new password.
  • Use a hash function suitable for passwords such as bcrypt, scrypt or PBKDF2. Do not use functions such as md5 or SHA1

    • A good hash function for password should be slow to calculate. When the user authenticates it does the account once, which is imperceptible, but when it comes to an opponent making bruteforce the slower the hash function the better. Configure the parameter of key-stretching higher than your server can handle.
    • Do not use the same hash function for all users. This lets an opponent know that users share password just by seeing the hash. It also allows the bruteforce to be done with all users at the same time, in parallel. In general the solution to this is to add a salt in the password. Most libraries for password hashing via bcrypt, etc, already do this for you.

1 The best tip for safe passwords is to use password management software. This allows you to generate super long random passwords and helps you not use the same password on more than one site. For passwords you need to memorize, the ideal is to have a "passphrase" instead of a "keyword". Sentences are a little longer to type but are much easier to memorize.

As a system developer, there are a few things you can do to

  • 1

    Note that I don’t mean to use an entirely different algorithm for each user. It was more of an analogy to say why we should "season" each password with a different "salt"

9

In short, we can make our own score to check if it is "weak", "average", "strong" or even "doesn’t meet the minimum required".

I have javascript code that demonstrates the score in the form of a score and as the score shows the "strength" that the password has.

$(document).ready(function() 
{
        var strPassword=0;
        var charPassword=0;
        var complexity = $("#complexity");
        var minPasswordLength = 8;
        var baseScore = 0, score = 0;

        var num = {};
        num.Excess = 0;
        num.Upper = 0;
        num.Numbers = 0;
        num.Symbols = 0;

        var bonus = {};
        bonus.Excess = 3;
        bonus.Upper = 4;
        bonus.Numbers = 5;
        bonus.Symbols = 5;
        bonus.Combo = 0; 
        bonus.FlatLower = 0;
        bonus.FlatNumber = 0;

        outputResult();
        $("#inputPassword").bind("keyup", checkVal);

function checkVal()
{
        init();

        if (charPassword.length >= minPasswordLength)
        {
                baseScore = 50; 
                analyzeString();    
                calcComplexity();       
        }
        else
        {
                baseScore = 0;
        }

        outputResult();
}

function init()
{
        strPassword= $("#inputPassword").val();
        charPassword = strPassword.split("");

        num.Excess = 0;
        num.Upper = 0;
        num.Numbers = 0;
        num.Symbols = 0;
        bonus.Combo = 0; 
        bonus.FlatLower = 0;
        bonus.FlatNumber = 0;
        baseScore = 0;
        score =0;
}

function analyzeString ()
{   
        for (i=0; i<charPassword.length;i++)
        {
                if (charPassword[i].match(/[A-Z]/g)) {num.Upper++;}
                if (charPassword[i].match(/[0-9]/g)) {num.Numbers++;}
                if (charPassword[i].match(/(.*[!,@,#,$,%,^,&,*,?,_,~])/)) {num.Symbols++;} 
        }

        num.Excess = charPassword.length - minPasswordLength;

        if (num.Upper && num.Numbers && num.Symbols)
        {
                bonus.Combo = 25; 
        }

        else if ((num.Upper && num.Numbers) || (num.Upper && num.Symbols) || (num.Numbers && num.Symbols))
        {
                bonus.Combo = 15; 
        }

        if (strPassword.match(/^[\sa-z]+$/))
        { 
                bonus.FlatLower = -15;
        }

        if (strPassword.match(/^[\s0-9]+$/))
        { 
                bonus.FlatNumber = -35;
        }
}

function calcComplexity()
{
        score = baseScore + (num.Excess*bonus.Excess) + (num.Upper*bonus.Upper) + (num.Numbers*bonus.Numbers) + (num.Symbols*bonus.Symbols) + bonus.Combo + bonus.FlatLower + bonus.FlatNumber;

}   

function outputResult()
{
        if ($("#inputPassword").val()== "")
        { 
                complexity.html("Digite a Senha").removeClass("weak strong stronger strongest").addClass("default");
        }
        else if (charPassword.length < minPasswordLength)
        {
                complexity.html("No mínimo " + minPasswordLength+ " caracteres por favor!").removeClass("strong stronger strongest").addClass("weak");
        }
        else if (score<50)
        {
                complexity.html("Fraca!").removeClass("strong stronger strongest").addClass("weak");
        }
        else if (score>=50 && score<75)
        {
                complexity.html("Média!").removeClass("stronger strongest").addClass("strong");
        }
        else if (score>=75 && score<100)
        {
                complexity.html("Forte!").removeClass("strongest").addClass("stronger");
        }
        else if (score>=100)
        {
                complexity.html("Segura!").addClass("strongest");
        }


        $("#details").html("Base Score :<span class=\"value\">" + baseScore  + "</span>"
            + "<br />Length Bonus :<span class=\"value\">" + (num.Excess*bonus.Excess) + " ["+num.Excess+"x"+bonus.Excess+"]</span> " 
            + "<br />Upper case bonus :<span class=\"value\">" + (num.Upper*bonus.Upper) + " ["+num.Upper+"x"+bonus.Upper+"]</span> "
            + "<br />Number Bonus :<span class=\"value\"> " + (num.Numbers*bonus.Numbers) + " ["+num.Numbers+"x"+bonus.Numbers+"]</span>"
            + "<br />Symbol Bonus :<span class=\"value\"> " + (num.Symbols*bonus.Symbols) + " ["+num.Symbols+"x"+bonus.Symbols+"]</span>"
            + "<br />Combination Bonus :<span class=\"value\"> " + bonus.Combo + "</span>"
            + "<br />Lower case only penalty :<span class=\"value\"> " + bonus.FlatLower + "</span>"
            + "<br />Numbers only penalty :<span class=\"value\"> " + bonus.FlatNumber + "</span>"
            + "<br />Total Score:<span class=\"value\"> " + score  + "</span>" );
}

}
);

Note the function outputResult where it validates based on the score

  1. Minimum size
  2. Uppercase
  3. Numbers
  4. Symbols

A website that measures as an example http://www.passwordmeter.com/

  • 5

    It would be good if you put in an executable snippet and maybe explain a little the code, don’t you think?

Browser other questions tagged

You are not signed in. Login or sign up in order to post.