Regular expression to validate a password with Python

Passwords must contain at least 5 words (word = 1 or more letters), each separated by a hyphen, a space, a dot, a comma or an underscore. Example: a-b-b-c-d-d

OR

Passwords must be at least 8 characters long and contain at least one uppercase, lowercase, digit and punctuation ( "!@#$%<^&*?").

For the first, I tried:

r"[\w.-\s,_]{4,}\w{1,}"

For the second case, I tried:

r"\w+\d+[!@#\$<>&\*\?]+[\w\d!@#\$<>&\*\?]{5,}"

import re

sentence = "test%#"

pattern = r""

print(f" match: {re.findall(pattern, sentence)}")

Any idea?

  • "at least characters" - I think the number is missing, right? :-) Anyway, what is a "word"? For example, is @-123,abc $@,xy a valid password? The "words" would be "@", "123", "abc", "$@" and "xy" (separated by the indicated characters). Is that right? And so your example password (test%#) is invalid, right?

  • @hkotsubo : just edited -> at least 8 characters

  • @hkotsubo For the first case, an example of a valid password: a-b-c-.d.e

  • a-b-c-.d.e does not seem valid to me, because you said that it must also have at least one uppercase letter, one digit, and punctuation...

  • @hkotsubo these are 2 different types of passwords, i.e. 2 different regexes!

  • But in the first case, do "words" only have letters? Or can they have other things?

  • "5 words, each separated by a hyphen..." and "must have at least 8 characters" are mutually exclusive conditions. The minimum in that case would be 9 characters: 5 letters and 4 separators.

  • @hkotsubo "word" = 1 or more letters


2 answers

4


For the first case, \w does not work, because this shortcut also matches digits and the character _, and since you want _ to be one of the separators, it cannot be part of a word.

Assuming that words cannot have accented characters, one way to define a "word" is [a-zA-Z]+ (the quantifier + means "one or more occurrences").

That matches just one word. After it we need the "separator + word" sequence, repeated at least 4 times (so that we have at least 5 words separated by the indicated characters).

For the separator, just use [- .,_]. Then put the same definition of "word" after it, and make this sequence repeat at least 4 times, i.e. ([- .,_][a-zA-Z]+){4,}.

Putting it all together, we get [a-zA-Z]+([- .,_][a-zA-Z]+){4,}.

Note that in this case I do not need to add the requirement of at least 8 characters: with at least 5 words (of at least 1 character each) plus the 4 separators, the password already has at least 9 characters.
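As a quick check of the word pattern, here is a minimal sketch. It assumes re.fullmatch is acceptable in place of wrapping the pattern in ^ and $; the names WORDS and has_five_words and the sample strings are only illustrative, not part of the question.

```python
import re

# Word = one or more ASCII letters; separator = exactly one of "- .,_".
# One word followed by at least 4 "separator + word" pairs gives 5 or more words.
WORDS = re.compile(r'[a-zA-Z]+([- .,_][a-zA-Z]+){4,}')

def has_five_words(password):
    # fullmatch only succeeds if the whole string fits the pattern
    return WORDS.fullmatch(password) is not None

for sample in ['a-b-b-c-d-d', 'a.b.c.d.e', 'a.b.c', 'a_b_c_d_1']:
    print(sample, has_five_words(sample))
```

The last sample fails because the fifth "word" is a digit, which [a-zA-Z]+ does not accept.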


For the second case (check the required characters), we use lookaheads, which check whether something exists ahead in the string. For example, to see if there is at least one digit, we use (?=.*[0-9]). The trick is that a lookahead only checks whether something exists; it then goes back to where it was and continues evaluating the rest of the regex. So after checking that the string has a digit, the regex returns to that position and checks the rest of the expression (i.e. whether it has a letter, etc.).

We use one lookahead for each type of required character, so the regex becomes this monstrous thing:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,}$

Each lookahead checks that one character type exists, and then the regex checks that the string has at least 8 of the allowed characters.

I also used the anchors ^ and $, which mark the beginning and end of the string respectively, so I guarantee that the string contains only what the regex specifies (not one character more, not one less).
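To make the role of each lookahead easier to see, the same regex can be laid out with re.VERBOSE. This is only a sketch: the name STRONG and the sample passwords are my own, and it relies on the documented rule that # and spaces stay literal inside a character class in verbose mode.

```python
import re

# Same pattern as above, one piece per line.
STRONG = re.compile(r'''
    ^
    (?=.*[0-9])                  # a digit exists somewhere ahead
    (?=.*[a-z])                  # a lowercase letter exists
    (?=.*[A-Z])                  # an uppercase letter exists
    (?=.*[!@#$%<^&*?])           # one of the listed punctuation characters exists
    [a-zA-Z0-9!@#$%<^&*?]{8,}    # 8 or more characters, all from the allowed set
    $
''', re.VERBOSE)

for sample in ['Abc123@!&', 'abc123@!&', 'Abc1@', 'Abc 123@!&']:
    print(sample, bool(STRONG.match(sample)))
```

Only the first sample passes: the second has no uppercase letter, the third is too short, and the fourth contains a space, which is not in the allowed set.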

Since the password can match one rule or the other, we use | for alternation. It isn't pretty:

import re

r = re.compile(r'^(((?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,})|([a-zA-Z]+([- .,_][a-zA-Z]+){4,}))$')

for senha in ['a.b.c.d.e', 'A.b-c @1_xyz', 'a.b.c', 'Abc123@!&']:
    print(f'{senha} = {"válida" if r.match(senha) else "inválida"}')

The output of the code above is:

a.b.c.d.e = válida
A.b-c @1_xyz = inválida
a.b.c = inválida
Abc123@!& = válida

Of course you can also use two separate regexes and check whether the password matches either one:

def valida(senha):
    return re.match(r'^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,}$', senha) \
           or re.match(r'^[a-zA-Z]+([- .,_][a-zA-Z]+){4,}$', senha)

for senha in ['a.b.c.d.e', 'A.b-c @1_xyz', 'a.b.c', 'Abc123@!&']:
    print(f'{senha} = {"válida" if valida(senha) else "inválida"}')

If you want, you can also replace the lookaheads with separate regexes:

def valida(senha):
    return ((re.search(r'[0-9]', senha) and          # has a digit
             re.search(r'[a-z]', senha) and          # has a lowercase letter
             re.search(r'[A-Z]', senha) and          # has an uppercase letter
             re.search(r'[!@#$%<^&*?]', senha) and   # has punctuation
             re.match(r'^[a-zA-Z0-9!@#$%<^&*?]{8,}$', senha))  # 8+ allowed characters
            or re.match(r'^[a-zA-Z]+([- .,_][a-zA-Z]+){4,}$', senha))  # or 5+ words

The first regex checks whether the password has a digit, the second whether it has a lowercase letter, and so on (I used search to check at any position in the string, because match only looks at the beginning). The fifth checks that the password consists of at least 8 of the allowed characters (here it makes no difference whether you use match or search, since ^ forces the regex to match from the beginning of the string).
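The difference between match and search can be seen in a small sketch. The test string 'abc1' and the variable names are only illustrative.

```python
import re

senha = 'abc1'

# match only tries position 0, where there is a letter, so it finds no digit
no_match = re.match(r'[0-9]', senha)
# search scans forward until it finds the digit
found = re.search(r'[0-9]', senha)
print(no_match, found.start())

# With ^ in the pattern, both functions can only succeed at the start
anchored_match = re.match(r'^[a-z0-9]{4,}$', senha)
anchored_search = re.search(r'^[a-z0-9]{4,}$', senha)
print(bool(anchored_match), bool(anchored_search))
```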

-1

Each case has its own regex. Follow the syntax of each regex. For the second case, lookaheads are the ideal solution, as they check for the pattern before the match is made.

Passwords must contain at least 5 words (word = 1 or more letters), each separated by a hyphen, a space, a dot, a comma or an underscore. Example: a-b-b-c-d-d

re.sub(r'\b'{4, }, '-', variavel_string)

Passwords must be at least 8 characters long and contain at least one uppercase, lowercase, digit and punctuation ( "!@#$%<^&*?")

re.match(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@$#%&çÇ`'^~]).{8,}$", variavel_string)
  • The first regex makes no sense, because sub is for replacing parts of the string, not for finding matches. By the way, the code does not even run. And even if you fix the code, \b{4, } doesn't make sense, because \b only marks a position in the string that has an alphanumeric character on one side and a non-alphanumeric character on the other, and you are looking for 4 consecutive positions like that, which does not validate anything. In practice, this just returns the passwords unchanged and validates nothing: https://ideone.com/NXLNIo

  • It's actually worse: because of the space after the comma, the regex looks for the literal text {4, } and replaces it with a hyphen: https://ideone.com/X5FsFw - that is, it is very far from validating anything... And removing the space gives an error, because \b is not quantifiable: https://ideone.com/jSNQZZ
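The behavior described in these two comments can be reproduced with a short sketch, assuming CPython's re module. The sample strings and variable names are made up for illustration.

```python
import re

# With the space, "{4, }" is not a valid quantifier, so it is treated as
# literal text that must follow a word boundary.
unchanged = re.sub(r'\b{4, }', '-', 'a.b.c.d.e')   # no literal "{4, }" to replace
replaced = re.sub(r'\b{4, }', '-', 'a{4, }b')      # the literal text becomes "-"
print(unchanged, replaced)

# Without the space it is a real quantifier applied to \b, which re rejects.
error_raised = False
try:
    re.compile(r'\b{4,}')
except re.error as exc:
    error_raised = True
    print('re.error:', exc)
```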
