Regular expression to validate a password with Python

Question

Asked 5 years, 4 months ago

Viewed 679 times

3

Passwords must contain at least 5 words (word = 1 or more letters), each separated by a hyphen, a space, a dot, a comma or an underscore. Example: a-b-b-c-d-d

OR

Passwords must be at least 8 characters long and contain at least one uppercase letter, one lowercase letter, one digit and one punctuation character ("!@#$%<^&*?").

For the first, I tried:

r"[\w.-\s,_]{4,}\w{1,}"

For the second case, I tried:

r"\w+\d+[!@#\$<>&\*\?]+[\w\d!@#\$<>&\*\?]{5,}"

import re

sentence = "test%#"

pattern = r""

print(f" match: {re.findall(pattern, sentence)}")

Any idea?

"at least characters" - I think the number is missing, right? :-) Anyway, what is a "word"? Is @-123,abc $@,xy a valid password? The "words" would be "@", "123", "abc", "$@" and "xy" (separated by the indicated characters). Is that it? And your example password (test%#) is invalid, right?

– hkotsubo

2020/03/29 at 17:14
@hkotsubo : just edited -> at least 8 characters

– Paul Sigonoso

2020/03/29 at 17:18
@hkotsubo For the first, an example of a valid password: a-b-c-.d.e

– Paul Sigonoso

2020/03/29 at 17:21
a-b-c-.d.e does not seem valid to me, because you said it must also have at least one uppercase letter, one digit and one punctuation character...

– hkotsubo

2020/03/29 at 17:55
@hkotsubo They are 2 different types of password, i.e. 2 different regexes!

– Paul Sigonoso

2020/03/29 at 17:59
But in the first case, do "words" only have letters? Or can they have other things?

– hkotsubo

2020/03/29 at 18:01
1

"5 words, each separated by a hyphen..." and "must have at least 8 characters" are mutually exclusive conditions. The minimum in that case would be 9 characters: 5 letters and 4 separators.

– Augusto Vasques

2020/03/29 at 18:02
1

@hkotsubo "word" = 1 or more letters

– Paul Sigonoso

2020/03/29 at 18:05

2 answers

4

For the first case, \w is not suitable, because this shorthand also matches digits and the _ character. Since you want _ to be one of the separators, it cannot be part of a word.

Assuming accented characters are not allowed, one way to define a "word" is [a-zA-Z]+ (the quantifier + means "one or more occurrences").

That matches just one word. After it we need the sequence "separator + word", repeated at least 4 times (so we have at least 5 words separated by the indicated characters).

For the separator, just use [- .,_]. Then put the same definition of "word" after it, and make this sequence repeat at least 4 times, i.e. ([- .,_][a-zA-Z]+){4,}.

Putting it all together, we get [a-zA-Z]+([- .,_][a-zA-Z]+){4,}.

Note that in this case I do not need to add the requirement of at least 8 characters: with at least 5 words (of at least 1 letter each) plus the 4 separators, the password already has at least 9 characters.
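As a quick check of the pattern assembled above, here is a minimal sketch. The names WORDS and is_word_password are made up for illustration, and re.fullmatch is used here so that the whole string has to fit the pattern.

```python
import re

# Pattern from the answer: one word, then at least 4 "separator + word" pairs.
# WORDS and is_word_password are hypothetical names used only for this sketch.
WORDS = re.compile(r'[a-zA-Z]+([- .,_][a-zA-Z]+){4,}')


def is_word_password(senha):
    # fullmatch succeeds only if the entire string fits the pattern
    return WORDS.fullmatch(senha) is not None


print(is_word_password('a-b-b-c-d-d'))  # True: 6 words
print(is_word_password('a_b c,d.e'))    # True: 5 words, mixed separators
print(is_word_password('a.b.c'))        # False: only 3 words
print(is_word_password('a1-b-c-d-e'))   # False: a digit is not part of a word
print(is_word_password('a-b-c-.d.e'))   # False: two separators in a row
```

Note that a-b-c-.d.e from the comments is rejected, because this pattern allows exactly one separator between words. If consecutive separators should be accepted, [- .,_]+ would be one way to allow it, but that is an assumption about the requirements.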

For the second case (checking the required characters), we use lookaheads, which check whether something exists ahead of the current position. For example, to see if there is at least one digit, we use (?=.*[0-9]). The trick is that a lookahead only checks whether something exists; then it goes back to where it was and the rest of the regex is evaluated from there. So, after checking that there is a digit, the regex goes back and checks the rest of the expression (i.e. whether there is a letter, etc.).
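To make the "checks, then goes back" behavior concrete, here is a small sketch. The variable names are only illustrative.

```python
import re

# The lookahead alone succeeds at position 0 but consumes nothing.
only_lookahead = re.match(r'(?=.*[0-9])', 'abc1')
print(only_lookahead.span())  # (0, 0)

# After the lookahead, [a-z]+ still starts matching at position 0.
with_letters = re.match(r'(?=.*[0-9])[a-z]+', 'abc1')
print(with_letters.group())  # abc

# With no digit anywhere ahead, the lookahead fails and so does the match.
no_digit = re.match(r'(?=.*[0-9])[a-z]+', 'abcd')
print(no_digit)  # None
```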

For each type of required character we use one lookahead, so the regex becomes this monstrosity:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,}$

Each lookahead checks that one character type exists, and then the regex checks that the string consists of at least 8 of the allowed characters.

I also used the anchors ^ and $, which indicate the beginning and end of the string respectively, so I guarantee that the string contains only what the regex specifies (not one character more, not one less).
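The effect of the anchors can be seen in a short sketch using the word-based pattern. The variables core and anchored are hypothetical names. One detail worth knowing is that, in Python, $ also matches just before a newline at the very end of the string; re.fullmatch is one way to avoid that if it matters.

```python
import re

# Word-based pattern from above, kept in a hypothetical variable `core`.
core = r'[a-zA-Z]+([- .,_][a-zA-Z]+){4,}'
anchored = '^' + core + '$'

# Without anchors, re.match accepts a valid prefix and ignores the rest.
prefix = re.match(core, 'a.b.c.d.e!!!')
print(prefix.group())  # a.b.c.d.e

# With ^ and $, the trailing "!!!" makes the whole match fail.
print(re.match(anchored, 'a.b.c.d.e!!!'))  # None

# In Python, $ also matches just before a newline at the end of the string...
print(bool(re.match(anchored, 'a.b.c.d.e\n')))  # True

# ...whereas re.fullmatch requires the entire string, newline included.
print(re.fullmatch(core, 'a.b.c.d.e\n'))  # None
```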

Since the password can follow one rule or the other, we use | for alternation. The result is not pretty:

import re

r = re.compile(r'^(((?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,})|([a-zA-Z]+([- .,_][a-zA-Z]+){4,}))$')

for senha in ['a.b.c.d.e', 'A.b-c @1_xyz', 'a.b.c', 'Abc123@!&']:
    print(f'{senha} = {"válida" if r.match(senha) else "inválida"}')

The output of the code above is:

a.b.c.d.e = válida
A.b-c @1_xyz = inválida
a.b.c = inválida
Abc123@!& = válida
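If readability is a concern, one option is to write the same combined regex with re.VERBOSE, which ignores whitespace and allows comments outside character classes. This is only a sketch: the name PASSWORD is made up, and the groups are written as non-capturing (?:...) since nothing is extracted from them.

```python
import re

# Hypothetical name; same two alternatives as the one-line regex above.
PASSWORD = re.compile(r"""
    ^(?:
        # Option 1: 8 or more allowed characters, one of each required kind
        (?=.*[0-9])            # at least one digit
        (?=.*[a-z])            # at least one lowercase letter
        (?=.*[A-Z])            # at least one uppercase letter
        (?=.*[!@#$%<^&*?])     # at least one punctuation character
        [a-zA-Z0-9!@#$%<^&*?]{8,}
      |
        # Option 2: five or more words joined by single separators
        [a-zA-Z]+ (?:[- .,_][a-zA-Z]+){4,}
    )$
""", re.VERBOSE)

for senha in ['a.b.c.d.e', 'A.b-c @1_xyz', 'a.b.c', 'Abc123@!&']:
    print(senha, bool(PASSWORD.match(senha)))
```

Inside a character class, the space and the # keep their literal meaning even in verbose mode, which is why [- .,_] and [!@#$%<^&*?] can stay unchanged.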

Of course, you can also use two separate regexes and check whether the password matches either one:

def valida(senha):
    return re.match(r'^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%<^&*?])[a-zA-Z0-9!@#$%<^&*?]{8,}$', senha) \
           or re.match(r'^[a-zA-Z]+([- .,_][a-zA-Z]+){4,}$', senha)

for senha in ['a.b.c.d.e', 'A.b-c @1_xyz', 'a.b.c', 'Abc123@!&']:
    print(f'{senha} = {"válida" if valida(senha) else "inválida"}')

If you prefer, you can also replace the lookaheads with separate regexes:

def valida(senha):
    return (re.search(r'[0-9]', senha) and \
            re.search(r'[a-z]', senha) and \
            re.search(r'[A-Z]', senha) and \
            re.search(r'[!@#$%<^&*?]', senha) and \
            re.match(r'^[a-zA-Z0-9!@#$%<^&*?]{8,}$', senha)) \
           or re.match(r'^[a-zA-Z]+([- .,_][a-zA-Z]+){4,}$', senha)

The first regex checks whether there is a digit, the second checks whether there is a lowercase letter, and so on (I used search to check at any position in the string, because match only looks at the beginning). The fifth checks that the string consists of at least 8 of the allowed characters (here it makes no difference whether you use match or search, since the ^ forces the regex to start at the beginning of the string).
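The difference between search and match can be seen in a short sketch. The sample string and variable names are only illustrative.

```python
import re

senha = 'abc123'

# match only tries position 0, where there is a letter, not a digit.
print(re.match(r'[0-9]', senha))  # None

# search scans forward and finds the first digit.
found = re.search(r'[0-9]', senha)
print(found.group())  # 1

# With ^ in the pattern, both functions give the same result.
pattern = r'^[a-z0-9]{6,}$'
print(bool(re.match(pattern, senha)), bool(re.search(pattern, senha)))  # True True
```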

by Anderson Fidelis • **311** points · Answer 1 · 2020-04-09T21:15:43+00:00

Each case has its own regex; the syntax for each one follows. For the second case, lookaheads are the ideal solution, as they check for each required pattern before the match is made.

Passwords must contain at least 5 words (word = 1 or more letters), each separated by a hyphen, a space, a dot, a comma or an underscore. Example: a-b-b-c-d-d

re.match(r'^[a-zA-Z]+([- .,_][a-zA-Z]+){4,}$', variavel_string)

Passwords must be at least 8 characters long and contain at least one uppercase, lowercase, digit and punctuation ( "!@#$%<^&*?")

re.match(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%<^&*?]).{8,}$', variavel_string)