Email validation function

I need help with this code 'cause I’m racking my brain and I can’t fix it.

You first enter, via input, the number of emails you want to provide. Then you type that many emails.

Once this is done, the program checks each email against the regular expression. If it matches, the email is added to a list; otherwise it is skipped. At the end, the list is printed in alphabetical order.

The code is like this for now:

import re
def fun(s):
# return True if s is a valid email, else return False

s = emails
padrao = re.search(r'^[\w-]+@ [a-z\d]+\.[\w]{3}', s)

for i in range(len(s)):
if s[i] = padrao:
return True
else:
return False

def filter_mail(emails):
return list(filter(fun, emails))

if __name__ == '__main__':
n = int(input())
emails = []
for _ in range(n):
emails.append(input())

filtered_emails = filter_mail(emails)
filtered_emails.sort()
print(filtered_emails)
  • 2

    Please correct the indentation of your code.

  • 1

    In Python, indentation is not optional - it is the syntax. When pasting code here, use the {} button to format it while preserving indentation.

3 answers

4


About your regex:

^[\w-]+@ [a-z\d]+\.[\w]{3}

I don’t know if it was a typo, but notice there’s a space after the @. This makes the expression match only emails that have a space there (like user@ email.com). So the first thing to do is remove that space.
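A quick way to see the effect of that space, as a minimal sketch (the sample addresses are made up for illustration):

```python
import re

# Pattern from the question, with the stray space after the @
with_space = re.compile(r'^[\w-]+@ [a-z\d]+\.[\w]{3}')
# Same pattern with the space removed
without_space = re.compile(r'^[\w-]+@[a-z\d]+\.[\w]{3}')

# Hypothetical addresses, one without and one with the space
samples = ['user@email.com', 'user@ email.com']
for s in samples:
    print(repr(s), bool(with_space.search(s)), bool(without_space.search(s)))
```

Only the address containing the space satisfies the original pattern; the ordinary address is accepted once the space is removed.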

Another detail is that the shorthand \w matches letters, digits and the character _. In Python 3, by default, it also matches other letters defined in Unicode, such as Japanese characters (and those of many other languages), for example:

import re

print(re.match(r'\w+', '鳥山.').group()) # prints 鳥山

If you only want ASCII letters, you can use the re.ASCII flag, or simply use [a-zA-Z0-9] in place of \w:

# both print "None", because no match is found anymore
print(re.match(r'\w+', '鳥山.', flags=re.ASCII))
print(re.match(r'[a-zA-Z0-9]+', '鳥山.'))

Another detail is that you used [\w-]+, which means "one or more occurrences of \w or -". And since \w also includes the character _, this regex will accept emails such as [email protected].

Finally, the part after the @ ends with \.[\w]{3}. First of all, the brackets in [\w] are redundant, because \w already represents a set of characters (brackets only make sense if you want to combine \w with other things, as you did with [\w-]). So you can switch to simply \w{3}.

But this only accepts domains whose last part has exactly 3 characters (excluding .io, .br, .info, among many others). And since \w also accepts digits and _, this regex accepts emails such as user@teste._1_. Not to mention that, once the regex is anchored at the end of the string, it will not accept emails that end with com.br, for example.
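These limitations can be checked with a small sketch. It assumes the space has been removed, [\w] simplified to \w, and a $ anchor added so that the whole string must match; the addresses are hypothetical.

```python
import re

# Question's pattern after the fixes described so far, plus a $ anchor
# (the anchor is an assumption added for this illustration)
pattern = re.compile(r'^[\w-]+@[a-z\d]+\.\w{3}$')

# Hypothetical addresses
samples = [
    'user@teste._1_',    # accepted: \w{3} also matches digits and _
    '_-_@site.com',      # accepted: [\w-]+ allows only _ and - in the user part
    'user@site.io',      # rejected: extension has 2 characters
    'user@site.info',    # rejected: extension has 4 characters
    'user@site.com.br',  # rejected: more than one dot after the @
]
results = {s: bool(pattern.search(s)) for s in samples}
for s, ok in results.items():
    print(f'{s}: {ok}')
```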

So you can replace everything after the @ with something like (?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}. The parentheses (?: and ) form a non-capturing group. I’m basically grouping the sub-expression within them, and the ?: tells the regex engine not to store what is matched (if you do not use ?:, what is in parentheses is stored internally and can be obtained from the match object afterwards - but since I don’t want that, I can say so in the regex itself by using ?:).

This group ensures that [a-zA-Z0-9-]+\. (letters, digits or - followed by a dot) is repeated one or more times (indicated by the + after the parentheses). This way we can have emails ending in .com.br, .abc.def.etc.com and so on.

Finally, we have 2 or more letters ([a-zA-Z]{2,}), which ensures that .br and .info (and any other, provided it has at least two letters) are accepted.
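The difference between a capturing and a non-capturing group can be seen with a short sketch. The domain below is hypothetical, and re.fullmatch is used only to keep the illustration short:

```python
import re

domain = 'mail.example.com.br'  # hypothetical domain

# Same sub-expression, with and without ?:
capturing = re.fullmatch(r'([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}', domain)
non_capturing = re.fullmatch(r'(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}', domain)

# A repeated capturing group keeps only its last repetition
print(capturing.groups())      # ('com.',)
# The non-capturing version stores nothing
print(non_capturing.groups())  # ()
```

Both versions accept the same strings; the only difference is whether the engine stores what the group matched.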

Another thing I would do is add the anchor $, which marks the end of the string. You already used ^ (start of the string), so using it together with $ ensures that the whole string contains only what is in the expression, and nothing else.
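A minimal sketch of what the $ anchor changes, using made-up inputs. It also shows re.fullmatch as a stricter alternative, since in Python $ also matches just before a trailing newline:

```python
import re

tail = r'(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}'
unanchored = re.compile(r'^[\w-]+@' + tail)
anchored = re.compile(r'^[\w-]+@' + tail + r'$')

s = 'user@email.com and some trailing junk'  # hypothetical input
print(bool(unanchored.match(s)))  # True: only the beginning has to match
print(bool(anchored.match(s)))    # False: the whole string has to match

# $ still accepts a single trailing newline; fullmatch does not
with_newline = 'user@email.com\n'
print(bool(anchored.match(with_newline)))                     # True
print(bool(re.fullmatch(r'[\w-]+@' + tail, with_newline)))    # False
```

For strings read with input() the trailing newline case does not arise, so ^...$ is enough there.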


Anyway, writing a regex that correctly validates 100% of valid emails is very complicated. See this article, for example: it starts with something not too complicated and ends with a monstrous regex.

It is up to you to decide how complicated your regex will be, because the more precise you need it to be (the more special cases it supports), the more complicated and difficult it will be to understand and maintain. But if there are special cases you don’t need to handle (such as IP addresses in the domain, or user@localhost, for example), then it doesn’t pay to do something so complicated. Find the balance between accuracy, complexity and practicality (and this varies from one case to another).

I talk a little more about using regex to validate emails here, here and here (the last link has some options at the end; I just don’t recommend the last of them).


Regardless of the regex you choose, the check/filter/sort can be done like this:

import re

r = re.compile(r'^[\w-]+@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$')
emails = ['[email protected]', 'nao sou email', '[email protected]']


# filter and sort
filtered_emails = sorted(email for email in emails if r.match(email))
print(filtered_emails)

I used comprehension syntax (here, a generator expression passed to sorted), which is much more succinct and pythonic. The line that creates filtered_emails is equivalent to:

filtered_emails = []
for email in emails:
    if r.match(email):
        filtered_emails.append(email)
filtered_emails.sort()

In both cases, the resulting list is:

['[email protected]', '[email protected]']


To read the quantity, I suggest validating that what was typed is really a number. If not, ask the user to try again (I wrapped this in a function).

Finally, I also use a list comprehension to read the emails and put them straight into a list.

The complete code:

import re

def le_quantidade():
    while True:
        try:
            return int(input('number of emails:'))
        except ValueError:
            # if the input is not a number, int() raises a ValueError
            print('Enter a valid number')

n = le_quantidade()
# read the emails and put them in a list
emails = [input('Enter an email:') for _ in range(n)]

r = re.compile(r'^[\w-]+@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$')
# filter and sort
filtered_emails = sorted(email for email in emails if r.match(email))
print(filtered_emails)

0

Thank you so much for all the information. It helped me find the regex I needed. But just to be clear for anyone who reads this later, the regex needed to follow these rules:

Valid email addresses must follow these rules:

It must have the [email protected] format type. The username can only contain letters, digits, dashes and underscores. The website name can only have letters and digits. The maximum length of the extension is 3.

The final code went like this:

import re

def fun(s):
    # ^ and $ anchor the pattern so that the whole string must follow the rules
    padrao = re.search(r'^[a-zA-Z0-9_-]+@[a-zA-Z0-9]+\.[a-zA-Z]{1,3}$', s)

    if padrao:
        return True
    else:
        return False

def filter_mail(emails):
    return list(filter(fun, emails))

if __name__ == '__main__':
    n = int(input())
    emails = []
    for _ in range(n):
        emails.append(input())

    filtered_emails = filter_mail(emails)
    filtered_emails.sort()
    print(filtered_emails)

I removed that for loop because it was traversing a single element (one string), not a list of elements, as I had thought.
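A few spot checks of those rules, as a sketch with made-up addresses. The helper name is_valid is hypothetical, and re.fullmatch is used so that the entire string must satisfy the rules; a search that is not anchored at the start would accept 'd!ana@site.com' by matching from 'ana' onward.

```python
import re

# Username: letters, digits, dashes, underscores; website: letters and digits;
# extension: 1 to 3 letters (same character classes as the regex in this answer)
RULE = re.compile(r'[a-zA-Z0-9_-]+@[a-zA-Z0-9]+\.[a-zA-Z]{1,3}')

def is_valid(address):
    # fullmatch requires the pattern to cover the whole string
    return RULE.fullmatch(address) is not None

# Hypothetical addresses
samples = [
    'bob-2@web9.io',    # valid
    'ana_1@site.com',   # valid
    'carl@site.info',   # invalid: extension longer than 3
    'd!ana@site.com',   # invalid: ! in the username
    'eve@my-site.com',  # invalid: dash in the website name
]
valid = sorted(a for a in samples if is_valid(a))
print(valid)
```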

0

Okay, but why do you assign emails to s in the function fun? And what is the point of the loop? And is that regular expression right?

I believe what you want is this:

def fun(s):
    # return True if s is a valid email, else return False
    return re.search(r'^[\w]+@[\w]+\.[\w]{2,4}', s) is not None
