Use the following regular expression
/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)/
soon your code gets:
from urllib.request import urlopen
from re import findall
def emails(url):
content = urlopen(url).read().decode()
#print(content)
padrao = "([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)"
mails = findall(padrao, content)
print(mails)
url = "http://www.cdm.depaul.edu"
emails(url)
exit:
['[email protected]', '[email protected]', '[email protected]', '[email protected]']
Anchors in regular expressions determine a position before, after, or between characters. They can be used to "anchor" regex correspondence to a given position. The intercalation sign corresponds to the position before the first character in the sequence. The application of regex " a" in "abc" corresponds to "a". In turn the regex " b" does not match in "abc", because "b" cannot be matched right after the start of the string, matched by the character .
To test regular expressions intuitively recommend the following site https://rubular.com/
Well, the answer below already says that the problem was the
^
and the$
, that indicate the beginning and end of the string (i.e., regex would only find something if the string had only email). About using regex to validate emails, there are a few things here, here, here and here (this last link has some options at the end, just do not recommend the last regex).– hkotsubo