TL;DR
Generally speaking, the most standard emails follow the regular expression [^@]+@[^.@]+\.[^@]+
. In sql
, more or less the following takes the a good part positive cases of email:
SELECT
*
FROM
tabela_com_email t
WHERE
t.email LIKE '%@%.%' AND
t.email NOT LIKE '%@%@%'
Email address
An email address is an identifier composed of 3 parts:
local-part
@
domain
With the exception of @
which is a constant, domain and local part are governed by their own rules. Usually, any DNS address can sr used in the part domain
; including, if you use an IP address, you can write john.doe@[192.158.11.15]
, with brackets to indicate literal IP (a very similar notation serves for Ipv6). The local part is much more boring to validate, see the syntax according to Wikipedia.
Bizarre examples of valid emails
Copied from Wikipedia:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected].
[email protected]
"[email protected]"@example.com
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
[email protected]
admin@mailserver1
#!$%&'*+-/=?^_`{}|[email protected]
"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org
" "@example.org
[email protected]
user@localserver
user@tt
user@[IPv6:2001:DB8::1]
Validators in practice
Many validators verify the presence of a single @
, forcing characters to exist before and after. Other validators, which allow less spam than this previous example I mentioned, take into account that the domain needs to have at least one point ([email protected]
or [email protected]
).
Further reading
These internet things are governed by Rfcs. The RFC of the email address is the RFC 5322. The section dedicated to this subject is Section 3.4.1.
From the document, we have the following notation BNF-simile:
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
Under acronyms:
- RFC - request for comments, request for comments, IETF Internet standards suggestion document
- IETF - Internet Engineering Task Force
UPDATE - validating example considering only RFC 5322
Emails that would be valid:
teste@cliente
calteste5@calteste5
calteste6@calteste6
vava@ius
[email protected]
[email protected]
[email protected]
[email protected]
joao@bbb
[email protected]
caldemo5@caldemo5
teste@teste
calteste4@calteste4
Emails that would be invalid (explanation after Tilde ~):
Pablo Fernandes ~ não tem @, espaço não escapado
xxx ~ não tem @
eduardostubbert ~ não tem @
123456 ~ não tem @
xxx ~ não tem @
xxx ~ não tem @
bethania@ ~ não tem domínio
xxxxx ~ não tem @
usuario1 ~ não tem @
Recommendation ICANN
ICANN (Internet Corporation for Assigned Names and Numbers) strongly recommends not to use prohibits the use of pointless domain in email address. Source
Constructing the expressions of the beginning of the answer
More about regex see Quick Regular Expression Query Guide of Aurelio Verde.
As a regular expression meta character appears, I will explain in separate notes like this
I can identify valid elements as a language. In this other answer define language as a subset of words generated through the concatenation of the letters denim set.
In general, all possibilities of words over valid printable characters are:
.*
.
: selection meta character that matches any character
*
: quantifier meta character representing the kleene star; zero or more repetitions of the regular expression preceding that operator
An email address needs a @
in the middle of the word:
.*@.*
Hmmm, the *
is Kleene’s star, so he would validate @localhost
as valid. I need at least one character in the domain part and one in the local part:
.+@.+
+
: another quantifying meta character, the most Kleene. It is similar to the star at all, but requires the existence of at least one repetition; R+
can be replaced without semantic difference by RR*
. Some regular expression engines do not recognize the +
Now I’m a little more satisfied, but you still recognize @@@
as valid email. Most email services do not allow more strange schemes, such as quotes and escapes to avoid @
in the middle of the local part, also do not allow @
in the domain part. In general, simply consider the non-existence of the @
the local part and the domain is good enough:
[^@]+@[^@]+
[^@]
: denied list; accepts any character that is different from the characters between the opening [^
and the closure ]
. In the above example, you just don’t accept the @
; you could pass character ranges such as [^a-d]
, which denies all the characteristics of a
until d
Phew! We managed to find it very well. But it still does not follow the suggestion of ICANN, which requires the presence of a .
in the field:
[^@]+@[^@]+\.[^@]+
\
: escape meta character, also called criptonite; totally removes any possible meta character interpretation of the following character; then \.
accepts a literal point
Constructing the expression like
Let’s start with regular expression:
[^@]+@[^@]+\.[^@]+
Generally speaking, something that accepts an arbitrary amount of characters is the %
:
%@%.%
This ensures the presence of the @
and of a .
in the domain part. However, this does not guarantee the presence of a single @
. For this, it is sufficient that it is not similar to the following standard:
%@%@%
Why
teste@cliente
would not be valid?– Jéf Bueno
Test for registration, look for "@", "." I don’t know if you can do anything other than this
– Motta
@Not wise LINQ that was valid, only containing the @ is already considered a valid email?
– Marconi
It depends on the scope, @Marconi So I asked why it is invalid. Overall, it is a completely valid format. If you want an example, here on the local network we have an email server under the domain
gaia
, my colleagues email me usinglinq@gaia
.– Jéf Bueno
@Marconi , the RFC says that the only invalid addresses in this register are those that do not have
@
and thebethania@
, that does not have the mastery part– Jefferson Quesado
@LINQ, has a "recommending" of ICANN on issues without a point; I added at the end of my reply
– Jefferson Quesado
That’s why I said it depends on scope... The email domains here (locations) work like this, regardless of whether ICANN or Jon Skeet say otherwise. If our internal system were to stop us from registering these addresses, it would be useless.
– Jéf Bueno
@LINQ I didn’t know it was rs! good to know =D
– Marconi
@Jeffersonquesado I’ll read your answer straight!
– Marconi
@LINQ Yes, true. ICANN does not run your company or your home. I put that remark by... well, I forgot =) I think it was to focus on the Marconi case, emails in production/wide internet
– Jefferson Quesado