How can I list all invalid emails?

Asked

Viewed 1,341 times

5

I ended up finding something like this due to the lack of validation that we didn’t have.

teste@cliente
calteste5@calteste5
calteste6@calteste6
vava@ius
Pablo Fernandes
xxx
eduardostubbert
123456
[email protected]
xxx
[email protected]
xxx
[email protected]
[email protected]
bethania@
joao@bbb
[email protected]
caldemo5@caldemo5
teste@teste
xxxxx
usuario1
calteste4@calteste4

All these emails are incorrect, so I would like to know the following aspects:

  • There is a standard (structure) that is adopted to consider a valid email?
  • How do I list all emails in a column that deviate from this pattern? Or is, which are considered invalid?

Obs: I know that even if an email does not exist it can be considered valid. Ex: [email protected]

  • 1

    Why teste@cliente would not be valid?

  • Test for registration, look for "@", "." I don’t know if you can do anything other than this

  • @Not wise LINQ that was valid, only containing the @ is already considered a valid email?

  • It depends on the scope, @Marconi So I asked why it is invalid. Overall, it is a completely valid format. If you want an example, here on the local network we have an email server under the domain gaia, my colleagues email me using linq@gaia.

  • @Marconi , the RFC says that the only invalid addresses in this register are those that do not have @ and the bethania@, that does not have the mastery part

  • @LINQ, has a "recommending" of ICANN on issues without a point; I added at the end of my reply

  • 1

    That’s why I said it depends on scope... The email domains here (locations) work like this, regardless of whether ICANN or Jon Skeet say otherwise. If our internal system were to stop us from registering these addresses, it would be useless.

  • 1

    @LINQ I didn’t know it was rs! good to know =D

  • 1

    @Jeffersonquesado I’ll read your answer straight!

  • @LINQ Yes, true. ICANN does not run your company or your home. I put that remark by... well, I forgot =) I think it was to focus on the Marconi case, emails in production/wide internet

Show 5 more comments

1 answer

11


TL;DR

Generally speaking, the most standard emails follow the regular expression [^@]+@[^.@]+\.[^@]+. In sql, more or less the following takes the a good part positive cases of email:

SELECT
    *
FROM
    tabela_com_email t
WHERE
    t.email LIKE '%@%.%' AND
    t.email NOT LIKE '%@%@%'

Email address

An email address is an identifier composed of 3 parts:

  1. local-part
  2. @
  3. domain

With the exception of @ which is a constant, domain and local part are governed by their own rules. Usually, any DNS address can sr used in the part domain; including, if you use an IP address, you can write john.doe@[192.158.11.15], with brackets to indicate literal IP (a very similar notation serves for Ipv6). The local part is much more boring to validate, see the syntax according to Wikipedia.

Bizarre examples of valid emails

Copied from Wikipedia:

[email protected]  
[email protected]  
[email protected]  
[email protected]  
[email protected].  
[email protected]
"[email protected]"@example.com  
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com  
[email protected]  
admin@mailserver1  
#!$%&'*+-/=?^_`{}|[email protected]  
"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"@example.org  
" "@example.org  
[email protected]  
user@localserver  
user@tt  
user@[IPv6:2001:DB8::1]  

Validators in practice

Many validators verify the presence of a single @, forcing characters to exist before and after. Other validators, which allow less spam than this previous example I mentioned, take into account that the domain needs to have at least one point ([email protected] or [email protected]).

Further reading

These internet things are governed by Rfcs. The RFC of the email address is the RFC 5322. The section dedicated to this subject is Section 3.4.1.

From the document, we have the following notation BNF-simile:

addr-spec       =   local-part "@" domain
local-part      =   dot-atom / quoted-string / obs-local-part
domain          =   dot-atom / domain-literal / obs-domain
domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

Under acronyms:

  • RFC - request for comments, request for comments, IETF Internet standards suggestion document
  • IETF - Internet Engineering Task Force

UPDATE - validating example considering only RFC 5322

Emails that would be valid:

teste@cliente
calteste5@calteste5
calteste6@calteste6
vava@ius
[email protected]
[email protected]
[email protected]
[email protected]
joao@bbb
[email protected]
caldemo5@caldemo5
teste@teste
calteste4@calteste4

Emails that would be invalid (explanation after Tilde ~):

Pablo Fernandes ~ não tem @, espaço não escapado
xxx ~ não tem @
eduardostubbert ~ não tem @
123456 ~ não tem @
xxx ~ não tem @
xxx ~ não tem @
bethania@ ~ não tem domínio
xxxxx ~ não tem @
usuario1 ~ não tem @

Recommendation ICANN

ICANN (Internet Corporation for Assigned Names and Numbers) strongly recommends not to use prohibits the use of pointless domain in email address. Source

Constructing the expressions of the beginning of the answer

More about regex see Quick Regular Expression Query Guide of Aurelio Verde.

As a regular expression meta character appears, I will explain in separate notes like this

I can identify valid elements as a language. In this other answer define language as a subset of words generated through the concatenation of the letters denim set.

In general, all possibilities of words over valid printable characters are:

.*

.: selection meta character that matches any character

*: quantifier meta character representing the kleene star; zero or more repetitions of the regular expression preceding that operator

An email address needs a @ in the middle of the word:

.*@.*

Hmmm, the * is Kleene’s star, so he would validate @localhost as valid. I need at least one character in the domain part and one in the local part:

.+@.+

+: another quantifying meta character, the most Kleene. It is similar to the star at all, but requires the existence of at least one repetition; R+ can be replaced without semantic difference by RR*. Some regular expression engines do not recognize the +

Now I’m a little more satisfied, but you still recognize @@@ as valid email. Most email services do not allow more strange schemes, such as quotes and escapes to avoid @ in the middle of the local part, also do not allow @ in the domain part. In general, simply consider the non-existence of the @ the local part and the domain is good enough:

[^@]+@[^@]+

[^@]: denied list; accepts any character that is different from the characters between the opening [^ and the closure ]. In the above example, you just don’t accept the @; you could pass character ranges such as [^a-d], which denies all the characteristics of a until d

Phew! We managed to find it very well. But it still does not follow the suggestion of ICANN, which requires the presence of a . in the field:

[^@]+@[^@]+\.[^@]+

\: escape meta character, also called criptonite; totally removes any possible meta character interpretation of the following character; then \. accepts a literal point

Constructing the expression like

Let’s start with regular expression:

 [^@]+@[^@]+\.[^@]+

Generally speaking, something that accepts an arbitrary amount of characters is the %:

%@%.%

This ensures the presence of the @ and of a . in the domain part. However, this does not guarantee the presence of a single @. For this, it is sufficient that it is not similar to the following standard:

%@%@%
  • 1

    Could you explain how regular expression works at the beginning of the @Jefferson question? The link from RFC 5322 is that correct? Can’t find the section

  • 1

    @Marconi I will explain better how I got to regex in the answer. About RFC, I am able to open quietly here. Title of section: Addr-Spec Specification. Direct link to section: https://tools.ietf.org/html/rfc5322#Section-3.4.1)

  • I will also edit the reply and put the direct link where it fits

  • 1

    Tranquil @Jefferson, thank you so much! Great response :)

  • 1

    @Marconi updated =] hopes you have solved the doubts

Browser other questions tagged

You are not signed in. Login or sign up in order to post.