Check whether a variable contains a well-formatted email address in PHP

10

A variable receives a value that is supposed to be an email address, but some doubts arise:

  • Is it an email address, random text, or something else?
  • If it is an email address, is it well formatted?

Assuming a function valida_email() that returns a boolean:

echo valida_email("teste");                     // returns FALSE
echo valida_email("teste@[email protected]");     // returns FALSE
echo valida_email("testar.com");                // returns FALSE
echo valida_email("teste [email protected]");      // returns FALSE
echo valida_email("té[email protected]");           // returns FALSE
echo valida_email("teste@teste");               // returns FALSE
echo valida_email(array("[email protected]"));      // returns FALSE
echo valida_email("[email protected]"); // returns TRUE

Question

In PHP, how can we check whether the content of a variable is a well-formatted email address?


Note: It does not matter whether the address actually exists; what matters is that it is valid according to the existing rules that define its formatting (see RFC 822 and RFC 5322, both in English).

Disambiguation: For validation based on RFC 6530, please see the answers to this question.

  • I just don't understand why echo valida_email(array("[email protected]")); should return false; accepting an array would make it possible to validate multiple email addresses in a single call.

  • @Marcelodiniz It is just to illustrate that the function in this example expects a string and not an array or an object. I took that path to keep things simple and to focus the answers on validating the formatting itself. But your idea is a good one anyway.

  • 2

    @Zuul, I came here searching for the term IPv6... but, to bump the question: [email protected] is now valid. Gmail now recognizes accents.

  • 1

    @Papacharlie I opened a new question for RFC 6530 so as not to invalidate the answers already given here. Thank you for the information, which was new to me!
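
On the idea raised in the comments of validating several addresses in a single call, here is a minimal sketch, assuming filter_var() is acceptable as the validator. The name valida_emails() is hypothetical and the example.com address is only illustrative.

```php
<?php
// Hypothetical helper (not from the thread): validates a flat list of
// addresses in one call and returns one boolean per entry.
function valida_emails(array $emails): array {
    // FILTER_REQUIRE_ARRAY applies the filter to every element;
    // entries that fail validation come back as false.
    $filtered = filter_var($emails, FILTER_VALIDATE_EMAIL, FILTER_REQUIRE_ARRAY);

    // Turn each result into a strict boolean, preserving the keys.
    return array_map(function ($value) {
        return $value !== false;
    }, $filtered);
}

var_dump(valida_emails(array("user@example.com", "teste", "teste@teste")));
```

Keeping the single-string function and this list helper separate leaves each one with a single, predictable return type.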

8 answers

13

Another option is to use filter_var():

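function valida_email($email) {
    // filter_var() returns the address itself on success and false on failure,
    // so compare against false to get a strict boolean.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}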
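
To see how filter_var() behaves on inputs shaped like the ones in the question, the following sketch may help. The wrapper name valida_email_bool() and the example.com addresses are assumptions made for illustration; the explicit is_string() check is my own addition so that arrays and objects are rejected up front.

```php
<?php
// Sketch: strict-boolean wrapper around filter_var().
function valida_email_bool($email): bool {
    // Non-strings (arrays, objects, null) are rejected before filtering.
    if (!is_string($email)) {
        return false;
    }
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$samples = array(
    "teste",
    "testar.com",
    "teste@teste",
    "teste teste@example.com",
    "user@example.com",
);

// Print each sample next to the result of the check.
foreach ($samples as $sample) {
    echo str_pad($sample, 26), var_export(valida_email_bool($sample), true), "\n";
}

// An array is not a string, so it is rejected as well.
var_export(valida_email_bool(array("user@example.com")));
```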
  • 2

    According to the comments in the manual, this fails with some emails, so it does not strictly follow the RFCs. Even so, I think it's a good idea to delegate the problem to PHP itself.

  • 1

    No doubt! Besides being faster than validating with a regex, it won't keep you awake at night going over all that complexity in your head.

  • 1

    It's 2015 and I wish more people used it, or better yet, a solution compatible with internationalized emails, although I suppose PHP will only support that in the (distant) future. Instead, people use home-grown implementations that accept invalid emails like "this.. for [email protected]" or that reject "[email protected]".

5

Use this library: http://code.google.com/p/isemail/downloads

Keep in mind that results may vary depending on the library, so it is important to choose one carefully. If you use a regex instead, you will end up with something like this (source 1, source 2):

/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?
\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D
\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|
(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-
\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*
\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:
(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]
... it is much longer than you imagine ...

So never use a regex to validate emails!

All the more so because, with the emergence of international TLDs, you can find emails like "esse e-mail é válido"@cachaça.com and e-també[email protected]. The latter exists and is mine; although it is not valid under the current specification, some validators already accept it, like this one.
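
For non-ASCII characters in the local part specifically, newer PHP versions offer a flag for filter_var(). The sketch below assumes PHP 7.1 or later, where FILTER_FLAG_EMAIL_UNICODE is available. As far as I know the flag only relaxes the local part, so a domain such as cachaça.com would still need separate handling. The address used is made up for illustration.

```php
<?php
// "\u{e9}" is the letter é; the escape keeps the example independent
// of the encoding of this source file.
$address = "t\u{e9}ste@example.com";

// Default behaviour: non-ASCII characters in the local part are rejected.
$ascii_only = filter_var($address, FILTER_VALIDATE_EMAIL);

// With FILTER_FLAG_EMAIL_UNICODE, Unicode letters are allowed in the local part.
$unicode = filter_var($address, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE);

var_dump($ascii_only !== false);
var_dump($unicode !== false);
```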

  • Note that Gmail already treats internationalized emails as valid, including these two examples. If you want to test it, go to Settings > Accounts and Import > Add another address, use "your-email+âlgö[email protected]", and send an email from it.

4

The best way to check whether a text matches a predefined pattern is to use regular expressions. A Perl script was once written to generate an expression that validates an email address according to RFC 822, and the result was the following expression:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

Unfortunately, this kind of expression is very costly for the application and, besides, you do not want people to think your expression is some kind of virus. There are simpler expressions for validating emails, perhaps not with the rigor of RFC 822 but fairly close to it:

/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+)*\.(aero|arpa|biz|com|coop|edu|gov|info|int|mil|museum|name|net|org|pro|travel|mobi|[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i

At this link, there are several examples of regular expressions and their respective tests with various types of email. Choose one and see which one suits your purposes better. :)
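
One way to compare candidate expressions is to run each against a small table of inputs with the results you expect. The sketch below does that; the pattern in it is deliberately simplistic and only a placeholder of my own, not one of the expressions quoted above, and the sample addresses are illustrative.

```php
<?php
// Placeholder pattern; substitute whichever expression you are evaluating.
$pattern = '/^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i';

// Sample inputs paired with the result expected from the validator.
$cases = array(
    'user@example.com' => true,
    'teste'            => false,
    'teste@teste'      => false,
    'a b@example.com'  => false,
);

// Count the inputs where the pattern disagrees with the expectation.
$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = preg_match($pattern, $input) === 1;
    if ($actual !== $expected) {
        $failures++;
        echo "Mismatch for {$input}\n";
    }
}
echo "{$failures} mismatch(es)\n";
```

Extending the table with the edge cases that matter to your application (quoted local parts, IP literals, long labels) shows quickly where a given expression falls short.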

2

Download and use the function at this link: https://github.com/PrimosTI/kit/blob/master/php/lib/mailparse.php

<?php
require_once 'mailparse.php';

$mail = '[email protected]';
$analise = mail_parse_address($mail);

if ($analise) {
    // Well-formatted email.
    // Line endings use double quotes so that "\r\n" is interpreted as CR LF.
    switch ($analise['type']) {
    case MAIL_ADDR_TYPE_HOSTNAME:
        // You can also check the type of the address
        echo "Valid email.\r\n";
        echo 'User: ' . $analise['local'] . "\r\n";
        echo 'Domain: ' . $analise['tag'] . "\r\n";
        break;
    case MAIL_ADDR_TYPE_IPV4:
    case MAIL_ADDR_TYPE_IPV6:
        // Accepts IP-type addresses.
        echo "Valid email.\r\n";
        echo 'User: ' . $analise['local'] . "\r\n";
        echo 'IPv' . ($analise['type'] == MAIL_ADDR_TYPE_IPV4 ? '4' : '6') . ': ' . $analise['tag'] . "\r\n";
        break;
    case MAIL_ADDR_TYPE_GENERIC:
        // Accepts tag-type addresses
        echo "Valid email.\r\n";
        echo 'User: ' . $analise['local'] . "\r\n";
        echo 'Tag: ' . $analise['tag'] . "\r\n";
        break;
    case MAIL_ADDR_TYPE_IDN:
        // Support for IDN addresses is still being developed,
        // covering the new kinds of email address defined by
        // the newer internet standards.
        break;
    }
}
?>

0

I recommend using the following function implementing RFC 822. Very good and easy to use:

<?php

#
# RFC 822/2822/5322 Email Parser
#
# By Cal Henderson <[email protected]>
#
# This code is dual licensed:
# CC Attribution-ShareAlike 2.5 - http://creativecommons.org/licenses/by-sa/2.5/
# GPLv3 - http://www.gnu.org/copyleft/gpl.html
#
# $Revision$
#

##################################################################################

function is_valid_email_address($email, $options=array()){

    #
    # you can pass a few different named options as a second argument,
    # but the defaults are usually a good choice.
    #

    $defaults = array(
        'allow_comments'    => true,
        'public_internet'   => true, # turn this off for 'strict' mode
    );

    $opts = array();
    foreach ($defaults as $k => $v) $opts[$k] = isset($options[$k]) ? $options[$k] : $v;
    $options = $opts;



    ####################################################################################
    #
    # NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
    #                         %d11 /          ;  that do not include the
    #                         %d12 /          ;  carriage return, line feed,
    #                         %d14-31 /       ;  and white space characters
    #                         %d127
    # ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
    # DIGIT          =  %x30-39

    $no_ws_ctl  = "[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x7f]";
    $alpha      = "[\\x41-\\x5a\\x61-\\x7a]";
    $digit      = "[\\x30-\\x39]";
    $cr     = "\\x0d";
    $lf     = "\\x0a";
    $crlf       = "(?:$cr$lf)";


    ####################################################################################
    #
    # obs-char        =       %d0-9 / %d11 /          ; %d0-127 except CR and
    #                         %d12 / %d14-127         ;  LF
    # obs-text        =       *LF *CR *(obs-char *LF *CR)
    # text            =       %d1-9 /         ; Characters excluding CR and LF
    #                         %d11 /
    #                         %d12 /
    #                         %d14-127 /
    #                         obs-text
    # obs-qp          =       "\" (%d0-127)
    # quoted-pair     =       ("\" text) / obs-qp

    $obs_char   = "[\\x00-\\x09\\x0b\\x0c\\x0e-\\x7f]";
    $obs_text   = "(?:$lf*$cr*(?:$obs_char$lf*$cr*)*)";
    $text       = "(?:[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f]|$obs_text)";

    #
    # there's an issue with the definition of 'text', since 'obs_text' can
    # be blank and that allows qp's with no character after the slash. we're
    # treating that as bad, so this just checks we have at least one
    # (non-CRLF) character
    #

    $text       = "(?:$lf*$cr*$obs_char$lf*$cr*)";
    $obs_qp     = "(?:\\x5c[\\x00-\\x7f])";
    $quoted_pair    = "(?:\\x5c$text|$obs_qp)";


    ####################################################################################
    #
    # obs-FWS         =       1*WSP *(CRLF 1*WSP)
    # FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
    #                         obs-FWS
    # ctext           =       NO-WS-CTL /     ; Non white space controls
    #                         %d33-39 /       ; The rest of the US-ASCII
    #                         %d42-91 /       ;  characters not including "(",
    #                         %d93-126        ;  ")", or "\"
    # ccontent        =       ctext / quoted-pair / comment
    # comment         =       "(" *([FWS] ccontent) [FWS] ")"
    # CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)

    #
    # note: we translate ccontent only partially to avoid an infinite loop
    # instead, we'll recursively strip *nested* comments before processing
    # the input. that will leave 'plain old comments' to be matched during
    # the main parse.
    #

    $wsp        = "[\\x20\\x09]";
    $obs_fws    = "(?:$wsp+(?:$crlf$wsp+)*)";
    $fws        = "(?:(?:(?:$wsp*$crlf)?$wsp+)|$obs_fws)";
    $ctext      = "(?:$no_ws_ctl|[\\x21-\\x27\\x2A-\\x5b\\x5d-\\x7e])";
    $ccontent   = "(?:$ctext|$quoted_pair)";
    $comment    = "(?:\\x28(?:$fws?$ccontent)*$fws?\\x29)";
    $cfws       = "(?:(?:$fws?$comment)*(?:$fws?$comment|$fws))";


    #
    # these are the rules for removing *nested* comments. we'll just detect
    # outer comment and replace it with an empty comment, and recurse until
    # we stop.
    #

    $outer_ccontent_dull    = "(?:$fws?$ctext|$quoted_pair)";
    $outer_ccontent_nest    = "(?:$fws?$comment)";
    $outer_comment      = "(?:\\x28$outer_ccontent_dull*(?:$outer_ccontent_nest$outer_ccontent_dull*)+$fws?\\x29)";


    ####################################################################################
    #
    # atext           =       ALPHA / DIGIT / ; Any character except controls,
    #                         "!" / "#" /     ;  SP, and specials.
    #                         "$" / "%" /     ;  Used for atoms
    #                         "&" / "'" /
    #                         "*" / "+" /
    #                         "-" / "/" /
    #                         "=" / "?" /
    #                         "^" / "_" /
    #                         "`" / "{" /
    #                         "|" / "}" /
    #                         "~"
    # atom            =       [CFWS] 1*atext [CFWS]

    $atext      = "(?:$alpha|$digit|[\\x21\\x23-\\x27\\x2a\\x2b\\x2d\\x2f\\x3d\\x3f\\x5e\\x5f\\x60\\x7b-\\x7e])";
    $atom       = "(?:$cfws?(?:$atext)+$cfws?)";


    ####################################################################################
    #
    # qtext           =       NO-WS-CTL /     ; Non white space controls
    #                         %d33 /          ; The rest of the US-ASCII
    #                         %d35-91 /       ;  characters not including "\"
    #                         %d93-126        ;  or the quote character
    # qcontent        =       qtext / quoted-pair
    # quoted-string   =       [CFWS]
    #                         DQUOTE *([FWS] qcontent) [FWS] DQUOTE
    #                         [CFWS]
    # word            =       atom / quoted-string

    $qtext      = "(?:$no_ws_ctl|[\\x21\\x23-\\x5b\\x5d-\\x7e])";
    $qcontent   = "(?:$qtext|$quoted_pair)";
    $quoted_string  = "(?:$cfws?\\x22(?:$fws?$qcontent)*$fws?\\x22$cfws?)";

    #
    # changed the '*' to a '+' to require that quoted strings are not empty
    #

    $quoted_string  = "(?:$cfws?\\x22(?:$fws?$qcontent)+$fws?\\x22$cfws?)";
    $word       = "(?:$atom|$quoted_string)";


    ####################################################################################
    #
    # obs-local-part  =       word *("." word)
    # obs-domain      =       atom *("." atom)

    $obs_local_part = "(?:$word(?:\\x2e$word)*)";
    $obs_domain = "(?:$atom(?:\\x2e$atom)*)";


    ####################################################################################
    #
    # dot-atom-text   =       1*atext *("." 1*atext)
    # dot-atom        =       [CFWS] dot-atom-text [CFWS]

    $dot_atom_text  = "(?:$atext+(?:\\x2e$atext+)*)";
    $dot_atom   = "(?:$cfws?$dot_atom_text$cfws?)";


    ####################################################################################
    #
    # domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
    # dcontent        =       dtext / quoted-pair
    # dtext           =       NO-WS-CTL /     ; Non white space controls
    # 
    #                         %d33-90 /       ; The rest of the US-ASCII
    #                         %d94-126        ;  characters not including "[",
    #                                         ;  "]", or "\"

    $dtext      = "(?:$no_ws_ctl|[\\x21-\\x5a\\x5e-\\x7e])";
    $dcontent   = "(?:$dtext|$quoted_pair)";
    $domain_literal = "(?:$cfws?\\x5b(?:$fws?$dcontent)*$fws?\\x5d$cfws?)";


    ####################################################################################
    #
    # local-part      =       dot-atom / quoted-string / obs-local-part
    # domain          =       dot-atom / domain-literal / obs-domain
    # addr-spec       =       local-part "@" domain

    $local_part = "(($dot_atom)|($quoted_string)|($obs_local_part))";
    $domain     = "(($dot_atom)|($domain_literal)|($obs_domain))";
    $addr_spec  = "$local_part\\x40$domain";



    #
    # this was previously 256 based on RFC3696, but dominic's errata was accepted.
    #

    if (strlen($email) > 254) return 0;


    #
    # we need to strip nested comments first - we replace them with a simple comment
    #

    if ($options['allow_comments']){

        $email = email_strip_comments($outer_comment, $email, "(x)");
    }


    #
    # now match what's left
    #

    if (!preg_match("!^$addr_spec$!", $email, $m)){

        return 0;
    }

    $bits = array(
        'local'         => isset($m[1]) ? $m[1] : '',
        'local-atom'        => isset($m[2]) ? $m[2] : '',
        'local-quoted'      => isset($m[3]) ? $m[3] : '',
        'local-obs'     => isset($m[4]) ? $m[4] : '',
        'domain'        => isset($m[5]) ? $m[5] : '',
        'domain-atom'       => isset($m[6]) ? $m[6] : '',
        'domain-literal'    => isset($m[7]) ? $m[7] : '',
        'domain-obs'        => isset($m[8]) ? $m[8] : '',
    );


    #
    # we need to now strip comments from $bits[local] and $bits[domain],
    # since we know they're in the right place and we want them out of the
    # way for checking IPs, label sizes, etc
    #

    if ($options['allow_comments']){
        $bits['local']  = email_strip_comments($comment, $bits['local']);
        $bits['domain'] = email_strip_comments($comment, $bits['domain']);
    }


    #
    # length limits on segments
    #

    if (strlen($bits['local']) > 64) return 0;
    if (strlen($bits['domain']) > 255) return 0;


    #
    # restrictions on domain-literals from RFC2821 section 4.1.3
    #
    # RFC4291 changed the meaning of :: in IPv6 addresses - it can mean one or
    # more zero groups (updated from 2 or more).
    #

    if (strlen($bits['domain-literal'])){

        $Snum           = "(\d{1,3})";
        $IPv4_address_literal   = "$Snum\.$Snum\.$Snum\.$Snum";

        $IPv6_hex       = "(?:[0-9a-fA-F]{1,4})";

        $IPv6_full      = "IPv6\:$IPv6_hex(?:\:$IPv6_hex){7}";

        $IPv6_comp_part     = "(?:$IPv6_hex(?:\:$IPv6_hex){0,7})?";
        $IPv6_comp      = "IPv6\:($IPv6_comp_part\:\:$IPv6_comp_part)";

        $IPv6v4_full        = "IPv6\:$IPv6_hex(?:\:$IPv6_hex){5}\:$IPv4_address_literal";

        $IPv6v4_comp_part   = "$IPv6_hex(?:\:$IPv6_hex){0,5}";
        $IPv6v4_comp        = "IPv6\:((?:$IPv6v4_comp_part)?\:\:(?:$IPv6v4_comp_part\:)?)$IPv4_address_literal";


        #
        # IPv4 is simple
        #

        if (preg_match("!^\[$IPv4_address_literal\]$!", $bits['domain'], $m)){

            if (intval($m[1]) > 255) return 0;
            if (intval($m[2]) > 255) return 0;
            if (intval($m[3]) > 255) return 0;
            if (intval($m[4]) > 255) return 0;

        }else{

            #
            # this should be IPv6 - a bunch of tests are needed here :)
            #

            while (1){

                if (preg_match("!^\[$IPv6_full\]$!", $bits['domain'])){
                    break;
                }

                if (preg_match("!^\[$IPv6_comp\]$!", $bits['domain'], $m)){
                    list($a, $b) = explode('::', $m[1]);
                    $folded = (strlen($a) && strlen($b)) ? "$a:$b" : "$a$b";
                    $groups = explode(':', $folded);
                    if (count($groups) > 7) return 0;
                    break;
                }

                if (preg_match("!^\[$IPv6v4_full\]$!", $bits['domain'], $m)){

                    if (intval($m[1]) > 255) return 0;
                    if (intval($m[2]) > 255) return 0;
                    if (intval($m[3]) > 255) return 0;
                    if (intval($m[4]) > 255) return 0;
                    break;
                }

                if (preg_match("!^\[$IPv6v4_comp\]$!", $bits['domain'], $m)){
                    list($a, $b) = explode('::', $m[1]);
                    $b = substr($b, 0, -1); # remove the trailing colon before the IPv4 address
                    $folded = (strlen($a) && strlen($b)) ? "$a:$b" : "$a$b";
                    $groups = explode(':', $folded);
                    if (count($groups) > 5) return 0;
                    break;
                }

                return 0;
            }
        }           
    }else{

        #
        # the domain is either dot-atom or obs-domain - either way, it's
        # made up of simple labels and we split on dots
        #

        $labels = explode('.', $bits['domain']);


        #
        # this is allowed by both dot-atom and obs-domain, but is un-routeable on the
        # public internet, so we'll fail it (e.g. user@localhost)
        #

        if ($options['public_internet']){
            if (count($labels) == 1) return 0;
        }


        #
        # checks on each label
        #

        foreach ($labels as $label){

            if (strlen($label) > 63) return 0;
            if (substr($label, 0, 1) == '-') return 0;
            if (substr($label, -1) == '-') return 0;
        }


        #
        # last label can't be all numeric
        #

        if ($options['public_internet']){
            if (preg_match('!^[0-9]+$!', array_pop($labels))) return 0;
        }
    }


    return 1;
}

##################################################################################

function email_strip_comments($comment, $email, $replace=''){

    while (1){
        $new = preg_replace("!$comment!", $replace, $email);
        if (strlen($new) == strlen($email)){
            return $email;
        }
        $email = $new;
    }
}

##################################################################################
?>

It is used by the Solar PHP framework.
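
The per-label domain checks near the end of that function (length limit, no leading or trailing hyphen) can be tried in isolation. This is a minimal sketch under my own assumptions: the helper name domain_labels_ok() is hypothetical, and the empty-label check is an addition, since the parser above relies on its grammar to rule that case out.

```php
<?php
// Hypothetical helper: each dot-separated label must be 1 to 63 bytes long
// and must not start or end with a hyphen.
function domain_labels_ok(string $domain): bool {
    foreach (explode('.', $domain) as $label) {
        // Reject empty labels (e.g. "a..com") and labels over 63 bytes.
        if ($label === '' || strlen($label) > 63) {
            return false;
        }
        // Reject labels with a leading or trailing hyphen.
        if ($label[0] === '-' || substr($label, -1) === '-') {
            return false;
        }
    }
    return true;
}

var_dump(domain_labels_ok('example.com'));
var_dump(domain_labels_ok('-example.com'));
```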

-1

This function returns true if the given $email fits the pattern set in the regex: [email protected]

function valida_email($email) {
    // preg_match() returns 1 on a match, so compare with 1 to get a boolean
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/', $email) === 1;
}
  • 1

    Please explain a little better how this code solves @Zuul's problem.

  • This function returns true if the $email entered fits the pattern set in regex: [email protected].

-1

I would bet on regular expressions. Since RFC 5322 is more recent than RFC 822, I would try an expression that validates against it, like this one:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

But the expression above still allows [email protected], which is a bit of a nuisance. One could then choose to list the accepted top-level domains, as a kind of whitelist, for example:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|
biz|info|mobi|name|aero|asia|jobs|museum)\b

But it is also a nuisance to keep listing every top-level domain in the world, since new ones are added all the time.

I think these expressions are the best ways, from my point of view.

This is the site where I usually look for these expressions.

-3

You can build a function using regular expressions...

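function validaEmail($email) {
    $conta = '^[a-zA-Z0-9._-]+@';
    $dominio = '[a-zA-Z0-9._-]+\.';   // the dot before the extension is escaped
    $extensao = '([a-zA-Z]{2,4})$';

    // preg_match() requires pattern delimiters, here "/"
    $pattern = '/' . $conta . $dominio . $extensao . '/';

    // Only strings can match; preg_match() returns 1 on success
    return is_string($email) && preg_match($pattern, $email) === 1;
}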

// Define a variable to test the validator
$input = "[email protected]";
// Run the check using the function
if (validaEmail($input)) {
    echo "The email entered is valid!";
} else {
    echo "The email entered is invalid!";
}

Taken from the blog of Thiago Belém...

http://blog.thiagobelem.net/validacao-de-e-mail-no-php-com-expressoes-regulares/
