Validate preg_replace domain

2

I have the following line of code:

$site = preg_replace("/[^a-zA-Z0-9]/", "", $_POST['site']);

But when the input string is www.lostscavenge.com.br, that regular expression returns: wwwlostscavengecombr

How do I keep the dots instead of removing them?

  • What do you mean, only the dot?

  • I want to clean the string by removing all special characters, allowing only the dot. For example, when I use this function and type www.lostscavenge.com.br in the form, my string ends up as wwwlostscavengecombr, when it should be www.lostscavenge.com.br

2 answers

2


Domain names can now contain Unicode characters (UTF-8). Depending on your business rules, if you need to allow domains with non-ASCII characters, the following routine may be useful:

function validate_domain_name($str, $force_utf8 = true)
{
    $force_utf8 = $force_utf8? 'u': '';

    // This is not good enough.
    //$re = '[^a-zA-Z0-9\.]';

    // This is not good enough either, because it does not enforce basic rules.
    //$re = '^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$';

    // This one is more consistent.
    $re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';

    if (preg_match('/'.$re.'/'.$force_utf8, $str, $rs) && isset($rs[0]) && !empty($rs[0])) {
        return $rs[0];
    } else {
        return null;
    }
}

$str = '000-.com';
$str = '-000.com';
$str = '000.com'; // valid
$str = 'foo-.com';
$str = '-foo.com';
$str = 'foo.com'; // valid
$str = 'foo.any'; // valid
$str = 'お名前0.com'; // valid
$str = 'お名前.コム'; // valid

echo 'domain: '.validate_domain_name($str);

To disable Unicode, set the second parameter to boolean false.
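
The sample strings above overwrite one another, so only the last one is actually checked. As a minimal sketch, the loop below runs every candidate through the same pattern. The wrapper name check_domain() and the $candidates array are hypothetical, introduced only for this illustration; the regular expression is copied unchanged from the routine above.

```php
<?php
// Hypothetical wrapper (not part of the original answer) around the same pattern.
function check_domain(string $candidate, bool $unicode = true): ?string
{
    $re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';
    $flags = $unicode ? 'u' : '';

    // preg_match() returns 1 on a match; $m[0] then holds the whole matched string.
    return preg_match('/' . $re . '/' . $flags, $candidate, $m) === 1 ? $m[0] : null;
}

$candidates = [
    '000-.com', '-000.com', '000.com',
    'foo-.com', '-foo.com', 'foo.com', 'foo.any',
    'お名前0.com', 'お名前.コム',
];

foreach ($candidates as $candidate) {
    $result = check_domain($candidate);
    echo $candidate, ' => ', $result === null ? 'rejected' : 'accepted', PHP_EOL;
}

// Same check with Unicode mode switched off (second argument false).
var_dump(check_domain('お名前.コム', false));
```

With the second argument set to false the pattern is compiled without the u flag, so the non-ASCII names would normally be rejected; that result can depend on the PHP version and locale, so it is worth running on your own setup.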

The original regular expression has been adapted from this response: https://stackoverflow.com/a/16491074/1685571

The adaptations I made were replacing a-zA-Z with \w and adding the option to include the u flag, which allows non-ASCII characters.
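
One side effect of that swap is worth checking: \w also matches the underscore, which a-zA-Z does not. The sketch below compares the pattern from this answer with an ASCII-only variant. That variant is an assumption, rebuilt here by putting a-zA-Z back in place of \w, and is not quoted from the linked answer; the variable names are hypothetical.

```php
<?php
// Pattern from the answer above (uses \w, compiled with the u flag).
$with_w = '/^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$/u';

// Assumed ASCII-only variant: the same pattern with \w swapped back to a-zA-Z.
$ascii = '/^(?!\-)(?:[a-zA-Z\d\-]{0,62}[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/';

$host = 'foo_bar.com';

var_dump(preg_match($with_w, $host)); // int(1): the underscore is accepted by \w
var_dump(preg_match($ascii, $host));  // int(0): the underscore is not in a-zA-Z
```

If underscores should be rejected, they would need to be excluded explicitly when using \w.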

1

Just add the dot to your pattern:

$site = preg_replace("/[^a-zA-Z0-9\.]/", "", $_POST['site']);

It is the \. at the end of the pattern. The \ escapes the dot (.), because in regular expressions an unescaped . normally means any character.

  • Hey, it worked here, but I don't understand these expressions at all. Some sites also have a hyphen (-), so I think it would not work very well for them. Is it possible to allow that too?

  • Yes, just add the - at the end. It doesn't need to be escaped there. In fact, inside brackets the dot doesn't need escaping either (few characters do), but I do it out of habit. Your new expression would be: $site = preg_replace("/[^a-zA-Z0-9\.\-]/", "", $_POST['site']);

  • Thanks, it worked out!

  • @Viniciusaquino here is a reference on regular expressions, so you can understand them better :) http://www.devmedia.com.br/expressoes-regulares-em-php/25076

  • @leo_ap when it is inside brackets the dot is not a metacharacter, so your regex is escaping the backslash as well. The correct pattern is '/[^a-zA-Z0-9.]/'

  • @gmsantos not at all. A lone backslash causes an error in the regular expression; to match one, \\ must be used in any situation, inside or outside brackets. The \. inside brackets correctly escapes the dot.

  • @leo_ap http://regexr.com/3f2mn http://stackoverflow.com/a/15139411/2099835

  • [.] and [\.] both match only the dot. To match a backslash or a dot you would need [\\\.] (or [\\.]). Note that the backslash escapes the character that follows it in any situation.

  • If you use a single-quoted string in PHP, the backslash is kept as written. How the language escapes characters is one thing; the regular expression itself is another.

  • @gmsantos I am referring precisely to the behavior of the regular expression itself, that is, how it was designed by its creators and by the vast majority of its implementers. A lone backslash inside square brackets causes an error in most regular expression implementations, because inside brackets the backslash is still treated as a metacharacter.
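
The disagreement in the comments above can be checked directly in PHP. The sketch below is only an illustration: the helper sanitize_site() and the sample input are hypothetical. It runs the escaped and unescaped forms of the character class over a string that contains a backslash, then shows a class that keeps backslashes on purpose.

```php
<?php
// Hypothetical helper (not from the thread): strip every character the pattern matches.
function sanitize_site(string $input, string $pattern): string
{
    return preg_replace($pattern, '', $input);
}

// In a single-quoted PHP string, '\\' is one backslash; \. and \- reach PCRE unchanged.
$input = 'www.lost-scavenge.com.br\\<script>';

$escaped   = '/[^a-zA-Z0-9\.\-]/'; // dot and hyphen escaped inside the class
$unescaped = '/[^a-zA-Z0-9.-]/';   // dot unescaped, hyphen placed last

echo sanitize_site($input, $escaped), PHP_EOL;   // www.lost-scavenge.com.brscript
echo sanitize_site($input, $unescaped), PHP_EOL; // www.lost-scavenge.com.brscript

// To keep backslashes, the class needs an escaped backslash: the regex sees [^a-zA-Z0-9.\\-].
$keep_backslash = '/[^a-zA-Z0-9.\\\\-]/';
echo sanitize_site($input, $keep_backslash), PHP_EOL; // www.lost-scavenge.com.br\script
```

Both forms remove the backslash from the input, so in PHP's PCRE the \. inside brackets stands for a literal dot only and does not add the backslash to the allowed set.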
