Domain names can now contain Unicode characters (UTF-8). Depending on your business rules, if you need to allow domains with non-ASCII characters, the following routine may be useful:
function validate_domain_name($str, $force_utf8 = true)
{
    // Turn the boolean into the PCRE "u" modifier (or no modifier at all).
    $force_utf8 = $force_utf8 ? 'u' : '';
    // This one is ineffective.
    //$re = '[^a-zA-Z0-9\.]';
    // This one is ineffective too, because it does not enforce basic rules.
    //$re = '^(http[s]?\:\/\/)?((\w+)\.)?(([\w-]+)?)(\.[\w-]+){1,2}$';
    // This one is more consistent.
    $re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';
    // Return the matched domain, or null when the string is not a valid domain.
    if (preg_match('/'.$re.'/'.$force_utf8, $str, $rs) && isset($rs[0]) && !empty($rs[0])) {
        return $rs[0];
    } else {
        return null;
    }
}
// Each assignment overwrites the previous one; keep only the line you want to test.
$str = '000-.com'; // invalid
$str = '-000.com'; // invalid
$str = '000.com'; // valid
$str = 'foo-.com'; // invalid
$str = '-foo.com'; // invalid
$str = 'foo.com'; // valid
$str = 'foo.any'; // valid
$str = 'お名前0.com'; // valid
$str = 'お名前.コム'; // valid
echo 'domain: '.validate_domain_name($str);
To disable Unicode, pass boolean false as the second parameter.
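As a rough sketch of how the two modes compare, the snippet below runs every sample string through the routine with and without the Unicode flag. The function body is a condensed copy of the routine so the snippet runs on its own, and `check_samples` is a hypothetical helper introduced only for this illustration. The two Japanese samples should be accepted only in Unicode mode, assuming the default "C" locale.

```php
<?php
// Condensed copy of the routine so this sketch is self-contained.
function validate_domain_name($str, $force_utf8 = true)
{
    $flag = $force_utf8 ? 'u' : '';
    $re = '^(?!\-)(?:[\w\d\-]{0,62}[\w\d]\.){1,126}(?!\d+)[\w\d]{1,63}$';
    return preg_match('/' . $re . '/' . $flag, $str, $rs) && !empty($rs[0]) ? $rs[0] : null;
}

// Hypothetical helper: validates each sample in Unicode mode and in ASCII-only mode.
function check_samples(array $samples)
{
    $results = [];
    foreach ($samples as $s) {
        $results[$s] = [
            'unicode' => validate_domain_name($s) !== null,
            'ascii'   => validate_domain_name($s, false) !== null,
        ];
    }
    return $results;
}

$samples = [
    '000-.com', '-000.com', '000.com',
    'foo-.com', '-foo.com', 'foo.com', 'foo.any',
    'お名前0.com', 'お名前.コム',
];

foreach (check_samples($samples) as $domain => $r) {
    printf(
        "%s  unicode=%s  ascii=%s\n",
        $domain,
        $r['unicode'] ? 'valid' : 'invalid',
        $r['ascii'] ? 'valid' : 'invalid'
    );
}
```

Looping over an array avoids the repeated reassignment of `$str`, so all samples are checked in a single run.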
The original regular expression was adapted from this answer: https://stackoverflow.com/a/16491074/1685571
The adaptations I made were replacing a-zA-Z with \w and adding the option to include the u flag, which allows non-ASCII characters.
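The effect of those two adaptations can be illustrated in isolation. This is a minimal sketch rather than part of the routine: it compares `[a-zA-Z]`, `\w` without the `u` flag, and `\w` with it. One side effect worth knowing is that `\w` also matches the underscore, so the routine accepts labels such as `foo_bar`. The `[\p{L}\p{N}]` class in the last two checks is only a suggested alternative, not something the original answer uses, in case underscores should be rejected.

```php
<?php
$checks = [
    // [a-zA-Z] never matches non-ASCII letters.
    'ascii_class_on_japanese' => preg_match('/^[a-zA-Z]+$/', 'お名前'),
    // Without the u flag, \w is normally limited to ASCII (default "C" locale assumed).
    'w_without_u_on_japanese' => preg_match('/^\w+$/', 'お名前'),
    // With the u flag, \w also matches Unicode letters and digits.
    'w_with_u_on_japanese'    => preg_match('/^\w+$/u', 'お名前'),
    // \w matches the underscore, which [a-zA-Z] does not.
    'w_on_underscore'         => preg_match('/^\w+$/u', 'foo_bar'),
    'ascii_class_underscore'  => preg_match('/^[a-zA-Z]+$/', 'foo_bar'),
    // Suggested alternative (an assumption, not from the answer): letters and digits only.
    'pl_pn_on_japanese'       => preg_match('/^[\p{L}\p{N}]+$/u', 'お名前'),
    'pl_pn_on_underscore'     => preg_match('/^[\p{L}\p{N}]+$/u', 'foo_bar'),
];

foreach ($checks as $name => $matched) {
    printf("%-26s %s\n", $name, $matched === 1 ? 'match' : 'no match');
}
```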
What do you mean, only the use of the dot?
– LocalHost
I want to clean the string by removing all special characters and allowing only the dot. For example, when I use this function and type www.lostscavenge.com.br in the form, my string ends up as wwwlostscavengecombr, when it should be www.lostscavenge.com.br
– viniciussvl