Detect if link is from the current or external domain with PHP

Question

Asked 10 years, 3 months ago

I’m currently using this expression with preg_replace to detect links in content sent via POST:

$conteudo = $_POST["conteudo"];

$conteudo = preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' <a class="link_externo" href="$2" target="_blank">$2</a>', $conteudo);

echo $conteudo;

I want to know whether a link sent via POST points to my site or is an external link. If it points to my site, it should get the class link_interno; if it is external, it should get the class link_externo and target="_blank".

I would also like the expression to accept some symbols: if I send a link such as www.site.com.br/teste!apenasumteste or www.site.com.br/teste#apenasumteste, it only matches the link up to the ! or #.

How can I do that?

1 answer

by stderr • **30,356** points · Answer 1 · 2015-05-10T04:12:08+00:00

One way to get the current domain is to read the SERVER_NAME key of the $_SERVER array.

SERVER_NAME: The name of the server host under which the current script is executing. If the script is running on a virtual host, this will be the value defined for that virtual host.

Note: $_SERVER is an array containing information such as headers, paths, and script locations... There is no guarantee that every web server will provide any of these; servers may omit some, or provide others [..].

To extract information from a link, such as the host, use the parse_url function. A small function can then check whether the extracted host matches your site:

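function verificarLink($link, $dominio) {
  // parse_url() only fills in the host when the link has a scheme, so add one to bare "www." links
  if (!preg_match('!^https?://!i', $link)) {
    $link = 'http://' . $link;
  }
  $host = parse_url($link, PHP_URL_HOST);
  // Case-insensitive comparison; a missing host counts as external
  return !empty($host) && strcasecmp($host, $dominio) === 0;
}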
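
As a quick illustration of the parse_url step: the function only reports a host when the link carries a scheme, which matters for links written as www.site.com.br/... The sketch below is not part of the original answer; hostDoLink is a hypothetical helper name, and prepending http:// is an assumption about how scheme-less links should be read.

```php
<?php
// Hypothetical helper: returns the lowercased host of a link, or "" when none is found.
function hostDoLink($link) {
    // A link such as "www.site.com.br/teste" has no scheme, so parse_url()
    // would treat the whole string as a path. Prepend a scheme in that case.
    if (!preg_match('!^[a-z][a-z0-9+.-]*://!i', $link)) {
        $link = 'http://' . $link;
    }
    $host = parse_url($link, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : "";
}

// Without a scheme, parse_url() finds no host at all.
var_dump(parse_url("www.site.com.br/teste", PHP_URL_HOST)); // NULL

// With the helper, both forms yield the same host.
echo hostDoLink("www.site.com.br/teste"), "\n";                      // www.site.com.br
echo hostDoLink("http://WWW.Site.com.br/teste!apenasumteste"), "\n"; // www.site.com.br
```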

To perform the check:

$link = "http://www.site.com.br/teste!apenasumteste";
$dominio = $_SERVER['SERVER_NAME'];

if (verificarLink($link, $dominio)) {
    echo "Internal domain!";
} else {
    echo "External domain!";
}
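
One case worth keeping in mind with a plain host comparison: www.site.com.br and site.com.br are different strings, so a link to one is reported as external when SERVER_NAME holds the other. If both should count as internal, the hosts could be normalized before comparing. This is only a sketch under the assumption that a leading "www." is the sole variation; normalizarHost and mesmoSite are hypothetical names, not part of the answer.

```php
<?php
// Hypothetical helper: lowercases a host and drops a leading "www."
function normalizarHost($host) {
    $host = strtolower($host);
    return (strpos($host, 'www.') === 0) ? substr($host, 4) : $host;
}

// Hypothetical variant of the check: compares the normalized hosts.
function mesmoSite($link, $dominio) {
    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host found (e.g. link without a scheme): treat as external
    }
    return normalizarHost($host) === normalizarHost($dominio);
}

var_dump(mesmoSite("http://www.site.com.br/teste", "site.com.br")); // bool(true)
var_dump(mesmoSite("http://outro.com/teste", "site.com.br"));       // bool(false)
```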

Update

Following this answer from SOen, use the regular expression below to extract http/https links:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

This regular expression can be passed to preg_match_all to extract all the links in a string:

function extrairLinks($conteudo){
    // The "u" modifier makes PCRE read the pattern and the subject as UTF-8, so the curly quotes in the last class match as whole characters
    $expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))%u";
    preg_match_all($expressao, $conteudo, $resultados);
    // Index 0 holds the full matches; return an empty array when nothing matched
    return isset($resultados[0]) ? $resultados[0] : array();
}

To use it:

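$dominio = $_SERVER['SERVER_NAME'];
$links = extrairLinks($conteudo);

foreach ($links as $link) {
    // Escape the link before echoing it, since it comes from user input
    $seguro = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
    if (verificarLink($link, $dominio)) {
        // Internal link: link_interno class, opens in the same tab
        echo '<a class="link_interno" href="'. $seguro .'">'. $seguro .'</a>' . "<br>";
    } else {
        // External link: link_externo class, opens in a new tab
        echo '<a class="link_externo" href="'. $seguro .'" target="_blank">'. $seguro .'</a>' . "<br>";
    }
}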
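
The loop above prints the links on their own. Since the question replaces links inside the content, the same idea could also be applied in place with preg_replace_callback. The following is a rough sketch rather than a definitive implementation: linkificar is a hypothetical name, and the pattern is a simplified assumption (the question's expression with ! # % : + , ; @ added) that does not trim trailing punctuation the way the longer expression does.

```php
<?php
// Hypothetical helper: wraps every link found in $conteudo in an <a> tag,
// choosing the class and target by comparing the link's host with $dominio.
function linkificar($conteudo, $dominio) {
    // Simplified pattern (an assumption): a link starts at the beginning of
    // the text or after whitespace, and may contain symbols such as ! and #.
    $padrao = '~(?<!\S)(?:https?://|www\.)[a-z0-9_./?=&!#%:+,;@-]+~i';

    return preg_replace_callback($padrao, function ($m) use ($dominio) {
        $link = $m[0];
        // parse_url() needs a scheme to find the host, so add one to "www." links
        $url = preg_match('~^https?://~i', $link) ? $link : 'http://' . $link;
        $host = parse_url($url, PHP_URL_HOST);
        $interno = is_string($host) && strcasecmp($host, $dominio) === 0;

        // Escape before building the HTML, since the content comes from the user
        $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $texto = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
        return $interno
            ? '<a class="link_interno" href="' . $href . '">' . $texto . '</a>'
            : '<a class="link_externo" href="' . $href . '" target="_blank">' . $texto . '</a>';
    }, $conteudo);
}

echo linkificar(
    "Veja www.site.com.br/teste!apenasumteste e http://example.com/a#b",
    "www.site.com.br"
);
```

With this approach the surrounding text is kept as it is and only the matched links are rewritten, so the result can be echoed in place of the original content.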