Detect if link is from the current or external domain with PHP

I’m currently using this expression with preg_replace to detect links in content sent via POST:

$conteudo = $_POST["conteudo"];

$conteudo = preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' <a class="link_externo" href="$2" target="_blank">$2</a>', $conteudo);

echo $conteudo;

I’d like to know whether each link sent via POST points to my site or to an external one. If it points to my site, it should get the class link_interno; if it is external, it should get the class link_externo and target="_blank".

I would also like the expression to accept some symbols. For example, if I send www.site.com.br/teste!apenasumteste or www.site.com.br/teste#apenasumteste, the link is only detected up to the ! or #.

How can I do that?

1 answer


One way to get the current domain is to read the SERVER_NAME key of the $_SERVER array.

SERVER_NAME: The name of the server host under which the current script is executing. If the script is running on a virtual host, this will be the value defined for that virtual host.

Note: $_SERVER is an array containing information such as headers, paths, and script locations... There is no guarantee that every web server will provide any of these; servers may omit some, or provide others [..].

To extract information from a link, such as the host, use the parse_url function. A small function can then check whether the extracted host matches your site:
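
Since SERVER_NAME is not guaranteed to be present, it may be worth reading it defensively. The following is only a sketch: obterDominio is a hypothetical helper name, and the fallback value is a placeholder taken from the question's example links, not something the server provides.

```php
<?php
// Hypothetical helper: returns the site's domain in lower case, falling back
// to a configured value when the server does not provide SERVER_NAME.
function obterDominio(array $server, $padrao) {
    if (isset($server['SERVER_NAME']) && $server['SERVER_NAME'] !== '') {
        return strtolower($server['SERVER_NAME']);
    }
    return strtolower($padrao);
}

// "www.site.com.br" is a placeholder fallback (assumption).
$dominio = obterDominio($_SERVER, 'www.site.com.br');
```

Passing the array as a parameter instead of reading $_SERVER inside the function keeps the helper easy to test.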

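function verificarLink($link, $dominio) {
  // parse_url only fills 'host' when the link has a scheme, so add one to bare "www." links
  if (!preg_match('!^https?://!i', $link)) {
    $link = 'http://' . $link;
  }
  $info = parse_url($link);
  $host = isset($info['host']) ? $info['host'] : "";
  // Case-insensitive comparison of the extracted host with the site's domain
  return $host !== "" && strcasecmp($host, $dominio) === 0;
}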

To perform the check:

$link = "http://www.site.com.br/teste!apenasumteste";
$dominio = $_SERVER['SERVER_NAME'];

if (verificarLink($link, $dominio)) {
    echo "Internal domain!";
} else {
    echo "External domain!";
}
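
Two details can affect this comparison: parse_url only reports a host when the link carries a scheme, and SERVER_NAME may or may not include the www. prefix. A minimal sketch of a more tolerant check follows, assuming that www.site.com.br and site.com.br should count as the same site. hostNormalizado and mesmoSite are hypothetical names, not part of the answer's code.

```php
<?php
// Hypothetical helper: lower-cased host of a link, without a leading "www.".
function hostNormalizado($link) {
    // parse_url() treats "www.site.com.br/x" as a path, so add a scheme first.
    if (!preg_match('!^[a-z][a-z0-9+.-]*://!i', $link)) {
        $link = 'http://' . $link;
    }
    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host)) {
        return ''; // malformed link or no host found
    }
    return preg_replace('/^www\./', '', strtolower($host));
}

// Hypothetical helper: true when the link's host matches $dominio,
// ignoring case and an optional "www." prefix on either side.
function mesmoSite($link, $dominio) {
    $host = hostNormalizado($link);
    return $host !== '' && $host === hostNormalizado($dominio);
}
```

If subdomains such as blog.site.com.br should also count as internal, the comparison would need a suffix check instead of strict equality.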

Update

Based on an answer from SOen (Stack Overflow in English), the regular expression below can be used to extract http/https links:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))

This regular expression can be passed to preg_match_all to extract all the links from a string:

function extrairLinks($conteudo){
    $expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))%";
    preg_match_all($expressao, $conteudo, $resultados);
    // Index 0 holds the full matches; it is an empty array when nothing matches
    return $resultados[0];
}

To use it:

$dominio = $_SERVER['SERVER_NAME'];
$links = extrairLinks($conteudo);

foreach ($links as $link) {
    // Escape the link before placing it in HTML attributes and text
    $seguro = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');
    if (verificarLink($link, $dominio)) {
        echo '<a class="link_interno" href="' . $seguro . '">' . $seguro . '</a><br>';
    } else {
        // Only external links open in a new tab
        echo '<a class="link_externo" href="' . $seguro . '" target="_blank">' . $seguro . '</a><br>';
    }
}
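
When the links have to be replaced in place inside a longer text, as with the preg_replace call in the question, preg_replace_callback lets each match be classified before it is rewritten. The sketch below is only an illustration: marcarLinks is a hypothetical name, and the pattern is a simplified assumption (it accepts ! and # inside the link), not the longer expression above. It also does not escape the rest of the text, only the links.

```php
<?php
// Hypothetical helper: wraps every link found in $conteudo in an <a> tag,
// choosing the class by comparing the link's host with $dominio.
function marcarLinks($conteudo, $dominio) {
    // Simplified pattern (assumption): http://, https:// or www. followed by
    // common URL characters, including "!" and "#", ending in a letter,
    // digit or slash so trailing punctuation is left out of the link.
    $expressao = '~(?<!\S)(?:https?://|www\.)[a-z0-9_./?=&#!%:+-]*[a-z0-9/]~i';

    return preg_replace_callback($expressao, function ($m) use ($dominio) {
        $link = $m[0];
        // Give bare "www." links a scheme so parse_url() can see the host
        // and the browser does not treat the href as a relative path.
        $url = preg_match('~^https?://~i', $link) ? $link : 'http://' . $link;
        $host = parse_url($url, PHP_URL_HOST);
        $interno = is_string($host) && strcasecmp($host, $dominio) === 0;

        $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $texto = htmlspecialchars($link, ENT_QUOTES, 'UTF-8');

        return $interno
            ? '<a class="link_interno" href="' . $href . '">' . $texto . '</a>'
            : '<a class="link_externo" href="' . $href . '" target="_blank">' . $texto . '</a>';
    }, $conteudo);
}
```

In the question's setting, the call would look like marcarLinks($_POST["conteudo"], $_SERVER['SERVER_NAME']), with text that contains no links passing through unchanged.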
  • The thing is, the content sent will not always be just a link. That was only a simple example I gave; the content may or may not contain links, which is why I use preg_replace. So I can’t use your answer...

  • Links may come with http, https, or only www.

  • @Igor Got it. Can the content contain more than one link?

  • Yes. @qmechanik

  • No, because I need it to work with preg_match...

  • @Igor Yes, I updated the answer and added the function extrairLinks, which uses preg_match_all. Try it and see whether you can extract the link(s) with that function.

  • @Igor I updated it again; the example now sets the internal/external class depending on the link. If possible, please give some feedback.
