How to extract a word from a URL in PHP

Viewed 1,232 times

4

In the examples below:

+bbbbbbb2.virtua.com.br - take out virtua
+000-74-4-000.paemt702.dsl.brasiltelecom.net.br - take out brasiltelecom
+111.222.22.222.dynamic.adsl.gvt.net.br - take out gvt

I tried:

$texto = "189-72-5-240.paemt702.dsl.brasiltelecom.net.br";
echo substr($texto,-10);

But that counts characters, so the result comes out wrong depending on the length of the host name that is written to the database.

  • 2

    Do you always want to take out the domain name, is that it? Do you have a fixed list of hosts? Do you have a list of TLD suffixes, or is it always .net.br?

3 answers

4


If you want the third-from-last chunk of the URL:

This solution works with all the examples given in the question:

$pedacos = explode('.',$texto);
echo $pedacos[count($pedacos)-3];

Inputs:

$texto = "189-72-5-240.paemt702.dsl.brasiltelecom.net.br";  
$texto = "+bbbbbbb2.virtua.com.br";
$texto = "+111.222.22.222.dynamic.adsl.gvt.net.br";

Outputs:

brasiltelecom
virtua
gvt


The problem with using a fixed position:

If the list contains addresses with different suffixes, fixed positions can cause problems, as in the following examples:

$texto = "bbbbbbb2.virtua.com.br";
$texto = "www.usp.br";  
$texto = "66-97-12-89.datalink.net";  

Outputs:

virtua       So far so good...
www          ... but in this case it should be "usp"...
66-97-12-89  ... and in this one it should be datalink!

To solve this problem, here is the ...:


Solution for addresses with multiple suffixes:

To tell what is a suffix and what is the domain name itself, you need an "official" suffix list to check what can and cannot be stripped from the end of the URL.

Mozilla provides a list of suffixes at https://publicsuffix.org/.

This function solves the problem well, as long as the suffixes of interest are included:

function NomeDoDominio( $dominio ) {
    // the array must be ordered from the longest suffix to the shortest
    $sufixos = array( '.com.br', '.net.br', '.org.br', '.com', '.br' );
    foreach( $sufixos as $sufixo ) {
       // strip the first suffix that matches the end of the address
       if( $sufixo === substr( $dominio , -strlen( $sufixo ) ) ) {
          $dominio = substr( $dominio , 0, -strlen( $sufixo ) );
          break;
       }
    }
    // return what follows the last dot (or everything, if there is no dot)
    return substr( strrchr( '.'.$dominio, '.'), 1);
}

See working on IDEONE.

Note: to complicate matters further, in Brazil an address can be something like www.jose.silva.nom.br.
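
The hard-coded suffix array could instead be built from text in the Public Suffix List format (one rule per line, comments starting with //). What follows is only a sketch under assumptions: the names carregarSufixos() and nomeDoDominioComLista() are hypothetical, PHP 7 or later is assumed, the sample string stands in for the downloaded list, and wildcard (*.) and exception (!) rules are simply skipped, so it does not implement the full matching rules of the list.

```php
<?php
// Hypothetical helper: turns text in Public Suffix List format into the
// kind of array used by NomeDoDominio(). Wildcard and exception rules
// are skipped in this sketch.
function carregarSufixos(string $conteudo): array {
    $sufixos = array();
    foreach (preg_split('/\R/', $conteudo) as $linha) {
        $linha = trim($linha);
        // Skip blank lines, comments, and the rule types ignored here
        if ($linha === '' || strpos($linha, '//') === 0
            || $linha[0] === '*' || $linha[0] === '!') {
            continue;
        }
        $sufixos[] = '.' . $linha;
    }
    // Longest suffixes first, so '.com.br' is tried before '.br'
    usort($sufixos, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $sufixos;
}

// Same logic as NomeDoDominio(), with the suffix list passed in
function nomeDoDominioComLista(string $dominio, array $sufixos): string {
    foreach ($sufixos as $sufixo) {
        if (substr($dominio, -strlen($sufixo)) === $sufixo) {
            $dominio = substr($dominio, 0, -strlen($sufixo));
            break;
        }
    }
    return substr(strrchr('.' . $dominio, '.'), 1);
}

// Sample in the list's format; a real run would read the downloaded file
$amostra = "// sample rules\nbr\ncom.br\nnet.br\nnom.br\nnet\n";
$sufixos = carregarSufixos($amostra);

echo nomeDoDominioComLista('www.usp.br', $sufixos), PHP_EOL;               // usp
echo nomeDoDominioComLista('66-97-12-89.datalink.net', $sufixos), PHP_EOL; // datalink
```

Sorting by length keeps the "longest first" requirement of the original function without having to maintain the order by hand.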

  • I think it doesn’t have to be that complex.

  • Which of the three solutions I gave do you mean? (The third, I imagine.) I based them on what is in the question; if it is complemented with more details, maybe they can be optimized.

  • Mainly the last one. It would be simpler to explode the string into an array and then remove the word he wants, so it becomes more generic.

  • I believe he wants to extract, not remove, but that is indeed an assumption. There is a certain ambiguity in the question.

4

According to your question, whatever the URL is, there is one constant: you want the third word counting from the end:

┌─────────────────────────────────────────────────┬───────────────┐
│ URL address                                     │ Desired value │
├─────────────────────────────────────────────────┼───────────────┤
│ +bbbbbbb2.virtua.com.br                         │ virtua        │
├─────────────────────────────────────────────────┼───────────────┤
│ +000-74-4-000.paemt702.dsl.brasiltelecom.net.br │ brasiltelecom │
├─────────────────────────────────────────────────┼───────────────┤
│ +111.222.22.222.dynamic.adsl.gvt.net.br         │ gvt           │
└─────────────────────────────────────────────────┴───────────────┘ 

Solution

For that specific purpose you can use:

$valor = array_reverse(explode(".", $url))[2];

  1. We convert the string $url into an array, splitting it at the . character with the function explode().
  2. The output is passed to the function array_reverse(), which reverses the array.
  3. Finally, we take index 2, which corresponds to the third position.
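
The three steps can be followed one at a time with intermediate variables. This is just an illustrative sketch: the names $partes and $invertido are not part of the answer, and the URL is one of the samples from the question.

```php
<?php
$url = "189-72-5-240.paemt702.dsl.brasiltelecom.net.br"; // sample from the question

// Step 1: split the string at each dot
$partes = explode(".", $url);
// ["189-72-5-240", "paemt702", "dsl", "brasiltelecom", "net", "br"]

// Step 2: reverse the array, so the last piece comes first
$invertido = array_reverse($partes);
// ["br", "net", "brasiltelecom", "dsl", "paemt702", "189-72-5-240"]

// Step 3: index 2 is the third piece counting from the end
$valor = $invertido[2];
echo $valor, PHP_EOL; // brasiltelecom
```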

Example

In this example, which is also on Ideone, we wrapped the code above in a function that receives the string and the position to return, and applied it to the three URLs:

<?php
function recolher($url="", $pos=2) {
    return array_reverse(explode(".", $url))[$pos];
}

echo recolher("+bbbbbbb2.virtua.com.br").PHP_EOL;                         // virtua

echo recolher("+000-74-4-000.paemt702.dsl.brasiltelecom.net.br").PHP_EOL; // brasiltelecom

echo recolher("+111.222.22.222.dynamic.adsl.gvt.net.br").PHP_EOL;         // gvt
?>

To be even more flexible, we can pass the separation character as a function parameter.
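
A possible shape for that more flexible version is sketched here. The name recolherCom() and the $sep parameter are hypothetical, and returning an empty string for a position that does not exist is a choice made for this sketch, not something stated in the answer.

```php
<?php
// Hypothetical variant of recolher(): $sep is the extra separator parameter
function recolherCom($url = "", $pos = 2, $sep = ".") {
    $partes = array_reverse(explode($sep, $url));
    // Return an empty string when the requested position does not exist
    return isset($partes[$pos]) ? $partes[$pos] : "";
}

echo recolherCom("+bbbbbbb2.virtua.com.br"), PHP_EOL;    // virtua
echo recolherCom("gvt/adsl/dynamic", 2, "/"), PHP_EOL;   // gvt
echo recolherCom("usp.br"), PHP_EOL;                     // (empty line)
```

Note that $sep must not be an empty string, since explode() does not accept an empty separator.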

1

Well, here goes: a generic function to solve the problem.

$texto is the text from which you want to remove the word;

$palavra is the word that will be removed;

$pattern is the separator that will be used (in the examples above, it is '.').

function remove($texto, $palavra, $pattern){
    $txt = explode($pattern, $texto); // Turn the string into an array
    $id = array_search($palavra, $txt); // Find the index of the element that holds the word
    if ($id !== false) { // array_search() returns false when the word is absent
        unset($txt[$id]); // Remove that element
    }
    $texto = implode($pattern, $txt); // Turn the array back into a string
    return $texto;
}

Note that this function removes only the FIRST occurrence of the word.

If you have something like +111.222.22.222.dynamic.gvt.adsl.gvt.net.br, it removes only the first gvt.
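
If every occurrence should go, one option is to filter the array instead of searching for a single index. This is a sketch only; removeTodas() is a hypothetical name, not part of the answer above.

```php
<?php
// Hypothetical variant of remove() that drops every occurrence of the word
function removeTodas($texto, $palavra, $pattern) {
    $txt = explode($pattern, $texto);
    // Keep only the pieces that differ from the word
    $txt = array_filter($txt, function ($parte) use ($palavra) {
        return $parte !== $palavra;
    });
    return implode($pattern, $txt);
}

echo removeTodas("111.222.22.222.dynamic.gvt.adsl.gvt.net.br", "gvt", "."), PHP_EOL;
// 111.222.22.222.dynamic.adsl.net.br
```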

Links to the documentation:

http://php.net/manual/en/function.explode.php
http://php.net/manual/en/function.implode.php
http://br2.php.net/manual/en/function.array-search.php
