Get the last "word" of a PATH with different URL formats

I am writing a PHP function that returns the last "word" of a requested URL, ignoring query parameters and fragments and treating the root as index.

Examples:

URL http://www.teste.com.br/ EXPECTED index

URL www.teste.com.br/ EXPECTED index

URL teste.com EXPECTED index

URL teste.com/ EXPECTED index

URL www.teste.com.br/teste EXPECTED teste

URL http://www.teste.com.br/teste EXPECTED teste

URL http://teste.com/teste EXPECTED teste

URL https://www.teste.com/teste EXPECTED teste

URL https://teste.com/teste EXPECTED teste

URL teste.com/teste/dois EXPECTED dois

URL teste.com/teste/dois/ EXPECTED dois

URL teste.com/teste/dois/?variavel=teste EXPECTED dois

URL teste.com/teste/dois?variavel=teste EXPECTED dois

URL teste.com/teste/dois/?variavel=teste EXPECTED dois

URL teste.com/teste?var1=t&var2=t EXPECTED teste

URL teste.com/teste/tres#ola EXPECTED tres

URL teste.com/teste?var1=t&var2=t#ola EXPECTED teste

Using basename together with substr and preg_match, I get the right result in most cases:

$arr = array(
  array("name"=>"http://www.teste.com.br/","possibleValues"=>array("index")),
  array("name"=>"www.teste.com.br/","possibleValues"=>array("index")),
  array("name"=>"teste.com","possibleValues"=>array("index")),
  array("name"=>"teste.com/","possibleValues"=>array("index")),
  array("name"=>"www.teste.com.br/teste","possibleValues"=>array("teste")),
  array("name"=>"http://www.teste.com.br/teste","possibleValues"=>array("teste")),
  array("name"=>"http://teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"https://www.teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"https://teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"teste.com/teste/dois","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste?var1=t&var2=t","possibleValues"=>array("teste")),
  array("name"=>"teste.com/teste/tres#ola","possibleValues"=>array("tres")),
  array("name"=>"teste.com/teste?var1=t&var2=t#ola","possibleValues"=>array("teste"))
);

foreach($arr as $value){
  echo "URL ".$value["name"]."\n";
  echo ( array_search( basename( returnLastWord( $value["name"] ) ), $value["possibleValues"] ) === false ? "FALHOU" : "PASSOU" )." -> expected: ".json_encode( $value["possibleValues"] )." get '".basename( returnLastWord( $value["name"] ) )."'\n\n";
}

function returnLastWord($var){
  // Find the first "?" or "#" so the query string and fragment can be cut off
  preg_match('/[?#]/', $var, $matches, PREG_OFFSET_CAPTURE);
  $after = ( empty( $matches[0][1] ) ? NULL : $matches[0][1] );
  if($after){
    // Keep only the part before the query string or fragment
    return substr($var, 0, $after);
  }else{
    return $var;
  }
}

Running example: https://repl.it/JJYI/7

Output (FALHOU means failed, PASSOU means passed):


URL http://www.teste.com.br/
FALHOU -> expected: ["index"] get 'www.teste.com.br'

URL www.teste.com.br/
FALHOU -> expected: ["index"] get 'www.teste.com.br'

URL teste.com
FALHOU -> expected: ["index"] get 'teste.com'

URL teste.com/
FALHOU -> expected: ["index"] get 'teste.com'

URL www.teste.com.br/teste
PASSOU -> expected: ["teste"] get 'teste'

URL http://www.teste.com.br/teste
PASSOU -> expected: ["teste"] get 'teste'

URL http://teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL https://www.teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL https://teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL teste.com/teste/dois
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste?var1=t&var2=t
PASSOU -> expected: ["teste"] get 'teste'

URL teste.com/teste/tres#ola
PASSOU -> expected: ["tres"] get 'tres'

URL teste.com/teste?var1=t&var2=t#ola
PASSOU -> expected: ["teste"] get 'teste'

I’m having trouble mainly with the first 4 examples, which point at the root of the project and should therefore return index.

3 answers

1

I couldn’t find a simple way to do it, especially given the variety of input URL formats. What I came up with was this:

function returnLastWord($var){

    // Remove the protocol
    $var = preg_replace('~^[^:]+[:][/]{2,}~', '', $var);

    /*
    Capture whatever forms the PATH of the URL,
    i.e. the part in parentheses in this example:
    `site.com/(foo/bar/baz)?querystring=ignored#hashignored`
    */
    if (preg_match('~/([^#?]{1,})~', $var, $matches)) {

        // Strip the trailing / in URLs such as `foo/bar/` to avoid capturing an empty string
        $result = rtrim($matches[1], '/');

        // Capture whatever comes last
        if (preg_match('~[^/]+$~', $result, $matches)) {
          return $matches[0];
        }
    }

    // If none of the above matched, the result is probably "index"
    return 'index';
}

Example in Ideone

I will probably revisit this to make it faster or simpler.
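
To see what each regular expression contributes, the steps can be traced on one of the question's inputs. This is only an illustrative walk-through under the assumption that the input carries a scheme and a query string; the variable names ($noScheme, $trimmed, $last) are chosen for the example and do not appear in the answer.

```php
<?php
$url = 'http://teste.com/teste/dois/?variavel=teste';

// Step 1: strip the scheme ("http://", "https://", ...)
$noScheme = preg_replace('~^[^:]+[:][/]{2,}~', '', $url);
echo $noScheme, "\n"; // teste.com/teste/dois/?variavel=teste

// Step 2: capture what follows the first "/" up to a "?" or "#"
preg_match('~/([^#?]{1,})~', $noScheme, $matches);
echo $matches[1], "\n"; // teste/dois/

// Step 3: drop the trailing "/" and keep the last segment
$trimmed = rtrim($matches[1], '/');
preg_match('~[^/]+$~', $trimmed, $last);
echo $last[0], "\n"; // dois
```

For a root URL such as teste.com/ the pattern in step 2 finds nothing after the slash, so the function falls through to its final return and yields index.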

1

PHP already has a native function for working with URLs, parse_url(), but since not all of the inputs follow the format defined in RFC 3986, the function misparses the non-standard ones. Nothing critical: when the scheme is missing, parse_url() treats what should be the host as part of the path. A check for the character . in the last path segment is therefore needed; if it is present, the segment is the host rather than a path, and the function returns index.

function returnLastWord ($url) {

    // Parse the URL:
    $url = parse_url($url);

    // parse_url() omits "path" for URLs such as http://teste.com, so default to an empty string:
    $path = isset($url["path"]) ? $url["path"] : "";

    // Split the path at each occurrence of /:
    $parts = explode('/', trim($path, '/'));

    // Take the last element:
    $last = end($parts);

    // If it is not empty and does not contain the character ., return it; otherwise return index:
    return $last !== "" && false === strpos($last, '.') ? $last : "index";

}

See working on Ideone.
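
To make the host-versus-path confusion concrete, the following sketch compares what parse_url() returns for the same address with and without a scheme. The helper name describeUrl is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper (not part of the answer): returns only the host and
// path that parse_url() found, using null when a component is missing.
function describeUrl($url) {
    $parts = parse_url($url);
    return array(
        'host' => isset($parts['host']) ? $parts['host'] : null,
        'path' => isset($parts['path']) ? $parts['path'] : null,
    );
}

// With a scheme, the host and the path are separated as expected.
var_dump(describeUrl('http://teste.com/teste'));

// Without a scheme, no host is reported and the whole string lands in "path".
var_dump(describeUrl('teste.com/teste'));
```

This is why the answer checks the last segment for a dot: for an input like teste.com the only "path" segment is really the host name.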

1

The parse_url() function can make this easier.

<?php

$arr = array(
    'http://localhost/',
    'https://localhost/',
    'http://localhost',
    'https://localhost',
    'http://sub.localhost/',
    'http://sub.localhost',
    'http://localhost/foo',
    'http://localhost/foo/bar',
    'http://localhost/?p=1&b=1',
    'localhost/foo',
    'localhost'
);

echo '<table border=1>
<tr><td>URL</td><td>Word</td></tr>';
foreach ($arr as $v) {
    $url_original = null;
    // Normalizing the given URL
    if (preg_match('#^https?://#i', $v) !== 1) {
        $url_original = $v;
        $v = 'http://'.$v;
    }

    $url = parse_url($v);
    echo PHP_EOL.'<tr><td>'.$v.(!empty($url_original)? '<br>('.$url_original.')': '').'</td><td>';
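    // Drop trailing slashes so that "/foo/bar/" yields "bar" rather than an empty string
    $path = isset($url['path']) ? rtrim($url['path'], '/') : '';
    if ($path !== '') {
        // Path found: print everything after the last "/"
        $p = strrpos($path, '/');
        if ($p !== false) {
            echo substr($path, $p+1);
        }
    } else {
        // Empty, no path
        echo 'index';
    }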
    echo '</td></tr>';
}
echo '</table>';

Normalization
To handle URLs without a scheme (http or https), the string is normalized by prefixing "http://" to it before it is passed to parse_url().
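
The normalization step can also be wrapped in a reusable function rather than a loop that prints an HTML table. The sketch below is one possible arrangement, not the answer's exact code: the name lastPathSegment is hypothetical, it returns the value instead of echoing it, and it trims trailing slashes so that inputs such as teste.com/teste/dois/ from the question yield dois.

```php
<?php
// Hypothetical helper: returns the last path segment of $url, or "index"
// when the URL points at the site root.
function lastPathSegment($url) {
    // Normalization: prefix a scheme when none is present so that
    // parse_url() recognizes the host instead of treating it as path.
    if (preg_match('#^https?://#i', $url) !== 1) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);

    // "path" is absent for URLs such as http://teste.com
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    if ($path === '') {
        return 'index';
    }

    // Keep everything after the last "/"
    return substr($path, strrpos($path, '/') + 1);
}

echo lastPathSegment('www.teste.com.br/'), "\n";
echo lastPathSegment('teste.com/teste/dois/?variavel=teste'), "\n";
echo lastPathSegment('teste.com/teste?var1=t&var2=t#ola'), "\n";
```

Because the host is always recognized after normalization, this variant does not need to inspect the segment for a dot, so a path such as /page.html would be returned as page.html.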

Multibyte characters
The code that calls substr() works on bytes, so it may fail when the URL contains multibyte characters. If you need multibyte support, see the mbstring functions, such as mb_strrpos() and mb_substr().
