Format city names and ignore words like "do", "dos", "das", "da", etc

Viewed 1,239 times

17

I’m working with a web service whose city names come in all kinds of inconsistent formats, and I’d like to create a function to normalize them. For example, this:

PORTO DE GALINHAS

I’d like it to look like this:

Porto de Galinhas

I would need to explode the string and process all the words at once: convert everything to lowercase and then apply ucfirst to each word, except for predefined words such as "de", "do", "dos", "da", "das" ...

I know the process but don’t know how to implement it.

I’ve tried something:

$string = "PORTO DE GALINHAS";
$array = explode(' ', $string);
$dados = '';

foreach ($array as $ar) {
    // Lowercase each word and append it, followed by a space
    $dados .= strtolower($ar) . " ";
}

// Prints "porto de galinhas"
$cidade = trim($dados);
echo $cidade;

4 answers

20


A sketch that can be easily adapted:

function properCase( $string ) {    
   $ignorar = array( 'do', 'dos', 'da', 'das', 'de' );
   $array = explode(' ', strtolower( $string ) );
   $out = '';
   foreach ($array as $ar) {
      $out .= ( in_array ( $ar, $ignorar ) ? $ar : ucfirst( $ar ) ).' ';
   }
   return trim( $out );
}

echo properCase( 'PORTO DE GALINHAS' ).PHP_EOL;

See it working on IDEONE.

Important: if you are working with UTF-8, remember to use mb_convert_case instead of the ucfirst and strtolower functions, so that accented letters don’t come out wrong.
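A minimal sketch of that UTF-8 variant, assuming the mbstring extension is enabled and the input is UTF-8. The name properCaseMb is hypothetical, and mb_strtolower is used for the lowercase step (mb_convert_case with MB_CASE_LOWER would work too):

```php
<?php
// Hypothetical multibyte-safe version of properCase (assumes UTF-8 input and mbstring)
function properCaseMb($string) {
    $ignorar = array('do', 'dos', 'da', 'das', 'de');
    // mb_strtolower also lowercases accented capitals (Ã -> ã, É -> é)
    $palavras = explode(' ', mb_strtolower($string, 'UTF-8'));
    foreach ($palavras as &$palavra) {
        if (!in_array($palavra, $ignorar, true)) {
            $palavra = mb_convert_case($palavra, MB_CASE_TITLE, 'UTF-8');
        }
    }
    unset($palavra); // break the reference left over from foreach
    return implode(' ', $palavras);
}

echo properCaseMb('SÃO JOSÉ DOS CAMPOS'), PHP_EOL; // São José dos Campos
```

Like the original, explode plus implode keeps the spacing exactly as it was typed.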

  • Here’s a question with answers that can help a lot of people :)

  • @Marcosvinicius this is good for storing data in a DB; you just have to be careful in some situations. For example, "Rua XV de Novembro" will be a problem (XV becomes "Xv"). But it works for most cases. With a little thought, the logic is easy to adapt to handle Roman numerals (and put them in uppercase).

  • Okay, what I meant was to take the strtolower out of the foreach: http://ideone.com/6zHiGe

  • @Guilhermenascimento in the end, since the point was to optimize, I went all the way :D

  • Worth a +100 :D

  • properCase is a name that only philosophers give their functions ;)

  • @Wallacemaxters hope this isn’t bad :P

  • No, it’s not bad. I like to name things in English too, haha

  • But what if the user types more than one space?

  • @Luhhh Then the output will have more than one space too (but it works the same way :). The question does not ask to change the spaces; the code preserves them as typed.

  • @Luhhh if you want to remove double spaces, there are several ways: you can use replace, explode with array_filter, or preg_replace. It depends on what is convenient for your specific code.
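A sketch combining two suggestions from the comments above: dropping repeated spaces with explode plus array_filter, and uppercasing Roman numerals such as the XV in "Rua XV de Novembro". The helper names isRoman and properCaseRoman are hypothetical, and the Roman-numeral test is only a heuristic: any word that happens to spell a valid numeral (for example "vi") will be uppercased as well.

```php
<?php
// Hypothetical helper: true when $word (lowercase) is a well-formed Roman numeral (1 to 3999).
// The lookahead rejects the empty string, which the rest of the pattern would accept.
function isRoman(string $word): bool {
    return preg_match('/^(?=[mdclxvi])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$/', $word) === 1;
}

// Hypothetical variant of properCase; assumes plain ASCII input (see the UTF-8 note above)
function properCaseRoman(string $string): string {
    $ignorar = ['do', 'dos', 'da', 'das', 'de'];
    // explode + array_filter drops the empty pieces produced by repeated spaces
    $palavras = array_filter(explode(' ', strtolower($string)), 'strlen');
    $out = [];
    foreach ($palavras as $palavra) {
        if (in_array($palavra, $ignorar, true)) {
            $out[] = $palavra;               // connector: keep lowercase
        } elseif (isRoman($palavra)) {
            $out[] = strtoupper($palavra);   // Roman numeral: all uppercase
        } else {
            $out[] = ucfirst($palavra);      // anything else: capitalize
        }
    }
    return implode(' ', $out);
}

echo properCaseRoman('RUA  XV DE NOVEMBRO'), PHP_EOL; // Rua XV de Novembro
```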

12

I would do it like this:

str_replace(array("De ", "Do ", "Dos ", "Da ", "Das "),
     array("de ", "do ", "dos ", "da ", "das "), ucwords(strtolower("PORTO DE GALINHAS")));

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

Documentation of ucwords().

Documentation of str_replace().

You can refine how these cases are handled. This is a simplistic approach, but it’s what the question asks for.

Given the various comments, here is a version of the function that lets you customize the exceptions:

function capitalize($string, $search = array("De ", "Do ", "Dos ", "Da ", "Das "), $replace = array("de ", "do ", "dos ", "da ", "das ")) {
    return str_replace($search, $replace, ucwords(strtolower($string)));
}

See it working on ideone and on repl.it. I also put it on GitHub for future reference.
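A usage sketch of the customizable version. The extra "E " / "e " pair (keeping the conjunction "e" lowercase) is an illustrative addition, not part of the original answer, and the function is repeated so the snippet runs on its own:

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained
function capitalize($string, $search = array("De ", "Do ", "Dos ", "Da ", "Das "), $replace = array("de ", "do ", "dos ", "da ", "das ")) {
    return str_replace($search, $replace, ucwords(strtolower($string)));
}

// Illustrative custom lists: the default exceptions plus the conjunction "e"
$search  = array("De ", "Do ", "Dos ", "Da ", "Das ", "E ");
$replace = array("de ", "do ", "dos ", "da ", "das ", "e ");

echo capitalize('PORTO DE GALINHAS'), PHP_EOL;                     // Porto de Galinhas
echo capitalize('SANTOS E SAO VICENTE', $search, $replace), PHP_EOL; // Santos e Sao Vicente
```

Because every search string includes the trailing space, a connector at the very end of the string is left untouched; that is a limitation of the plain string-replacement approach.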

  • Ah cool, good example +1

  • This function may have a problem... what if the $search array contains the string "DO"? The function as shown won’t handle it... That’s why I think you should do the opposite: check whether the word appears in any form (uppercase, lowercase, mixed) and apply strtolower to it. At least, that’s what I think.

  • And why would I have a DO in $search? Just don’t put it there. The function is flexible: if you use it one way, it gives one result; if you use it another way, it gives another. If you don’t want that other result, don’t use it that way. The only way to avoid this is to take away the flexibility, and I think that only makes the function worse.

  • Wouldn’t mb_convert_case be better because of UTF-8?

  • It would be, but not everyone needs UTF-8, so it would be overkill.
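A minimal sketch of the opposite approach suggested in the comments: title-case everything, then find the connector words regardless of the case they were listed in and lowercase them. It uses preg_replace_callback; the function name capitalizeCi and its default word list are hypothetical.

```php
<?php
// Hypothetical function: connectors are matched case-insensitively (DO, Do, do...)
function capitalizeCi(string $string, array $words = ['de', 'do', 'dos', 'da', 'das']): string
{
    $titled = ucwords(strtolower($string));
    // Build "de|do|dos|..." with any regex metacharacters escaped
    $alternatives = implode('|', array_map(fn($w) => preg_quote($w, '/'), $words));

    // (?<=\s) requires whitespace before the word, so the first word is never lowered;
    // (?=\s|$) makes sure only whole words match ("Dourados" is left alone)
    return preg_replace_callback(
        '/(?<=\s)(' . $alternatives . ')(?=\s|$)/i',
        fn(array $m) => strtolower($m[1]),
        $titled
    );
}

echo capitalizeCi('SANTA RITA DO PASSA QUATRO'), PHP_EOL; // Santa Rita do Passa Quatro
echo capitalizeCi('PORTO DE GALINHAS', ['DE']), PHP_EOL;  // Porto de Galinhas
```

The trade-off is the one discussed above: the list no longer cares about case, at the cost of a slightly heavier regex-based implementation.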

10

As an alternative, I would do it like this:

<?php
function formatarString($str, $glue = ' ')
{
    // Lowercase the string and split it on spaces, tabs and the other characters matched by the \s metacharacter (this also removes extra whitespace)
    $palavras = preg_split('#\s+#', strtolower($str));

    // List of ignored words
    $ignoreList = array('de', 'as', 'do', 'dos', 'da', 'das');

    foreach ($palavras as &$palavra) {
        if (in_array($palavra, $ignoreList) === false) {
            $palavra = ucfirst($palavra);
        }
    }

    return implode($glue, $palavras);
}

For HTML you can use it like this:

echo formatarString('PORTO DE GALINHAS', '&nbsp;'); // Output: Porto&nbsp;de&nbsp;Galinhas

But this is optional: the second parameter is what goes between the words, and it defaults to a space.

echo formatarString('PORTO DE GALINHAS'); // Output: Porto de Galinhas

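Note: to support accented characters, use mb_convert_case($palavra, MB_CASE_TITLE) together with mb_strtolower. These require the mbstring extension, which must be enabled in your php.ini and/or installed from your distribution’s repository (for example with apt on Debian), so the function would look like this:

function formatarString($str, $glue = ' ')
{
    // mb_strtolower also lowercases accented capitals, so "DE", "DOS", etc. match the ignore list
    $palavras = preg_split('#\s+#u', mb_strtolower($str));

    // List of ignored words
    $ignoreList = array('de', 'as', 'do', 'dos', 'da', 'das');

    foreach ($palavras as &$palavra) {
        if (in_array($palavra, $ignoreList) === false) {
            $palavra = mb_convert_case($palavra, MB_CASE_TITLE);
        }
    }

    return implode($glue, $palavras);
}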

Also note that the charset of the strings is not passed explicitly, so these functions use the value returned by mb_internal_encoding() by default. If you need a different encoding, set it before calling the function, like this:

mb_internal_encoding('UTF-8'); // if you are going to use UTF-8

echo formatarString('DECORAÇÃO DE AMBIENTES');

Usage examples (online test on repl.it: https://repl.it/@inphinit/de-do-da-com-ucwords):

$string = "PORTO DE GALINHAS";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

echo "----------------\n";

$string = "DEMAIS AFAZERES";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

echo "----------------\n";

$string = "OS DEMAIS AFAZERES";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

Output:

Original: PORTO DE GALINHAS
Ajustado: Porto de Galinhas
----------------
Original: DEMAIS AFAZERES
Ajustado: Demais Afazeres
----------------
Original: OS DEMAIS AFAZERES
Ajustado: Os Demais Afazeres

  • That’s it, @Guilherme Nascimento ...

  • As I said, I needed a function. The accepted answer will probably be Guilherme’s, but Bacco’s is also great.

  • Yes, I know ... I can make that adaptation myself. Thank you.

  • @Marcosvinicius I just edited the code for better performance when it is used in several processes :)

-3

I like to use regular expressions for this because of the simplicity they provide. The preg_replace() function gives us what we need with a regular expression.

function nomeProprio($input) {
    return preg_replace('/\sd(\ws?)\s/i', ' d$1 ' , mb_convert_case($input, MB_CASE_TITLE, "UTF-8"));
}

$input = 'PORTO DE GALINHAS'; // text coming from the web service

echo nomeProprio($input);

First, the string PORTO DE GALINHAS goes through mb_convert_case() with the MB_CASE_TITLE mode and the "UTF-8" argument: MB_CASE_TITLE converts the string to title case, and "UTF-8" tells the function which character encoding the input is in. This results in:

"Porto De Galinhas"

Then the regular expression is applied to this string.

Dissecting the regular expression: /\sd(\ws?)\s/i

/ // start delimiter of the regular expression
\s // requires a whitespace character
d // requires the letter d
( // starts a capture group, used afterwards as $1
\w // requires a word character (\w); [aeiou] could also be used
s? // optionally followed by the letter s
) // closes the capture group
\s // once again requires another whitespace character
/ // end delimiter of the regular expression
i // makes the expression case-insensitive, so it accepts d or D

When it finds " De ", it extracts only the letter "e" into capture group $1, so that

" De "

is replaced by

" de "

giving the final result:

"Porto de Galinhas"

Since \w matches any letter (as well as digits and the underscore), it would also match

" Da ", " Di ", " Do ", " Du ", " D(any letter) "...

In these cases, only the letter after the d goes into the capture group. It would also match:

" Das ", " Dis ", " Dos ", " Dus ", " D(any letter)s "...

In these other cases, the letter followed by s goes into the capture group and is placed in $1, resulting in:

" das ", " dis ", " dos ", " dus ", " d(any letter)s "...

Note that only the " De " part is replaced, because the rest is already correct thanks to mb_convert_case.
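A possible tightening of this regex, sketched under the assumption that only da, das, de, do and dos should be lowercased (the name nomeProprioEstrito is hypothetical). The explicit list avoids touching names such as "Di" or "Du", and the lookarounds check for the surrounding spaces without consuming them, so consecutive connectors are all handled (with /\sd(\ws?)\s/i, the trailing space consumed by one match is no longer available as the leading space of the next).

```php
<?php
// Hypothetical stricter variant of nomeProprio: only da, das, de, do and dos are lowercased
function nomeProprioEstrito(string $input): string
{
    $titulo = mb_convert_case($input, MB_CASE_TITLE, 'UTF-8');
    // After MB_CASE_TITLE the connectors look like "Da", "Das", "De", "Do", "Dos".
    // (?<=\s) and (?=\s|$) check for whitespace around the word without consuming it.
    return preg_replace('/(?<=\s)D(as?|e|os?)(?=\s|$)/u', 'd$1', $titulo);
}

echo nomeProprioEstrito('PORTO DE GALINHAS'), PHP_EOL;   // Porto de Galinhas
echo nomeProprioEstrito('SÃO JOSÉ DOS CAMPOS'), PHP_EOL; // São José dos Campos
echo nomeProprioEstrito('RUA DI CAVALCANTI'), PHP_EOL;   // Rua Di Cavalcanti
```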
