Format city names and ignore words like "do", "dos", "das", "da", etc

Question


Asked 8 years, 6 months ago

Viewed 1,239 times

17

I'm working with a web service that returns city names with inconsistent casing, and I'd like to create a function that normalizes them. For example:

PORTO DE GALINHAS

I'd like it to come out like this:

Porto de Galinhas

I'd have to explode the string, put everything in lowercase, and then apply ucfirst to each word, making exceptions for predefined words such as de, do, da, das ...

I know the steps but don't know how to implement them.

Here is what I've tried so far:

$string = "PORTO DE GALINHAS";
$array = explode(' ', $string);

foreach ($array as $ar) {
    $dados = strtolower($ar);
    $dados .= "&nbsp;";

    // Prints porto de galinhas
    $cidade = trim($dados);
}

4 answers

20

A sketch that can be easily adapted:

function properCase( $string ) {    
   $ignorar = array( 'do', 'dos', 'da', 'das', 'de' );
   $array = explode(' ', strtolower( $string ) );
   $out = '';
   foreach ($array as $ar) {
      $out .= ( in_array ( $ar, $ignorar ) ? $ar : ucfirst( $ar ) ).' ';
   }
   return trim( $out );
}

echo properCase( 'PORTO DE GALINHAS' ).PHP_EOL;

See it working on IDEONE.

Important: when working with UTF-8, remember to use mb_convert_case in place of ucwords and strtolower, so that accented letters are cased correctly.

Here’s a question with answers that can help a lot of people :)

– Marcos Vinicius

2015/12/27 at 21:37
2

@Marcosvinicius this is good when storing things in a DB, you just have to be careful in some situations. For example, Rua XV de Novembro will be a problem. But it works for most cases. With a little thought, the logic is easy to adapt to handle Roman numerals (and put them all in uppercase).

– Bacco

2015/12/27 at 21:40
OK, no, what I meant was to take the strtolower out of the foreach: http://ideone.com/6zHiGe

– Guilherme Nascimento

2015/12/27 at 21:50
@Guilhermenascimento in the end, since the point was to optimize, I went all out :D

– Bacco

2015/12/27 at 21:55
Worth a +100 :D

– Guilherme Nascimento

2015/12/27 at 21:56
properCase is a name that only philosophers give their functions ;)

– Wallace Maxters

2015/12/28 at 16:51
@Wallacemaxters hope this isn’t bad :P

– Bacco

2015/12/28 at 17:09
No, it's not bad. I like to give things English names too, hahaha

– Wallace Maxters

2015/12/28 at 17:19
But what if the user types more than one space?

– Luhhh

2016/11/01 at 23:14
@Luhhh Well, then there will be more than one space in the output (but it works the same way :) ). The question does not ask to change the spaces. The code preserves them as they were typed.

– Bacco

2016/11/01 at 23:16
@Luhhh if you want to remove double spaces, there are several ways. You can use a replace, explode with array_filter, or preg_replace; it depends on what is convenient for your specific code.

– Bacco

2016/11/01 at 23:22
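
The two edge cases raised in the comments above (Roman numerals such as "XV" and repeated spaces) could be handled along these lines. This is only a sketch: the function name properCaseExtended and the Roman-numeral rule are assumptions, not part of the original answer. To limit false positives, the pattern only recognizes small numerals (I to XXXIX), and the code is ASCII-only like the original.

```php
<?php
// Hypothetical variant of properCase(); name and numeral rule are assumptions.
function properCaseExtended($string) {
    $ignorar = array('do', 'dos', 'da', 'das', 'de');
    // Small Roman numerals only (i to xxxix); (?=.) rejects the empty string.
    $romano = '/^(?=.)x{0,3}(ix|iv|v?i{0,3})$/';

    // Splitting on \s+ after trim() collapses repeated whitespace.
    $palavras = preg_split('/\s+/', trim(strtolower($string)));

    foreach ($palavras as $i => $palavra) {
        if (in_array($palavra, $ignorar)) {
            continue; // keep connecting words in lowercase
        }
        $palavras[$i] = preg_match($romano, $palavra)
            ? strtoupper($palavra)   // Roman numeral: all uppercase
            : ucfirst($palavra);     // ordinary word: capitalize first letter
    }
    return implode(' ', $palavras);
}

echo properCaseExtended('RUA  XV DE   NOVEMBRO'), PHP_EOL; // Rua XV de Novembro
```

Short words that happen to look like numerals (for example "vi") would also be uppercased, so the pattern may need adjusting for real data.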


by Maniero • **444,682** points · Answer 1 · 2015-12-27T21:52:38+00:00

I would do it like this:

str_replace(array("De ", "Do ", "Dos ", "Da ", "Das "),
     array("de ", "do ", "dos ", "da ", "das "), ucwords(strtolower("PORTO DE GALINHAS")));

See it working on ideone and on repl.it. I also put it on GitHub for future reference.

Documentation of ucwords().

Documentation of str_replace().

The way these cases are handled can be improved. This is a simplistic approach, but it does what the question asks.

Given the comments scattered around, here is a version of the function that lets you customize the exceptions:

function capitalize($string, $search = array("De ", "Do ", "Dos ", "Da ", "Das "), $replace = array("de ", "do ", "dos ", "da ", "das ")) {
    return str_replace($search, $replace, ucwords(strtolower($string)));
}

See it working on ideone and on repl.it. I also put it on GitHub for future reference.
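
As a usage sketch for the customizable version, the call below passes its own exception lists. The function is repeated only so the snippet runs on its own; the sample strings and the extra "E " exception are illustrative assumptions.

```php
<?php
// capitalize() exactly as in the answer above, repeated so this runs standalone.
function capitalize($string, $search = array("De ", "Do ", "Dos ", "Da ", "Das "), $replace = array("de ", "do ", "dos ", "da ", "das ")) {
    return str_replace($search, $replace, ucwords(strtolower($string)));
}

// Default exceptions.
echo capitalize('PORTO DE GALINHAS'), PHP_EOL; // Porto de Galinhas

// Custom exceptions: the lists passed here replace the defaults entirely,
// so "De " has to be listed again alongside the new "E ".
echo capitalize('ENTRE RIOS DE MINAS E ARREDORES', array("De ", "E "), array("de ", "e ")), PHP_EOL;
// Entre Rios de Minas e Arredores
```

Because each search term ends with a space, an exception word at the very end of the string is not lowercased, while one at the very start is.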

by Guilherme Nascimento • **98,651** points · Answer 2 · 2015-12-27T21:26:49+00:00

As an alternative, I would do it like this:

<?php
function formatarString($str, $glue = ' ')
{
    // Lowercase the string and split it on spaces, tabs and anything else matched by \s (drops redundant whitespace)
    $palavras = preg_split('#\s+#', strtolower($str));

    // List of words to leave in lowercase
    $ignoreList = array('de', 'as', 'do', 'dos', 'da', 'das');

    foreach ($palavras as &$palavra) {
        if (in_array($palavra, $ignoreList) === false) {
            $palavra = ucfirst($palavra);
        }
    }
    unset($palavra); // break the reference left over from the foreach

    return implode($glue, $palavras);
}

For HTML you can use it like this:

echo formatarString('PORTO DE GALINHAS', '&nbsp;'); // Output: Porto&nbsp;de&nbsp;Galinhas

The second parameter is optional: it is what goes between the words, and it defaults to a space.

echo formatarString('PORTO DE GALINHAS'); // Output: Porto de Galinhas

Note: to handle accented characters, use mb_convert_case($dados, MB_CASE_TITLE). This requires the mbstring extension to be enabled in your php.ini and/or installed from a repository (for example via apt on Debian). The body of the function would then look like this:
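// Lowercase first (multibyte-safe) so the ignore list matches uppercase input
$palavras = preg_split('#\s+#', mb_strtolower($str));

// List of words to leave in lowercase
$ignoreList = array('de', 'as', 'do', 'dos', 'da', 'das');

foreach ($palavras as &$palavra) {
    if (in_array($palavra, $ignoreList) === false) {
        $palavra = mb_convert_case($palavra, MB_CASE_TITLE);
    }
}
unset($palavra); // break the reference left over from the foreach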
Also note that this code does not pass the charset of the strings explicitly, so the mb_* functions use the value returned by mb_internal_encoding(). If you need a different encoding, set it before calling the function, like this:
mb_internal_encoding('UTF-8'); // If you are going to use UTF-8

echo formatarString('DECORAÇÃO DE AMBIENTES');
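
Putting the multibyte notes above together, a complete function might look like the sketch below. The name formatarStringMb is hypothetical, and the sketch assumes valid UTF-8 input, the mbstring extension, and a source file saved as UTF-8; the encoding is passed explicitly instead of relying on mb_internal_encoding().

```php
<?php
// Hypothetical name; a multibyte-aware take on formatarString(), assuming UTF-8 input.
function formatarStringMb($str, $glue = ' ')
{
    // Lowercase first so the ignore list matches regardless of the input case.
    // The "u" flag makes the pattern treat the subject as UTF-8.
    $palavras = preg_split('#\s+#u', mb_strtolower(trim($str), 'UTF-8'));

    // Words to leave in lowercase (same list as the answer above).
    $ignoreList = array('de', 'as', 'do', 'dos', 'da', 'das');

    foreach ($palavras as $i => $palavra) {
        if (!in_array($palavra, $ignoreList, true)) {
            $palavras[$i] = mb_convert_case($palavra, MB_CASE_TITLE, 'UTF-8');
        }
    }

    return implode($glue, $palavras);
}

echo formatarStringMb('DECORAÇÃO DE AMBIENTES'), PHP_EOL; // Decoração de Ambientes
```

Indexing the array by key avoids the by-reference foreach, so no unset() is needed afterwards.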

Examples of use (online test on repl.it: https://repl.it/@inphinit/de-do-da-com-ucwords):

$string = "PORTO DE GALINHAS";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

echo "----------------\n";

$string = "DEMAIS AFAZERES";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

echo "----------------\n";

$string = "OS DEMAIS AFAZERES";

echo "Original: $string\n";

$string = formatarString($string, ' ');

echo "Ajustado: $string\n";

Output:

Original: PORTO DE GALINHAS
Ajustado: Porto de Galinhas
----------------
Original: DEMAIS AFAZERES
Ajustado: Demais Afazeres
----------------
Original: OS DEMAIS AFAZERES
Ajustado: Os Demais Afazeres

by Emmanuel de Carvalho Garcia • 61 points · Answer 3 · 2020-04-24T11:03:40+00:00

I like to use regular expressions for this because of how simple they make it. The preg_replace() function gives us what we need with a single regular expression.

function nomeProprio($input) {
    return preg_replace('/\sd(\ws?)\s/i', ' d$1 ' , mb_convert_case($input, MB_CASE_TITLE, "UTF-8"));
}

$input = 'PORTO DE GALINHAS'; // text coming from the web service

echo nomeProprio($input);

The first thing that happens to the string PORTO DE GALINHAS is that it goes through mb_convert_case(), whose MB_CASE_TITLE and "UTF-8" arguments convert the string to title case and set the charset to UTF-8, respectively. This results in:

"Porto De Galinhas"

The regular expression is then applied to this string.

Dissecting the regular expression: /\sd(\ws?)\s/i

/ // opening delimiter of the regular expression
\s // requires a whitespace character
d // requires the letter d
( // opens a capture group, used later as $1
\w // requires a word character (\w: a letter, digit or underscore); it could also be [aeiou]
s? // optionally followed by the letter s
) // closes the capture group
\s // once again requires a whitespace character
/ // closing delimiter of the regular expression
i // makes the expression case-insensitive, so it matches d or D

It will find " De ", extract only the letter "e" and place it in capture group $1, so that

" De "

is replaced by

" de "

giving the final result:

"Porto de Galinhas"

Since \w can be any letter, it would also match

" Da ", " Di ", " Do ", " Du ", " D(any letter) "...

In the case above, only the vowel enters the capture group. It would also match:

" Das ", " Dis ", " Dos ", " Dus ", " D(any letter)s "...

In this other case, the vowel together with the letter s enters the capture group and is put in the place of $1, resulting in:

" das ", " dis ", " dos ", " dus ", " d(any letter)s "...

Note that the substitution only affects the " De " part, because the rest is already correct after the call to mb_convert_case.
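
Because \w accepts any word character, the pattern above also lowercases abbreviations such as " Dr ". A stricter variant, shown here only as a sketch, limits the match to de, do, dos, da and das, and uses lookarounds so the surrounding whitespace is not consumed. The function name nomeProprioRestrito is hypothetical.

```php
<?php
// Hypothetical stricter variant of nomeProprio(); assumes UTF-8 input and mbstring.
function nomeProprioRestrito($input) {
    $titulo = mb_convert_case($input, MB_CASE_TITLE, 'UTF-8');

    // (?<=\s) and (?=\s) require whitespace on both sides without consuming it,
    // so two exception words in a row are both matched.
    // D(e|[ao]s?) matches only De, Do, Dos, Da, Das.
    return preg_replace('/(?<=\s)D(e|[ao]s?)(?=\s)/u', 'd$1', $titulo);
}

echo nomeProprioRestrito('SANTANA DO LIVRAMENTO'), PHP_EOL;  // Santana do Livramento
echo nomeProprioRestrito('RUA DR PEDRO DE TOLEDO'), PHP_EOL; // Rua Dr Pedro de Toledo
```

As with the original pattern, an exception word at the very start or end of the string is left capitalized, since it has no whitespace on both sides.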