Character replacement function does not work when data comes from MySQL

I use a function to replace accented and special characters, but when I apply the same function to data from MySQL, the characters are not replaced.

Assuming the city is Foz do Iguaçu, the function should return Foz do Iguacu; that is, the ç would be replaced by c.

In the MySQL database structure, the city column is:

type = varchar (80)
Collation = latin1_general_ci

$cidade = removeAcentos($row['cli_cidade']);

function removeAcentos ($string){
    // Removing accents
    $tr = strtr($string,
        array (
          'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
          'Æ' => 'A', 'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
          'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ð' => 'D', 'Ñ' => 'N',
          'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O',
          'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ý' => 'Y', 'Ŕ' => 'R',
          'Þ' => 's', 'ß' => 'B', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
          'ä' => 'a', 'å' => 'a', 'æ' => 'a', 'ç' => 'c', 'è' => 'e', 'é' => 'e',
          'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
          'ð' => 'o', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o',
          'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ý' => 'y',
          'þ' => 'b', 'ÿ' => 'y', 'ŕ' => 'r', 'º' => '', 'ª' => ''
        )
    );

    return $tr;
}
  • Have you tried $cidade=removeAcentos(utf8_encode($row['cli_cidade']))?

  • It didn’t work. I also tried forcing a cast with (string) $row['cli_cidade'], but that didn’t work either. @Kaduamaral

1 answer

fmoreira@saucer UnmergedCode $ echo '<?= strlen("Á") ?>' | php
2

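The problem is that accented characters in UTF-8 occupy two (or more!) bytes; strtr operates on bytes, not on characters. With the array form used here, each key is matched as a raw byte sequence, so if the script is saved as UTF-8, a key such as 'ç' will not match the single-byte latin1 version of the same letter coming from the database.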
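To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8, that the MySQL connection returns latin1 (ISO-8859-1) bytes, and that the mbstring extension is available; the variable names are only illustrative, and this may not be the only cause of the behaviour described in the question.

```php
<?php
// Assumption: this file is UTF-8, so the literal "Á" is two bytes long.
$byteLength = strlen("Á");             // counts bytes
$charLength = mb_strlen("Á", 'UTF-8'); // counts characters

// Simulate the value as it might arrive from a latin1 connection:
// 'ç' is the single byte 0xE7 in ISO-8859-1.
$fromDb = "Foz do Igua\xE7u";

// The key below is the UTF-8 sequence 0xC3 0xA7, which never occurs in $fromDb.
$map = ['ç' => 'c'];
$unchanged = strtr($fromDb, $map);

// Converting the database value to UTF-8 first lets the keys match.
$utf8  = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');
$fixed = strtr($utf8, $map);

echo $byteLength, ' ', $charLength, "\n";
echo $fixed, "\n";
```

If the conversion is not wanted in application code, an alternative worth checking is the connection character set itself, so that MySQL already returns UTF-8.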

You can use str_replace (although you will have to split your array in two), or, if you manage to install PHP's intl extension (you will need to edit php.ini and enable php_intl.dll; I tried here on my Mac but could not), you can use normalizer_normalize.
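The str_replace option mentioned above could look roughly like this. It is a sketch, not a drop-in replacement: the function name removeAccentsWithStrReplace and the shortened map are hypothetical, and it assumes both the script and the input string are UTF-8.

```php
<?php
// Hypothetical helper: same idea as removeAcentos, but using str_replace.
function removeAccentsWithStrReplace(string $text): string
{
    // Shortened map for illustration; the full table from the question fits here.
    $map = [
        'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ç' => 'c', 'é' => 'e', 'ê' => 'e',
        'ª' => '',  'º' => '',
    ];

    // "Splitting the array in two": keys are the search strings,
    // values are the replacements, in the same order.
    return str_replace(array_keys($map), array_values($map), $text);
}

echo removeAccentsWithStrReplace('Foz do Iguaçu'), "\n";
```

Like strtr, str_replace compares bytes, so it only helps when the input and the search strings share the same encoding.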

A call to normalizer_normalize('bênção', Normalizer::FORM_D) converts a string like bênção into something like be^nc¸a~o, decomposing each accented letter into the original base letter followed by its respective diacritical marks. You can then use a regular expression such as [^a-zA-Z] to strip everything that is NOT a letter.

(You’ll still need str_replace for "letters" such as 'ª', which this normalization does not decompose.)
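The normalization approach could be sketched as follows, assuming the intl extension is loaded and the input is UTF-8. The function name removeAccentsNormalized is hypothetical. Note one deliberate swap: instead of [^a-zA-Z], this sketch removes only combining marks with \p{Mn}, so that spaces and digits in values like "Foz do Iguaçu" survive.

```php
<?php
// Hypothetical helper; requires the intl extension for normalizer_normalize().
function removeAccentsNormalized(string $text): string
{
    // FORM_D decomposes each accented letter into base letter + combining marks.
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalization failed (for example, invalid UTF-8): return the input untouched.
        return $text;
    }

    // \p{Mn} matches non-spacing (combining) marks; the /u flag enables UTF-8 mode.
    // preg_replace returns null on error, so fall back to the original text.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo removeAccentsNormalized('bênção'), "\n";
echo removeAccentsNormalized('Foz do Iguaçu'), "\n";
```

Characters such as 'ª' and 'º' pass through this function unchanged, so they would still need a separate str_replace step.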


I noticed that you are replacing 'þ' with 'b', but despite the visual similarity, a better phonetic transcription is 'th'. If you expect to process unusual characters like these, I think it is more robust to use some variant of unidecode, a library that also converts, e.g., "北 亰" to "Bei Jing".
