Why does a PHP string sometimes show � instead of some accented letters?

I previously asked a question about an encoding problem with PHP's json_encode.

To keep that question from becoming too broad, I decided to ask this one separately.

Why does PHP sometimes return the � character in the middle of a string that contains accented characters?

Example:

Meu nome � Wallace ("My name is Wallace", where the � stands for "é")

The strange thing is that the same letter is sometimes replaced by � and sometimes displayed correctly in other parts of the printed string.

Example:

Meu nome � Wallace e estou com fé terei minhas dúvidas resolvidas
("My name is Wallace and, having faith, I will get my questions answered")

Note that é appears correctly inside the word fé ("faith"), but on its own (the word é, "is") it does not.

Why does this happen?

What generates the � character?

  • Where does it happen? Do you have a file, a screenshot, etc.?

  • @Kaduamaral, look at this question and you will understand: http://answall.com/questions/91549/json-encode-returningmalformed-utf-8-characters-possibly-incorrectly-encoded

  • Read the answer to this question and the related article, and you will understand.

  • This business of needing 30 functions just to get text back in the correct encoding could only be PHP's doing.

  • What is the source of this external content?

  • @rray, that's the problem. I have to fetch the content of whatever site is entered there. Sometimes it is UTF-8, sometimes it is not.

  • Then I suppose you have to detect the encoding and run the checks yourself: http://us3.php.net/manualen/function.iconv-get-encoding.php

  • As explained in the answer I linked, the problem is probably an encoding mismatch: the page you are fetching is likely ISO-8859-1 while PHP is working in UTF-8, which produces the incompatible characters.

  • @rray, the problem is that some of these pages do not have UTF-8 configured correctly, and mb_detect_encoding ALWAYS returns UTF-8 anyway. Bah!

  • What code do you use to fetch this external page? Could you add it to your question?

  • It probably has windows-1252 (ISO-8859-1) characters mixed with UTF-8. I recommend this answer (which you may already know): http://answall.com/a/43205/3635. It will only be a problem if the response comes from a web service (WS); in that case you will have to handle it with iconv, for example.

3 answers

This happens because the character set your web page is configured with differs from the character set of the data it displays. Today ISO-8859-1 and UTF-8 are the most widely used character sets, and in PHP the recommendation is to always use UTF-8 as the encoding of your scripts.
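A minimal sketch of that mismatch, assuming the mbstring extension is available (the variable names are only illustrative): the letter "é" is one byte (0xE9) in ISO-8859-1 but two bytes (0xC3 0xA9) in UTF-8. A lone 0xE9 is not valid UTF-8, so a page declared as UTF-8 cannot decode it and the browser shows the replacement character �; json_encode() rejects it for the same reason.

```php
<?php
// Requires the mbstring extension.
$utf8   = "\xC3\xA9"; // "é" encoded as UTF-8 (two bytes)
$latin1 = "\xE9";     // "é" encoded as ISO-8859-1 (one byte)

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9

// The single ISO-8859-1 byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// json_encode() refuses invalid UTF-8 as well.
var_dump(json_encode($latin1));  // bool(false)
echo json_last_error_msg(), "\n"; // Malformed UTF-8 characters, possibly incorrectly encoded

// Converting the ISO-8859-1 byte to UTF-8 produces the correct two bytes.
$fixed = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($fixed === $utf8); // bool(true)
```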

You can change the character set with the following PHP command:

<?php
// Always place this call at the very start of your script, right after the opening tag:
// header() only works before any output has been sent.
header('Content-Type: text/html; charset=utf-8');

Or in the HTML page itself, using a meta tag:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />

In HTML5 you can use this shorter form:

<meta charset="utf-8">

On the HTML page it would look like this:

<!doctype html>
<html>
    <head>
        <title>Your page title</title>
        <meta http-equiv="content-type" content="text/html;charset=utf-8" />
        <!-- Or, in HTML5: -->
        <meta charset="utf-8">
    </head>
    <body>
        Content
    </body>
</html>

To avoid this, use the same character set everywhere, preferably UTF-8.

By everywhere I mean:

  • The encoding of your .php, .js, .css and .html files, and any other file that contains text.
  • The charset declared in the HTML META tags.
  • The encoding of the database.

Occasionally you may have to work with more than one encoding because the data comes from different sources, such as databases or files like Excel spreadsheets (which only work well with ISO-8859-1).

For these cases, use conversion functions like these:

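// Convert a string that is UTF-8, ASCII or ISO-8859-1 to UTF-8.
function toUTF8($string)
{
    if (function_exists('mb_detect_encoding')) {
        // strict = true, so invalid UTF-8 is not reported as UTF-8
        $current_encoding = mb_detect_encoding($string, 'UTF-8, ASCII, ISO-8859-1', true);
        if ($current_encoding !== false) {
            $string = mb_convert_encoding($string, 'UTF-8', $current_encoding);
        }
    } else {
        // Without mbstring: keep valid UTF-8 as is, otherwise treat the bytes as ISO-8859-1.
        // (utf8_encode()/utf8_decode() are deprecated as of PHP 8.2.)
        $string = preg_match('//u', $string) ? $string : utf8_encode($string);
    }
    return $string;
}

// Convert a string that is UTF-8, ASCII or ISO-8859-1 to ISO-8859-1.
function toLatin1($string)
{
    if (function_exists('mb_detect_encoding')) {
        $current_encoding = mb_detect_encoding($string, 'UTF-8, ASCII, ISO-8859-1', true);
        if ($current_encoding !== false) {
            $string = mb_convert_encoding($string, 'ISO-8859-1', $current_encoding);
        }
    } else {
        // Decode only when the string round-trips as UTF-8 made of Latin-1 characters.
        $string = utf8_encode(utf8_decode($string)) == $string ? utf8_decode($string) : $string;
    }
    return $string;
}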

In some situations even these functions are not enough. That is the case with strings built by concatenating pieces in more than one encoding (believe me, this is not that unusual); for those, the conversion must be done character by character.
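A character-by-character conversion could be sketched like this, assuming the mixed string contains only UTF-8 and ISO-8859-1 pieces and that mbstring is available. The function name mixedToUtf8 is hypothetical. At each position it looks for one complete, valid UTF-8 character; if there is none, it treats the single byte as ISO-8859-1 and converts it.

```php
<?php
// Hypothetical helper: repair a string that mixes UTF-8 and ISO-8859-1 bytes.
// Requires the mbstring extension.
function mixedToUtf8(string $input): string
{
    $out = '';
    $len = strlen($input);
    for ($i = 0; $i < $len; ) {
        $byte = ord($input[$i]);
        // Length of the UTF-8 sequence this lead byte announces (0 = invalid lead byte).
        if ($byte < 0x80) {
            $seqLen = 1;
        } elseif ($byte >= 0xC2 && $byte <= 0xDF) {
            $seqLen = 2;
        } elseif ($byte >= 0xE0 && $byte <= 0xEF) {
            $seqLen = 3;
        } elseif ($byte >= 0xF0 && $byte <= 0xF4) {
            $seqLen = 4;
        } else {
            $seqLen = 0;
        }

        $chunk = $seqLen > 0 ? substr($input, $i, $seqLen) : '';
        if ($seqLen > 0 && strlen($chunk) === $seqLen && mb_check_encoding($chunk, 'UTF-8')) {
            // A complete, valid UTF-8 character: copy it unchanged.
            $out .= $chunk;
            $i += $seqLen;
        } else {
            // Not valid UTF-8 here: assume one ISO-8859-1 byte and convert it.
            $out .= mb_convert_encoding($input[$i], 'UTF-8', 'ISO-8859-1');
            $i += 1;
        }
    }
    return $out;
}

// ISO-8859-1 "é" (0xE9) followed later by UTF-8 "é" (0xC3 0xA9).
$mixed = "Meu nome \xE9 Wallace e estou com f\xC3\xA9";
echo mixedToUtf8($mixed), "\n"; // Meu nome é Wallace e estou com fé
```

This is a heuristic: an ISO-8859-1 byte pair that happens to form valid UTF-8 (for example the two bytes of "Ã©") is kept as UTF-8, so the original encoding cannot always be recovered with certainty.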

This mainly happens with data returned from your database, whose content may be stored as ISO-8859-1 while your page uses UTF-8, or vice versa.

Try converting the data to UTF-8 with the following function:

echo mb_convert_encoding($variable,"UTF-8","auto");

Remember that you should also add the following meta tag to your HTML:

<meta charset="UTF-8">
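When the text comes from an external page rather than a database, the HTTP Content-Type header often declares the charset, which is more reliable than letting "auto" guess; but, as noted in the comments on the question, some pages declare UTF-8 incorrectly. A minimal sketch under those assumptions: keep the body if it is already valid UTF-8, otherwise use the declared charset when mbstring supports it, and as a last resort assume ISO-8859-1. The helper name bodyToUtf8 and the ISO-8859-1 fallback are assumptions, not a standard API; obtaining the header (for example with cURL) is left out.

```php
<?php
// Hypothetical helper (not a built-in): convert a fetched page body to UTF-8.
// Requires the mbstring extension. The ISO-8859-1 fallback is an assumption.
function bodyToUtf8(string $body, ?string $contentType): string
{
    // 1. Already valid UTF-8 (including plain ASCII): keep it unchanged.
    if (mb_check_encoding($body, 'UTF-8')) {
        return $body;
    }

    // 2. Otherwise use the charset declared in the Content-Type header, unless it
    //    claims UTF-8 (just shown to be wrong) or mbstring does not know it.
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9_.:-]+)/i', $contentType, $m)
        && strcasecmp($m[1], 'UTF-8') !== 0
    ) {
        foreach (mb_list_encodings() as $known) {
            if (strcasecmp($known, $m[1]) === 0) {
                return mb_convert_encoding($body, 'UTF-8', $known);
            }
        }
    }

    // 3. Last resort: assume ISO-8859-1, where every byte is a valid character.
    return mb_convert_encoding($body, 'UTF-8', 'ISO-8859-1');
}

// The header wrongly claims UTF-8, but the body is ISO-8859-1: it is still converted.
echo bodyToUtf8("caf\xE9", 'text/html; charset=utf-8'), "\n"; // café
```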

  • In this case it is not the database; it is the content of an external page. There is no way I can solve this problem!
