To avoid this, use the same character set everywhere, preferably UTF-8.
By everything I mean:
- The encoding of the .php, .js, .css and .html files, and of any other file that contains text.
- The charset declared in the HTML header, in the META tags.
- The encoding of the database.
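As a minimal sketch of the last two items, assuming a MySQL database accessed through PDO (the answer does not name a database, and `buildUtf8Dsn`, the host and the database name are hypothetical), the charset could be declared like this:

```php
<?php
// Sketch only: the choice of MySQL/PDO, the host and the database name are assumptions.

// 1. Declare UTF-8 in the HTTP header (it should agree with the META tag in the HTML).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 2. Hypothetical helper that builds a PDO DSN asking MySQL for a UTF-8 connection.
function buildUtf8Dsn(string $host, string $dbname): string
{
    // utf8mb4 is MySQL's name for full UTF-8 (4 bytes per character at most).
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $pdo = new PDO(buildUtf8Dsn('localhost', 'example_db'), 'user', 'password');
```

The files themselves still have to be saved as UTF-8 in the editor; no code can fix that part.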
Sometimes you will still have to work with more than one encoding, because the data comes from different sources such as other databases, files like Excel spreadsheets (which only work well with ISO-8859-1), etc.
For these cases, convert the text before displaying it, with functions like these:
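function toUTF8($string)
{
    if (function_exists('mb_detect_encoding')) {
        // Guess the current encoding from the candidate list, then convert to UTF-8.
        $current_encoding = mb_detect_encoding($string, 'UTF-8, ASCII, ISO-8859-1');
        $string = mb_convert_encoding($string, 'UTF-8', $current_encoding);
    } else {
        // Fallback without mbstring (utf8_encode/utf8_decode are deprecated as of PHP 8.2).
        // If decoding and re-encoding gives back the same bytes, the string is already UTF-8.
        $string = utf8_encode(utf8_decode($string)) == $string ? $string : utf8_encode($string);
    }
    return $string;
}
function toLatin1($string)
{
    if (function_exists('mb_detect_encoding')) {
        // Guess the current encoding from the candidate list, then convert to ISO-8859-1.
        $current_encoding = mb_detect_encoding($string, 'UTF-8, ASCII, ISO-8859-1');
        $string = mb_convert_encoding($string, 'ISO-8859-1', $current_encoding);
    } else {
        // Decode only when the string round-trips, that is, when it really is UTF-8.
        $string = utf8_encode(utf8_decode($string)) == $string ? utf8_decode($string) : $string;
    }
    return $string;
}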
In some situations, even these functions do not solve the problem. This is the case for strings concatenated from pieces in more than one encoding (believe me, this is not so unusual); in these cases the conversion must be done character by character.
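A minimal sketch of such a character-by-character conversion, assuming the stray bytes are ISO-8859-1 and the target is UTF-8 (`fixMixedEncoding` is a hypothetical name, not part of PHP): it keeps every well-formed UTF-8 sequence and converts any other byte individually.

```php
<?php
// Hypothetical helper: repairs a string that mixes valid UTF-8 with stray ISO-8859-1 bytes.
function fixMixedEncoding(string $input): string
{
    // Alternatives are tried in order: a well-formed UTF-8 sequence first,
    // and only if none matches, a single leftover byte captured in group 1.
    $pattern = '/
        [\x00-\x7F]
      | [\xC2-\xDF][\x80-\xBF]
      | \xE0[\xA0-\xBF][\x80-\xBF]
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
      | \xED[\x80-\x9F][\x80-\xBF]
      | \xF0[\x90-\xBF][\x80-\xBF]{2}
      | [\xF1-\xF3][\x80-\xBF]{3}
      | \xF4[\x80-\x8F][\x80-\xBF]{2}
      | (.)
    /xs';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (isset($m[1])) {
            // Leftover byte (always >= 0x80 here): treat it as ISO-8859-1 and
            // encode its code point as the equivalent two-byte UTF-8 sequence.
            $byte = ord($m[1]);
            return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
        return $m[0]; // already valid UTF-8, keep as is
    }, $input);

    // preg_replace_callback returns null on failure; fall back to the input.
    return $result ?? $input;
}

// "café" already in UTF-8, followed by "maçã" in ISO-8859-1.
$mixed = "caf\xC3\xA9 e ma\xE7\xE3";
echo fixMixedEncoding($mixed), PHP_EOL;
```

This is a heuristic: a pair of ISO-8859-1 bytes that happens to form a valid UTF-8 sequence (for example `\xC3\xA9`) is indistinguishable from real UTF-8 and is left untouched.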
Where does this happen? Do you have a file, a screenshot, etc.?
– KaduAmaral
@Kaduamaral, take a look at this question and you will understand: http://answall.com/questions/91549/json-encode-returningmalformed-utf-8-characters-possibly-incorrectly-encoded
– Wallace Maxters
See the answer to this question and read the related article, and you will understand. link
– Guilherme Lautert
Having to use 30 functions just to return something with the correct encoding could only be PHP's doing.
– Wallace Maxters
What is the encoding of this external file?
– rray
@rray, that's the problem. I have to fetch the content of whatever site gets entered there. Sometimes it is UTF-8, sometimes it is not.
– Wallace Maxters
So I guess you should detect the encoding and run the checks: http://us3.php.net/manualen/function.iconv-get-encoding.php
– rray
As explained in the answer I linked, the problem may be an incompatibility: the page you are accessing is probably in ISO-8859-1 while PHP is in UTF-8, which produces mismatched characters.
– Guilherme Lautert
@rray, the problem is that some pages do not have UTF-8 configured correctly, so the function mb_detect_encoding ALWAYS returns utf-8. Bah!
– Wallace Maxters
What code do you use to fetch this external page? Can you add it to your question?
– Cezar
It probably has windows-1252 (ISO-8859-1) characters mixed with Unicode. I recommend this answer (which you may already know): http://answall.com/a/43205/3635. It will only be a problem if the response comes from a WS; in that case you will have to deal with iconv, for example.
– Guilherme Nascimento