Charset iso-8859-1 and utf-8 compatibility problems

117

  • In the 1st image I use charset=iso-8859-1:

example with charset=iso-8859-1

  • In the 2nd image I use utf-8:

example with charset=utf-8

I have a news system where you can paste HTML or text from other pages. On the page where the news is displayed I use charset=iso-8859-1 because of the accented characters, but with this charset the menus and other titles break because of their accents.

I need your help to know whether it is possible to have two charsets on the same page, or on one part of the page, or whether there is another way around the situation.

In addition, I have two related files:

database.conf.php

global $databases;
$databases = array( 
    'local' => array
    (
            'host'=>'localhost',
            'port'=>3306,
            'dbname'=>'noticiass',
            'user'=>'root',
            'password'=>''
    )
);

mysql.php

Class mysql
{

    public $query;
    public $data;
    public $result;
    public $rows;   
    public $page = 0;
    public $perpage = 10;
    public $current = 1;
    public $url;
    public $link = '';
    public $total = '';
    public $pagination = false;

    protected $config;
    protected $host;
    protected $port;
    protected $user;
    protected $pass;
    protected $dbname;
    protected $con;



    public function __construct()
    {
        try
        {
            # array with the database settings
            include 'database.conf.php';
            global $databases;
            $this->config = $databases['local'];
            # read the connection data from the config
            $this->dbname = $this->config['dbname'];
            $this->host = $this->config['host'];
            $this->port = $this->config['port'];
            $this->user = $this->config['user'];
            $this->pass = $this->config['password'];
            # base URL used by the pagination links
            $this->url = $_SERVER['SCRIPT_NAME'];
            # open the connection and fail early if it could not be established
            $this->con = mysql_connect( "$this->host", "$this->user", "$this->pass" );
            if ( !$this->con )
            {
                throw new Exception( "Falha na conexão MySql com o banco [$this->dbname] em database.conf.php" );
            }
            mysql_select_db( "$this->dbname" );
        }
        catch ( Exception $e )
        {
            echo $e->getMessage();
            exit;
        }
    }

    public function query( $query = '' )
    {
        try
        {
            if ( $query == '' )
            {
                throw new Exception( 'mysql query: A query deve ser informada como parâmetro do método.' );
            }
            else
            {
                $this->query = $query;
                if($this->pagination == true){  
                    $this->result = mysql_query( $this->query );
                    $this->fetchAll();
                    $this->paginateLink();
                    $this->query .= " LIMIT $this->page, $this->perpage";
                    $this->pagination = false;

                }
                $this->result = mysql_query( $this->query );
            }
        }
        catch ( Exception $e )
        {
            echo $e->getMessage();
            exit;
        }
        return $this;
    }

    public function fetchAll()
    {
        $this->data = array();
        $this->rows = 0;
        while ( $row = mysql_fetch_array( $this->result, MYSQL_ASSOC ) )
        {
            $this->data[] = $row;
        }
        if ( isset( $this->data[0] ) )
        {
            $this->rows = count( $this->data );
        }
        return $this->data;
    }

    public function rowCount()
    {
        return @mysql_affected_rows();
    }    

    public function getUrl($perpage)
    {
        $this->url = $_SERVER['REQUEST_URI'];
        return $this;
    }   
    public function paginate($perpage)
    {
        $this->pagination = true;
        $this->perpage = $perpage;
        return $this;
    }
    public function paginateLink()
    {   
        if(!preg_match('/\?/',$this->url))
        {
            $this->url .= "?";
        }else{
            $this->url .= "&";
        }
        if ( isset( $_GET['page'] ) )
        {
            $this->current = $_GET['page'];
            $this->page = $this->perpage * $_GET['page'] - $this->perpage;
            if ( $_GET['page'] == 1 )
            {
                $this->page = 0;
            }
        }
        $this->total = $this->rows;
        if ( $this->rows > $this->perpage )
        {                           
            $this->link = "<div class=\"pagination\"><ul>";
            $prox = "javascript:;";
            $ant = "javascript:;";
            if ( $this->current >= 2 )
            {
                $ant = $this->url."page=" . ($this->current - 1);
            }
            if ( $this->current >= 1 && $this->current < ($this->total / $this->perpage))
            {
                $prox = $this->url."page=" . ($this->current + 1);
            }
            $this->link .= '<li><a href="' . $ant . '">&laquo;</a></li>';
            $from = round( $this->total / $this->perpage );
            if($from == 1){$from++;}

            for ( $i = 1; $i <= $from ; $i++ )
            {
                if ( $this->current == $i )
                {
                    $this->link .= "<li class=\"active\"><a>$i</a></li>\n";
                }
                else
                {
                    $this->link .= "<li><a href=\"".$this->url."page=$i\">$i</a></li>\n";
                }
            }
            $this->link .= '<li><a href="' . $prox . '">&raquo;</a></li>';
            $this->link .= "</ul>\n";
            $this->link .= "</div>\n";
        }   
        return $this;
    }

    public function cut($str,$chars,$info=  '')
    {
        if ( strlen( $str ) >= $chars )
        {
            $str = preg_replace( '/\s\s+/', ' ', $str );
            $str = strip_tags( $str );
            $str = preg_replace( '/\s\s+/', ' ', $str );
            $str = substr( $str, 0, $chars );
            $str = preg_replace( '/\s\s+/', ' ', $str );
            $arr = explode( ' ', $str );
            array_pop( $arr );
            //$arr = preg_replace('/\&nbsp;/i',' ',$arr);
            $final = implode( ' ', $arr ) . $info;
        }
        else
        {
            $final = $str;
        }
        return $final;
    }

}
  • Related article : http://local.joelonsoftware.com/wiki/O_M%C3%Adnimo_absoluto_que_todo_developer_de_software_absolutely,Positivamente_Precisa_Saber_Sobre_Unicode_E_Conjuntos_de_Caracteres%28Sem_Desculpas! %29

  • 2

    Friend, to avoid problems with charset, always set the same charset for the database (collation), for the table fields, and for the page header. This charset problem is common when the page is in one encoding and the database data in another. Having two charsets on the same page is not possible.

  • 1

    UTF-8 is the universal charset standard, for any language; do not use iso-8859-1, since UTF-8 already handles accented characters. In any case, you should use UTF-8 everywhere, including the database, and the files themselves must also be saved as UTF-8, so you will stop suffering from this. And if you have iso-8859-1 data, convert it to UTF-8... try conversions in this order of preference: utf8_encode() > html_entity_decode() > htmlentities() and, ultimately, if none of that works, use header('Content-Type: text/html; charset=utf-8');

  • And for the header of your HTML document, use the meta tag: <meta charset="utf-8">.

  • And before I forget, do not do this in php.ini: php_admin_value default_charset ISO-8859-1

  • Complicated duplicate: see the simpler and more objective http://answall.com/a/8442/4186 ... Without a doubt, use only and always UTF-8!

  • 4

    @Peterkrauss what do you mean? What's complicated about it? It is divided into topics and explains the important factors, from the connection to the headers and the encoding of the documents; just follow one step at a time.

  • 4

    I use UTF-8 where it is needed, and iso-8859-1 and even win-1252 where it is most convenient, because I know encodings really well and understand the strengths and weaknesses of each, to the point of knowing that no solution suits all scenarios. Unfortunately there are people who fall for the silver-bullet pitch and think there is a universal solution. I even understand that when someone is just a theorist it is hard to grasp these things, because you really have to study a lot, and not everyone does. It is "cheaper" to adopt absolute truths.

  • I'm leaving an updated link to the Joel Spolsky article that Guilherme mentioned. I found a translation for anyone interested.


3 answers

151

Short answer: no, it is not possible.

By the time the page starts to render, the browser has already assumed an encoding (or tries to detect it if none is declared, usually falling back to the server default).

Common problems of codification

It is very common when we are working with accents to come across strange characters such as:

  • Something similar to é (representing the é) or ã (representing ã), this is because the character is Unicode, but the page is in iso-8859-1 (or other compatible).
  • And the sign an example of a situation is when you use an iso-8859-1 compliant accents on a page that is trying to process UTF-8 due to the Content-Type: ...; charset=utf8.

About iso-8859-1

I recommend using iso-8859-1 if your website is "100% in Portuguese" and you do not need "extra encodings" (such as emojis). However, even if it is in Portuguese, you should consider migrating to utf-8. One of the reasons is that in June 2004 the ISO/IEC working group responsible for its maintenance declared the end of support for this encoding, focusing on UCS and Unicode.

The following topics explain how to use each of them:

  • What is required to use UTF-8
  • What is required to use iso-8859-1/latin1/ansi

What is required to use UTF-8

  • PHP scripts (I mean the files on the server, not the server's response) saved as "UTF-8 without BOM"
  • MySQL (or other type of database) with charset utf8
  • Preferably define it using PHP: header('Content-type: text/html; charset=UTF-8');

Note: The advantage of UTF-8 is that you can use multiple "languages" on your page with characters that are not supported by "iso-8859-1".

Source: http://en.wikipedia.org/wiki/ISO/IEC_8859-1

If you decide to use UTF-8 on your website/project, I recommend following the steps:

PHP scripts with UTF-8 without "BOM"

Note: read about this at http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark (English)

You must save all PHP scripts (including the ones you use with include, require, etc.) as UTF-8 without BOM. You can use software like Sublime Text or Notepad++ to convert the files:

  • Using Notepad++:

    utf8 without BOM notepad++

  • Using Sublime Text:

    utf8 sublime sublimetext

  • Using Eclipse go to Window > Preferences > General > Workspace > Text File Encoding:

    netbeans

Note: files with the .js or .css extension that contain accented characters must also be saved with the same encoding as the pages, in the same way as the documents described above. Any embedded .svg files that contain accents or other special characters must also be saved with the same encoding as the page.
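
To check whether a file was saved with a BOM, you can inspect its first three bytes. This is a minimal sketch; hasUtf8Bom and filesWithBom are hypothetical helper names, not functions from PHP itself.

```php
<?php
// Hypothetical helper: returns true when the given bytes start with the
// UTF-8 byte order mark (0xEF 0xBB 0xBF).
function hasUtf8Bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: returns the paths whose content starts with a BOM.
function filesWithBom(array $paths)
{
    $found = array();
    foreach ($paths as $path) {
        // Only the first three bytes are needed; an unreadable file
        // yields false, which the cast turns into an empty string.
        $head = (string) file_get_contents($path, false, null, 0, 3);
        if (hasUtf8Bom($head)) {
            $found[] = $path;
        }
    }
    return $found;
}
```

For example, filesWithBom(array('index.php', 'menu.php')) would return the files from that list that still need to be re-saved without BOM (the file names here are only placeholders).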

MySQL with UTF-8

To create a table in UTF-8 in MySQL you must use something like:

CREATE TABLE minhatabela (
   id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   titulo varchar(300) DEFAULT NULL
) ENGINE=InnoDB CHARACTER SET=utf8 COLLATE utf8_unicode_ci;

If the tables already exist, first make a BACKUP of them and then use one of the following commands (as needed):

  • Converts database: ALTER DATABASE bancodedados CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  • Converts a specific table: ALTER TABLE minhatabela CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

In addition to creating the tables in UTF-8 it is necessary to define the connection as UTF-8.

With PDO you need to use exec:

$conn = new PDO('mysql:host=HOST;dbname=BANCO;charset=utf8', 'USUARIO', 'SENHA');
$conn->exec('SET CHARACTER SET utf8'); // Sets the charset to UTF-8
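
As an alternative sketch: since PHP 5.3.6 the MySQL driver honors the charset parameter in the DSN, so the charset can be set when the connection is created. The function names and the ERRMODE option below are choices made for this example only.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that sets the connection charset.
// MySQL names the encoding "utf8" (or "utf8mb4"), not "utf-8".
function buildMysqlDsn($host, $dbname, $charset = 'utf8')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens the connection; PDO throws a PDOException
// if the connection fails.
function connectWithCharset($host, $dbname, $user, $password, $charset = 'utf8')
{
    return new PDO(
        buildMysqlDsn($host, $dbname, $charset),
        $user,
        $password,
        array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
    );
}

// Shows the DSN that would be used for the placeholders HOST and BANCO.
echo buildMysqlDsn('HOST', 'BANCO'), PHP_EOL;
```

Calling connectWithCharset('HOST', 'BANCO', 'USUARIO', 'SENHA') would then return a connection that already uses utf8.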

With mysqli you need to use mysqli_set_charset:

$mysqli = new mysqli('HOST', 'usuario', 'senha', 'banco');

if ($mysqli->connect_error) {
    printf('Erro de conexão: %s', $mysqli->connect_errno);
    exit;
}

/*
 * Use this check instead of $connect_error for compatibility
 * with PHP versions prior to 5.2.9 and 5.3.0.
 */
if (mysqli_connect_error()) {
    printf('Erro de conexão: %s', mysqli_connect_error());
    exit;
}

if (false === $mysqli->set_charset('utf8')) {
    printf('Error ao usar utf8: %s', $mysqli->error);
    exit;
}

With procedural mysqli:

<?php
$link = mysqli_connect('HOST', 'usuario', 'senha', 'banco');

if (mysqli_connect_error()) {
    printf('Erro de conexão: %s', mysqli_connect_error());
    exit;
}

if (!mysqli_set_charset($link, 'utf8')) {
    printf('Error ao usar utf8: %s', mysqli_error($link));
    exit;
}

Setting the charset of the page

You can use the <meta> tag to set the charset, but it is recommended to do this in the server response by setting the headers (this does not mean that you should not use <meta>).

For this, in PHP use the header function. Setting it in the server response matters for page rendering, since the browser receives the charset before the page content, and pages loaded via AJAX also need the charset defined by header().

Note: header() must always be called at the top of the script, before any echo, print, or other output.

In files where the response is HTML:

<?php
header('Content-Type: text/html; charset=UTF-8');

echo 'Conteudo';
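
The same idea applies to responses that are not HTML, such as the AJAX responses mentioned above. This is a minimal sketch; contentTypeHeader and sendContentType are hypothetical helper names.

```php
<?php
// Hypothetical helper: builds the Content-Type header line.
function contentTypeHeader($mimeType, $charset = 'UTF-8')
{
    return 'Content-Type: ' . $mimeType . '; charset=' . $charset;
}

// Hypothetical helper: sends the header only if output has not started,
// because header() fails with a warning once the body has begun.
function sendContentType($mimeType, $charset = 'UTF-8')
{
    if (headers_sent()) {
        return false;
    }
    header(contentTypeHeader($mimeType, $charset));
    return true;
}

// Example for an endpoint that returns plain text to an AJAX request.
sendContentType('text/plain');
echo 'Conteudo';
```

Checking headers_sent() first makes it easier to notice a stray echo or a BOM that was output before the header.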

What is required to use iso-8859-1/latin1/ansi

To use iso-8859-1, you must use:

  • PHP scripts saved in "iso-8859-1" (or windows-1252/ANSI)
  • MySQL (or other type of database) with charset latin1
  • Preferably define it using PHP: header('Content-type: text/html; charset=iso-8859-1');

First you must save all .php scripts, and .html documents (if any), as iso-8859-1 (or ANSI):

  • To save using Notepad++:

    saving the document as ANSI with notepad++

  • To save using Sublime Text:

    saving the document as iso-8859-1 or windows 1252

MySQL with latin1

Almost all MySQL servers come configured to use latin1 as the default charset; however, this can be changed in my.ini. The following MySQL steps are therefore optional: they are needed only if your my.ini uses something other than latin1 in the following lines:

[client]
default-character-set=<default charset for the client>

[mysql]
default-character-set=<default charset for the client>

Then you'll need to define latin1. With PDO it is necessary to use exec:

$conn = new PDO('mysql:host=HOST;dbname=BANCO;charset=latin1', 'USUARIO', 'SENHA');
$conn->exec('SET CHARACTER SET latin1'); // Sets the charset to latin1

And with mysqli it is necessary to use mysqli_set_charset:

$mysqli = new mysqli('HOST', 'usuario', 'senha', 'banco');

if ($mysqli->connect_error) {
    printf('Erro de conexão: %s', $mysqli->connect_errno);
    exit;
}

/*
 * Use this check instead of $connect_error for compatibility
 * with PHP versions prior to 5.2.9 and 5.3.0.
 */
if (mysqli_connect_error()) {
    printf('Erro de conexão: %s', mysqli_connect_error());
    exit;
}

if (false === $mysqli->set_charset('latin1')) {
    printf('Error ao usar latin1: %s', $mysqli->error);
    exit;
}

With procedural mysqli:

<?php
$link = mysqli_connect('HOST', 'usuario', 'senha', 'banco');

if (mysqli_connect_error()) {
    printf('Erro de conexão: %s', mysqli_connect_error());
    exit;
}

if (!mysqli_set_charset($link, 'latin1')) {
    printf('Error ao usar latin1: %s', mysqli_error($link));
    exit;
}

Setting the charset of the page

Just add ; charset=iso-8859-1 after the type, for example text/html or application/xml or text/plain:

<?php
header('Content-Type: text/html; charset=iso-8859-1');

echo 'Conteudo';
  • 2

    In MySQL it is also necessary to define the character set of the connection. http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html

  • 3

    +1 I think it would be better to use utf8mb4 instead of utf8 in MySQL, to support ALL imaginable characters: https://mathiasbynens.be/notes/mysql-utf8mb4

  • 2

    Just one remark: even if the site is 100% Brazilian I do not recommend using ISO-8859-1; there is no need for it, it only brings headaches.

  • 1

    @Ivanferrer thanks, but just to emphasize: I did not only say "if it is Brazilian". I said two things: if your site is "100% Brazilian" and if you don't need "extra encodings". I also mentioned that the ISO standard has had no support since 2004 (12 years) ;)

  • 6

    I totally disagree with Ivan. ISO-8859-1 is as simple as ASCII and has none of the long list of UTF disadvantages. It just lacks characters for some languages, emoticons, etc., which are dispensable in most applications. And "no support" only means that it will no longer be updated, because it has already stabilized. Which is an advantage.

  • 1

    @Guilhermenascimento I closed 2 or 3 questions yesterday as duplicates of this one of yours. If you think it is worthwhile, add a note about file names too (which depend on the OS encoding, not on PHP or Apache). Although I think anyone who uses accents in file names handled by automated processes is asking for trouble. After reading, you can flag my comment as obsolete or let me know and I will delete it.

  • For those who use NetBeans, the path to set the project encoding is: File -> Project Properties -> Sources -> Encoding

  • @Guilhermecostamilam ok, I'll update later with images. Just keep in mind that this is only part of the solution; the connection and the headers should also be handled with care.

  • My PHP and HTML encodings were already correct; I had only failed to set the encoding on the database connection. Thanks for the help.

  • @Guilhermecostamilam great, it was just to reinforce.

  • But why did this happen in the view file and not in the DAO and controller? The problem only occurred when I passed the data via the session. If the problem was not setting the encoding on the connection, shouldn't the error happen in all the files? If you think I should create a question for that, I can edit mine.

  • @Guilhermecostamilam it is not only the DAO and controller. As the answer explains, you have to make sure that all files are saved with the same encoding, that the connection uses the same encoding as the files, and so on... It is likely that you have files in different encodings.


39

It is not possible. When an HTML page is loaded, the charset used in it is declared:

<meta charset="UTF-8">

Therefore, all content loaded on the page should ideally be in that charset, be it UTF-8, ISO-8859-1 or any other.

On your system, you should provide a way to always store the content in the same charset. This involves the database format, the format of submitted forms (for example, in a CMS), and even the format of your website's source files.
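
For content pasted from other pages, one way to keep a single charset is to normalize the text before storing it. This is a minimal sketch that assumes the mbstring extension, UTF-8 as the target charset, and that anything that is not valid UTF-8 arrived as iso-8859-1 (an assumption that will not hold for every source); toUtf8 is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns $text encoded as UTF-8.
function toUtf8($text)
{
    // Text that is already valid UTF-8 is left untouched, which avoids
    // double encoding (the "Ã©" effect).
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Assumption: any other input is iso-8859-1.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// The same word in both encodings ends up as identical UTF-8 bytes.
var_dump(toUtf8("Not\xEDcia") === toUtf8("Not\xC3\xADcia"));
```

The check is a heuristic: a short iso-8859-1 string can occasionally also be valid UTF-8, so knowing the real encoding of the source is still preferable when it is available.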

33

An excellent tip for not having problems with charset is to always use the same encoding for:

  1. Apache (httpd.conf or .htaccess file: AddDefaultCharset UTF-8)
  2. Database ( mysql_set_charset("utf8") )
  3. HTML (<meta charset="UTF-8">)
  4. File System (when saving HTML to disk, check which encoding)
