How do I control the sort order of accented characters in MySQL?

My table uses utf8_general_ci, so all accents are stored correctly. But when I have, for example, Aa, Ac and Áb and I sort them "in alphabetical order", the result is:

  • Aa
  • Ac
  • Áb

In Excel, the result is:

  • Aa
  • Áb
  • Ac

which is more logical.

How do I get the same result in MySQL? I tried changing the collation to latin1_general_ci and others, but nothing changes...
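The two orderings in the question can be reproduced outside MySQL. The Python sketch below is only a simplified model, assuming that an accent-insensitive collation behaves roughly like "remove the diacritics, then compare case-insensitively"; real MySQL collations use their own weight tables, and the helper name `accent_insensitive_key` is hypothetical.

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    """Hypothetical sort key approximating an accent- and case-insensitive
    collation: decompose to NFD, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["Aa", "Ac", "Áb"]

# Plain code-point order: "Á" (U+00C1) sorts after every unaccented ASCII letter.
print(sorted(words))                              # ['Aa', 'Ac', 'Áb']

# Accent-insensitive order: "Áb" is compared as "ab", so it lands between "Aa" and "Ac".
print(sorted(words, key=accent_insensitive_key))  # ['Aa', 'Áb', 'Ac']
```

The first result is the order the question complains about; the second is the Excel-like order.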

  • I think this article meets your needs: http://imasters.com.br/artigo/1203/postgresql/sort-of-accented/

3 answers

4

Apparently the problem occurs specifically with latin1_general_ci (according to this link):

  • latin1_general_ci: does not distinguish upper-case from lower-case letters. A search for teste returns records such as Teste or TESTE.

  • latin1_swedish_ci: distinguishes neither upper-case from lower-case letters nor accented letters and the cedilla; that is, a search for the word intúicao returns the record containing the word Intuição.

It is this distinction between accented and unaccented letters that produces the different ordering.
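To make the difference between the two descriptions concrete, here is a rough Python model (not MySQL's actual weight tables). One comparison ignores only case, as the answer describes latin1_general_ci, and the other also ignores accents and the cedilla, as described for latin1_swedish_ci. Both helper names are hypothetical.

```python
import unicodedata


def case_only_fold(s: str) -> str:
    # Hypothetical stand-in for a collation that ignores case but still distinguishes accents.
    return s.casefold()


def fold(s: str) -> str:
    # Hypothetical stand-in for a collation that ignores case, accents and the cedilla:
    # NFD splits "ç" into "c" + U+0327 and "ã" into "a" + U+0303, and the marks are dropped.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


print(case_only_fold("Teste") == case_only_fold("TESTE"))        # True
print(case_only_fold("Intuição") == case_only_fold("intúicao"))  # False
print(fold("Intuição") == fold("intúicao"))                      # True
```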

An example:

CREATE TABLE IF NOT EXISTS `products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE latin1_general_ci;

INSERT INTO `products` (`name`) VALUES ('Aa'), ('Ac'), ('Áb');

If I use the following query:

SELECT * FROM products ORDER by name ASC;

it returns:

+----+------+
| ID | NAME |
+----+------+
| 1  | Aa   |
| 2  | Ac   |
| 3  | Áb   |
+----+------+

Online: http://sqlfiddle.com/#!2/a7131/1

If I use latin1_swedish_ci instead (the same happens with utf8_general_ci or utf8_unicode_ci):

CREATE TABLE IF NOT EXISTS `products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 COLLATE latin1_swedish_ci;

it returns:

+----+------+
| ID | NAME |
+----+------+
| 1  | Aa   |
| 3  | Áb   |
| 2  | Ac   |
+----+------+

Note: with utf8_unicode_ci, utf8_general_ci and latin1_swedish_ci I got the expected results; only latin1_general_ci showed this behavior.

Another possible problem is a change of COLLATE during export and import (for example, when restoring a backup): the data may have been converted to another encoding along the way, which happens a lot with tools such as phpMyAdmin.

How to back up without encoding problems

There are several ways to back up, but in my view the most practical (especially on an online server) is to use mysqldump over SSH.

In the terminal (on the server, via SSH), use the following command to export a table:

mysqldump -u <user> -p database_name table_name > <full path>/table_name.sql

Note: if you want to download the dump via FTP, replace <full path> with a directory accessible by FTP so you can download it later (this is very useful for a backup routine).

To import a table:

mysql -u <user> -p database_name
mysql> source <full path>/table_name.sql

Possible solutions

I do not recommend setting the COLLATE in the query, because that would be more of a hack, and if you forget to add it to some query the results may differ.

I recommend recreating the tables with a COLLATE that treats accented characters as equal to unaccented ones; you can use latin1_swedish_ci or one of the utf8_* collations.

If you happen to use utf8, read this:

Note: according to the answer to the question [ Which UTF-8 "collate" is the most suitable for Web (multi-language) ], utf8_general_ci treats accented letters as equal to unaccented ones, but not every character is treated as equal: ß, for example, only compares equal to ss with utf8_unicode_ci (the same answer discusses the advantages and disadvantages of each). This is a "problem" similar to that of latin1_general_ci.
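The ß/ss difference comes from expansion: a collation that maps each character to exactly one weight can never make the single character ß equal the two letters ss. Here is a rough Python analogy, assuming `str.casefold()` (which expands ß to "ss") as a stand-in for the expansion utf8_unicode_ci performs and a one-to-one accent strip as a stand-in for utf8_general_ci. The helper names are hypothetical.

```python
import unicodedata


def strip_accents_fold(s: str) -> str:
    # One character in, at most one character out: ß has no decomposition and
    # lower() leaves it as "ß", so it can never match the two-letter "ss".
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def expanding_fold(s: str) -> str:
    # casefold() applies full Unicode case folding, which expands ß to "ss".
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents_fold("Straße") == strip_accents_fold("strasse"))  # False
print(expanding_fold("Straße") == expanding_fold("strasse"))          # True
```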

4

Well, I'm going to answer my own question, because Guilherme Nascimento's answer led me to the solution. The problem is not a COLLATE problem but a problem between MySQL and PHP. We know that PHP does not use UTF-8. That was planned for PHP 6 but will "theoretically" happen only in PHP 7. So let's see how to build a site in UTF-8 to understand the difficulty. First, I create my database and then the tables, everything in UTF-8. When I test them using Guilherme Nascimento's test, everything is fine. Great!

Then I write my PHP code and create the HTML page, where I put:

   <meta content="text/html; charset=UTF-8" http-equiv="content-type" />

Then I create a PHP file where I put, for example:

   define ("TITULO","Direção");

I save it in UTF-8 and upload it with FTP software, which does not change the file.

I create a page with a form and an INPUT field labeled with the title from my define. Cool! I see "Direção" as the title and type, for example, "direção nacional" into the field. On submit, the content is sent to a second PHP page, which saves it with an INSERT. To read it back, I simply do a SELECT * FROM TAB, then echo the title followed by the contents of my table, and I get:

    Direção: direção nacional

At this point I'm sure everything is fine. But it isn't. Inside the table I don't actually have "direção nacional"; I have "direÃ§Ã£o nacional". Because the data is converted automatically when it is read back, it looks correct. The problem appears when I need an ORDER BY: MySQL sorts using "direÃ§Ã£o nacional", and the result is wrong.
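This double encoding can be modeled in a few lines of Python. It is a sketch under stated assumptions: Python's `latin-1` codec stands in for the connection charset used when mysqli_set_charset is not called (MySQL's latin1 is actually closer to cp1252), an accent-stripping key stands in for the column's collation, and all function names are hypothetical.

```python
import unicodedata


def accent_insensitive_key(s: str) -> str:
    # Rough stand-in for an accent- and case-insensitive collation.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def store_without_set_charset(text: str) -> str:
    # The page sends UTF-8 bytes, but the connection is assumed to be latin1,
    # so each byte is stored as a separate latin1 character.
    return text.encode("utf-8").decode("latin-1")


def read_without_set_charset(stored: str) -> str:
    # Reading over the same misconfigured connection reverses the mistake,
    # which is why the page *looks* correct.
    return stored.encode("latin-1").decode("utf-8")


stored = store_without_set_charset("direção nacional")
print(stored)                            # direÃ§Ã£o nacional
print(read_without_set_charset(stored))  # direção nacional

# ORDER BY runs on what is actually stored, not on what the page shows.
words = ["Aa", "Ac", "Áb"]
stored_words = [store_without_set_charset(w) for w in words]
by_stored = sorted(stored_words, key=accent_insensitive_key)
print([read_without_set_charset(w) for w in by_stored])  # ['Aa', 'Ac', 'Áb']
```

In this model the last line reproduces the wrong order from the question: "Á" is stored as "Ã" followed by the control character U+0081, so the sort compares that character instead of a plain "a".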

In his example, Guilherme Nascimento uses a fiddle, that is, a closed system, which explains why it works perfectly.

Solution

The solution is simple: immediately after mysqli_connect, call mysqli_set_charset.

   $handle = mysqli_connect($sql_host,$sql_user,$sql_password,$sql_database);
   mysqli_set_charset($handle,'utf8');

From then on, when you type "direção", the table stores "direção" and the ORDER BY is correct.

But what about the old data?

Unfortunately, exporting and re-importing changes nothing, because I export "direÃ§Ã£o nacional" and re-import "direÃ§Ã£o nacional". Instead, you need to read the data WITHOUT calling mysqli_set_charset, then call mysqli_set_charset and INSERT the data again.

So:

    1 - Connect using mysqli_connect (without calling mysqli_set_charset)
    2 - Read the data from the table and keep it to build the INSERT query
    3 - TRUNCATE the table
    4 - Call mysqli_set_charset($handle,'utf8');
    5 - INSERT the data
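These five steps amount to reinterpreting the stored latin1 characters as the UTF-8 bytes they really are. Here is a minimal Python model of that idea, again assuming Python's `latin-1` codec as a stand-in for MySQL's latin1 connection. `repair` is a hypothetical name, and the real procedure still has to go through PHP and MySQL as described.

```python
def repair(stored: str) -> str:
    # Steps 1-2: reading over the old (latin1) connection yields the original UTF-8 bytes.
    raw_bytes = stored.encode("latin-1")
    # Steps 4-5: sending those bytes over a connection declared as UTF-8
    # stores the characters they actually represent.
    return raw_bytes.decode("utf-8")


# What the table holds after the double encoding.
broken = "direção nacional".encode("utf-8").decode("latin-1")
print(broken)          # direÃ§Ã£o nacional
print(repair(broken))  # direção nacional
```

In this model, calling `repair` on text that is already correct raises `UnicodeDecodeError`, which is a reminder to run the conversion only once.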

In other words, I read the data the "old" way and then write it back the "new" way.

Now everything is fine!

  • @Guilhermenascimento For me, the key point of this answer is that you must define the charset of the connection between PHP and the database (with set_charset or a SET NAMES ...), precisely because PHP does not use UTF-8. If you define $foo = 'acentuação', PHP's representation of the string is not in UTF-8. That is why the language has encode/decode functions.

  • @bfavaretto I read the question again more carefully. The user's own answer really does solve the problem, and after your comment I understood that "PHP with UTF-8" refers to the MySQL API and not necessarily to PHP itself. Thank you! +1 for the answer, +1 for your comment.

0

You can override the default collation with another one using the COLLATE clause.
For example:

SELECT * FROM tabela ORDER BY campo COLLATE utf8_general_ci;

Also, check whether the column has the same collation as the table.

  • The table is already in utf8_general_ci, so the COLLATE does nothing, and even with a "second" COLLATE the result is the same.

  • Right. I thought that, for some unknown reason, the table's collation was being ignored.

  • What is the engine of your table?

  • One more thing: try changing the column's collation as well.
