What character encoding (Collation) should I use in MySQL?

Viewed 48,265 times

26

What is the most appropriate character encoding (Collation) for a MySQL database that will store Portuguese-language data?

1 answer

41


Either one works: latin1_swedish_ci or utf8_general_ci.

To change the CHARSET and COLLATION of an existing database:

ALTER DATABASE `sua_base` CHARSET = Latin1 COLLATE = latin1_swedish_ci;

or

ALTER DATABASE `sua_base` CHARSET = UTF8 COLLATE = utf8_general_ci;
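Note that ALTER DATABASE only changes the default used for tables created afterwards; tables that already exist keep their current character set. A minimal sketch of converting one existing table, assuming a hypothetical table named `sua_tabela` (not part of the original answer) and that a backup has been taken first:

```sql
-- `sua_tabela` is a hypothetical table name used only for illustration.
-- CONVERT TO rewrites the table's text columns in the new character set.
ALTER TABLE `sua_tabela`
  CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

This would need to be repeated for each table that should use the new encoding.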

Explanation

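CHARSET and COLLATE are different things. The CHARSET defines how characters are stored, while the COLLATE defines how they are compared and sorted. In MySQL, each CHARSET has several collations, each with its own characteristics.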

  • latin1_general_ci: Does not distinguish uppercase from lowercase letters. A search for "test" also returns records such as "Test" or "TEST".
  • latin1_general_cs: Distinguishes uppercase from lowercase letters. A search for "test" returns only "test"; records such as "Test" and "TEST" are not returned.
  • latin1_swedish_ci: Does not distinguish uppercase from lowercase letters, nor accented characters or the cedilla from their plain counterparts. For example, a record containing the word "Intuição" is returned by a search for the word "intúicao".
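The behaviour described in the list above can be checked directly with a few comparisons. This is a rough sketch, not part of the original answer; the column aliases are made up for illustration, and the last query assumes the client connection encoding is configured so that the accented literals arrive intact:

```sql
-- List the collations that belong to the latin1 character set.
SHOW COLLATION WHERE Charset = 'latin1';

-- Case-insensitive collation: the comparison is true (1).
SELECT _latin1'test' COLLATE latin1_general_ci = _latin1'TEST' AS ci_match;

-- Case-sensitive collation: the comparison is false (0).
SELECT _latin1'test' COLLATE latin1_general_cs = _latin1'TEST' AS cs_match;

-- Accent and cedilla handling under latin1_swedish_ci, per the list above.
SELECT CONVERT('Intuição' USING latin1) COLLATE latin1_swedish_ci
     = CONVERT('intúicao' USING latin1) AS accent_match;
```

An explicit COLLATE clause on one side of the comparison takes precedence over the default collation of the other operand, which is why it only needs to appear once per query.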

Source


(edited in 2019)

The universal standard is UTF-8, all the more so in Brazil, where it is the standard both de facto and by law.

Thus the first option (with distinction) is utf8_swedish_ci,

ALTER DATABASE `sua_base` CHARSET = UTF8 COLLATE = utf8_swedish_ci;

and the second (without distinction) utf8_general_ci.
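To confirm which defaults a database ended up with after an ALTER DATABASE statement, the schema metadata can be queried. A small sketch, assuming the database is named `sua_base` as in the statements in this answer:

```sql
-- Show the default character set and collation currently set for the database.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'sua_base';
```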

  • 1

    Is there any reason to prefer Latin1 over UTF-8? I would strongly recommend using the Unicode standard, even if the OP only expects Portuguese-language characters (it makes interoperability easier later on), provided of course that it meets what was asked (by the way: utf8_swedish_ci does exist).

  • Indeed, both work. I prefer UTF-8 too.
