Which encoding to choose for a database?

Viewed 5,461 times

24

When we create a new database (whether in MySQL, PostgreSQL, Oracle, SQL Server or another system), we can choose its encoding, for example UTF-8 or Latin-1.

  1. Is there any recommendation or does this choice make no difference?

  2. With encodings in which a character may occupy more than one byte, such as UTF-8, if I define a column as varchar(5), can I store five special characters in it (for example: àèáéú)?

  3. The MySQL Workbench wizard offers several variations of some encodings, such as Latin-1 and UTF-8. What is the difference between these variations?

[Two screenshots were attached here; the images are not available.]

  • 1

    UTF-8 is virtually a global standard, which makes the choice easy.

  • 3

    utf8_general_ci, because it is almost bulletproof with special characters. I see no reason to use any other encoding except in some extremely exceptional case. Stay away from Latin-1.

  • 1

    The items in the list are collations, not encodings. They should be chosen first according to the encoding and then, among those for the same encoding, according to the comparison criteria you want for the language in which the application is used.

1 answer

16


The choice of charset for your database depends on the application that will use it.

UTF-8 is a standard that supports, beyond Latin characters, Greek characters, Hebrew characters and others, which makes it a charset that supports multiple languages.
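To make the difference concrete outside of any particular database, here is a minimal Python sketch that checks whether a few sample words can be represented in Latin-1 and in UTF-8. The helper name `can_encode` and the sample words are my own, chosen only for illustration.

```python
# Sample words: accented Latin, Greek and Hebrew (illustrative choices).
samples = ["ação", "αλφα", "שלום"]


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for word in samples:
    print(word,
          "| latin-1:", can_encode(word, "latin-1"),
          "| utf-8:", can_encode(word, "utf-8"))
```

Only the accented Latin word fits in Latin-1, while all three fit in UTF-8. A database column behaves along the same lines: text that the column's charset cannot represent cannot be stored faithfully.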

  1. If your application needs to support multiple languages, using UTF-8 ensures that characters are displayed correctly to users, no matter what language they use. Several open source projects (WordPress, Drupal, phpBB) use UTF-8 as their default for that reason.

  2. In MySQL, a varchar(5) stores up to five characters, regardless of how many bytes each one takes. For international characters, SQL Server uses separate data types: nchar and nvarchar. String literals for these types should be written with the N prefix, as in N''.

  3. The different collations determine how sorting and comparison are performed, and they may vary by region (alphabetical order that takes accents into account, for example). The _ci suffix indicates that the collation is case-insensitive, meaning that uppercase and lowercase letters compare as equal. More implementation details on MySQL can be found here.
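Points 2 and 3 can be illustrated with a short Python sketch. It is only an analogy: it does not reproduce how MySQL implements collations, and the key functions `ci_key` and `ai_ci_key` are hypothetical names of mine that roughly mimic a case-insensitive and an accent-and-case-insensitive comparison.

```python
import unicodedata

# Point 2: characters versus bytes.
# Normalize to NFC so each accented letter is a single code point.
text = unicodedata.normalize("NFC", "àèáéú")
char_count = len(text)                      # 5 characters
utf8_bytes = len(text.encode("utf-8"))      # 10 bytes (2 per character)
latin1_bytes = len(text.encode("latin-1"))  # 5 bytes (1 per character)
print(char_count, utf8_bytes, latin1_bytes)


# Point 3: a collation decides how strings compare and sort.
def ci_key(s: str) -> str:
    """Case-insensitive key: 'ABC' and 'abc' compare as equal."""
    return s.casefold()


def ai_ci_key(s: str) -> str:
    """Accent- and case-insensitive key: strips combining marks first."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["Zebra", "água", "apple"]
print(sorted(words))                 # plain code point order
print(sorted(words, key=ai_ci_key))  # accent- and case-insensitive order
print(ci_key("MySQL") == ci_key("mysql"))
```

The five characters count as five regardless of encoding, which is the length a character-based varchar(5) limit looks at, while the byte count depends on the charset. The two sort results differ because the comparison rule changed, not the data, which is the role a collation plays.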
