Which UTF-8 collation is the most suitable for the web (multi-language)?

Viewed 9,649 times

22

I usually use utf8_general_ci by default in my projects, but recently I found that other developers use utf8_unicode_ci.

  • utf8_general_ci: Unicode (multi-language), case-insensitive
  • utf8_unicode_ci: Unicode (multi-language), case-insensitive

Which of these is more appropriate for the web, or is there another UTF-8 collation better suited to it?

1 answer

20


The main difference between utf8_general_ci and utf8_unicode_ci is in how they compare characters that some languages treat as equivalent.

For example, in German the character "ß" is equivalent to "ss". Because utf8_unicode_ci has to handle this kind of comparison by matching more than one character, it is slower than utf8_general_ci.
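
To make the "ß" versus "ss" point concrete, here is a small Python sketch. It is only an analogy, not MySQL's actual collation algorithm: it assumes Python's str.casefold() as a stand-in for a comparison that can expand one character into several, and str.lower() as a stand-in for a simpler comparison that does not expand "ß". The function names are hypothetical.

```python
def expanding_equal(a: str, b: str) -> bool:
    # casefold() applies full Unicode case folding, which expands "ß" to "ss".
    # This stands in for a collation that can match more than one character.
    return a.casefold() == b.casefold()


def simple_equal(a: str, b: str) -> bool:
    # lower() leaves "ß" as a single character, so no expansion takes place.
    # This stands in for a simpler, cheaper case-insensitive comparison.
    return a.lower() == b.lower()


print(expanding_equal("Straße", "STRASSE"))  # True
print(simple_equal("Straße", "STRASSE"))     # False
```

The expanding comparison has to account for strings whose lengths differ after folding, which gives a rough intuition for why a collation that supports such expansions does more work per comparison.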

That is, if your application doesn't need language-aware character comparisons across multiple languages, go with utf8_general_ci.

But for systems that operate globally and must work with multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.

Another collation worth mentioning is utf8_bin. It compares characters by their binary values, so its comparisons are case-sensitive, unlike the other two collations.
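
As a rough illustration of what a binary comparison implies, the following Python sketch compares strings by their encoded UTF-8 bytes. It is an assumption-laden analogy rather than MySQL's implementation, and the helper names are hypothetical.

```python
def binary_equal(a: str, b: str) -> bool:
    # Compare the raw UTF-8 bytes: "w" and "W" have different byte values,
    # so the comparison is case-sensitive.
    return a.encode("utf-8") == b.encode("utf-8")


def binary_sorted(values: list[str]) -> list[str]:
    # Sort by byte values: uppercase ASCII letters come before lowercase ones.
    return sorted(values, key=lambda s: s.encode("utf-8"))


print(binary_equal("web", "Web"))       # False
print(binary_sorted(["b", "a", "B"]))   # ['B', 'a', 'b']
```

A binary ordering like this suits exact-match data such as tokens or hashes, but it produces a sort order most users would not expect for names or titles.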

Conclusion

The choice of collation depends very much on the nature of the application. Besides utf8, there are other character sets that serve the needs of a specific region (latin1, for example), and since each project's scope varies greatly, I do not believe it is possible to single out one that is most appropriate for all cases.

In most cases utf8_general_ci will suffice; as its name suggests, it is meant for general use and is the one most commonly found. Still, it is worth knowing that other collations, such as utf8_unicode_ci and utf8_bin, can meet more specific needs.

Source: MySQL documentation
