The main difference between utf8_general_ci and utf8_unicode_ci is in how they compare characters that some languages treat as equivalent.
For example, in German the character "ß" is equivalent to "ss". Because utf8_unicode_ci has to handle this kind of comparison by matching more than one character, it is slower than utf8_general_ci, which only compares one character at a time (it treats "ß" as a single "s").
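The "ß" case can be checked directly with a small query. This is only a sketch: it assumes the client connection sends UTF-8, and it uses the _utf8 introducer so the literals are read as utf8 regardless of the connection's default character set. The column aliases are made up for illustration.

```sql
-- Compare "ß" against "ss" and "s" under each collation.
-- COLLATE binds more tightly than "=", so it applies to the left operand.
SELECT
  _utf8'ß' COLLATE utf8_unicode_ci = _utf8'ss' AS unicode_ci_ss,  -- multi-character match
  _utf8'ß' COLLATE utf8_general_ci = _utf8'ss' AS general_ci_ss,  -- one-to-one comparison only
  _utf8'ß' COLLATE utf8_general_ci = _utf8's'  AS general_ci_s;
```

According to the behavior described in the MySQL documentation, the first and third columns should come back as 1 and the second as 0.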
That is, if your application doesn't need to compare characters across multiple languages, go with utf8_general_ci.
But for systems that operate globally and must handle multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.
Another collation worth mentioning is utf8_bin. It compares characters by their binary values, which makes comparisons case-sensitive, unlike the other two collations.
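A quick way to see the case sensitivity is to compare the same two strings under a _ci collation and under utf8_bin. As before, this is a minimal sketch with illustrative column aliases; the _utf8 introducer just pins the literals to the utf8 character set.

```sql
-- Same strings, differing only in letter case.
SELECT
  _utf8'MySQL' COLLATE utf8_general_ci = _utf8'mysql' AS general_ci_match,  -- 1: case is ignored
  _utf8'MySQL' COLLATE utf8_bin        = _utf8'mysql' AS bin_match;         -- 0: binary values differ
```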
Conclusion
The choice of collation depends very much on the nature of the application. Besides utf8, there are other character sets that serve the needs of a specific region (latin1, for example), and since every case is different, I don't believe it is possible to name one that is the most appropriate for all of them.
In most cases utf8_general_ci will be enough: as its name suggests, it is meant for general use and is the one you will find most often. However, it is worth knowing that there are other collations that can meet more specific needs, such as utf8_unicode_ci and utf8_bin.
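In practice the choice does not have to be all or nothing, since a collation can be set as a table default, overridden on a single column, or applied to one query. The sketch below assumes a hypothetical article table; the table and column names are invented for illustration only.

```sql
-- Table default: utf8_general_ci for everyday text.
-- The slug column is overridden to utf8_bin so lookups are case-sensitive.
CREATE TABLE article (
  id    INT PRIMARY KEY,
  title VARCHAR(200),
  slug  VARCHAR(200) CHARACTER SET utf8 COLLATE utf8_bin
) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Per-query override: sort titles with the language-aware collation.
SELECT title
FROM article
ORDER BY title COLLATE utf8_unicode_ci;
```

Keeping the cheaper collation as the default and paying for utf8_unicode_ci only where language-aware ordering matters is one possible trade-off, not the only reasonable one.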
Source: MySQL documentation