What is a collation?

I never knew exactly what it meant. I know it has to do with the encoding of the data that will be stored in the table, but beyond that, is there a more specific reason to choose a particular collation?

I have a few questions:

  • In the context of databases, what exactly is a collation?

  • Can I speed up queries depending on the collation I choose?

  • Are there specific recommendations for choosing one? For example, based on the type of data stored, the region, or the character set?

  • What problems can I run into if I don't know which collation to use when creating a table or database?

1 answer

A collation defines the set of rules the server uses to sort and compare text, that is, how operators such as =, >, <, ORDER BY, and so on behave. For example, under one collation the system sorts the character 'ö' between the characters 'o' and 'p', while under another collation that character may sort at a different position. This can cause conflicts in queries that relate tables with different collations. The collation also defines whether the system distinguishes accented characters and whether it is case sensitive. For example, the collation Latin1_general_ci_as tells the system to treat characters as case insensitive (CI) and accent sensitive (AS). Examples:

  • latin1_general_ci: Does not distinguish between upper and lower case letters. Searching for "test" also returns records such as "Test" or "TEST".
  • latin1_general_cs: Distinguishes upper and lower case letters. Searching for "test" returns only "test"; "Test" and "TEST" are not returned.
  • latin1_swedish_ci: Does not distinguish upper and lower case letters, nor accented characters or the cedilla; a record containing the word "Intuição" is returned by a search for the word "intúicao".

Even if you change the collation of the database, previously created objects keep their original collation; to change it you have to recreate the object.
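The comparison and ordering rules described above can be tried out in a small sketch. It uses Python's built-in sqlite3 module rather than the servers the collation names come from, and the collations `ci_as`, `ci_ai` and `base_first` are hypothetical stand-ins written for this illustration, not the real latin1_* implementations.

```python
import sqlite3
import unicodedata


def strip_accents(text):
    # Decompose each character (for example "ç" becomes "c" plus a combining
    # cedilla) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def _cmp(a, b):
    # Three-way comparison: negative, zero, or positive, as SQLite expects.
    return (a > b) - (a < b)


def ci_as(a, b):
    # Hypothetical case-insensitive, accent-sensitive rule.
    return _cmp(a.casefold(), b.casefold())


def ci_ai(a, b):
    # Hypothetical case-insensitive, accent-insensitive rule.
    return _cmp(strip_accents(a).casefold(), strip_accents(b).casefold())


def base_first(a, b):
    # Hypothetical ordering rule: compare base letters first and use the
    # accented form only to break ties, so "ö" lands between "o" and "p".
    return _cmp((strip_accents(a).casefold(), a), (strip_accents(b).casefold(), b))


conn = sqlite3.connect(":memory:")
for name, func in (("ci_as", ci_as), ("ci_ai", ci_ai), ("base_first", base_first)):
    conn.create_collation(name, func)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("test",), ("Test",), ("TEST",), ("Intuição",)],
)
conn.execute("CREATE TABLE letters (c TEXT)")
conn.executemany("INSERT INTO letters VALUES (?)", [("p",), ("ö",), ("o",)])


def search(term, collation):
    # The collation name is interpolated because it is an identifier, not a
    # value; only pass trusted names here.
    sql = f"SELECT w FROM words WHERE w = ? COLLATE {collation} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (term,))]


def ordered(collation):
    # Sort the same three letters under the given collation.
    sql = f"SELECT c FROM letters ORDER BY c COLLATE {collation}"
    return [row[0] for row in conn.execute(sql)]


print(search("test", "BINARY"))      # ['test']
print(search("test", "ci_as"))       # ['test', 'Test', 'TEST']
print(search("intúicao", "ci_as"))   # []
print(search("intúicao", "ci_ai"))   # ['Intuição']
print(ordered("BINARY"))             # ['o', 'p', 'ö']
print(ordered("base_first"))         # ['o', 'ö', 'p']
```

The same data gives different answers to the same query depending only on the collation named in the comparison, which is why mixing collations across tables can produce surprising results.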

As for performance, because each collation has its own rules for handling strings, query times can vary considerably between collations, and the difference grows with the size of the table being queried.

That article cites an example where the difference in performance was 10x between two identical tables with different collations.

Which collation to use depends on which languages you need to support. For example, if you are working with Latin-based languages (European, Portuguese, etc.), you can use Latin1_general, which covers the Latin-1 character set (ASCII plus the accented letters of Western European languages). To work with a larger character set, like Unicode, the most important thing is to use data types that support it, such as nchar and nvarchar.
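To make the character set point concrete, here is a minimal sketch that checks whether a string can be represented in a given encoding. The helper `fits_charset` is a hypothetical name introduced for this illustration, and Python's codecs stand in for the database's character sets.

```python
def fits_charset(text, encoding):
    # True when every character of `text` can be stored in `encoding`.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(fits_charset("Intuição", "ascii"))    # False
print(fits_charset("Intuição", "latin-1"))  # True
print(fits_charset("日本語", "latin-1"))     # False
print(fits_charset("日本語", "utf-8"))       # True
```

Portuguese text fits in Latin-1, while text outside Western European alphabets needs a Unicode-capable type.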

Choosing a wrong collation can lead to problems like:

  • Writing or reading wrong characters
  • Degraded query performance
  • Errors in JOIN queries or text comparison
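
The first problem in the list, wrong characters on write or read, can be reproduced outside a database. This is only a sketch under the assumption that the mismatch is between the encoding used to store the text and the one used to read it back; `store_and_read` is a hypothetical helper, not a database API.

```python
def store_and_read(text, write_encoding, read_encoding):
    # Encode the text as the writer would, then decode the same bytes as a
    # reader with a possibly different character set would.
    stored = text.encode(write_encoding)
    return stored.decode(read_encoding, errors="replace")


print(store_and_read("ö", "utf-8", "utf-8"))    # ö
print(store_and_read("ö", "utf-8", "latin-1"))  # Ã¶
print(store_and_read("ö", "latin-1", "utf-8"))  # replacement character (U+FFFD)
```

When both sides agree the text round-trips intact; when they disagree the reader sees garbage even though the stored bytes never changed.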
