=, BINARY, LIKE, LIKE BINARY, REGEXP, SQL binary grouping

Asked

Viewed 314 times

14

Reformulation of the question based on @Gabrielhenrique’s answer and research:

What is binary grouping in a string column in the table?

BINARY serves to make a case-sensitive search and LIKE case-insensitive, then LIKE BINARY would be a case-sensitive comparison with the use of % and _ to complete the text with any value?

There’s a difference and what’s more performative, BINARY or just =? Is this difference significant in large tables? And in small tables?

Which is more performative, REGEXP or LIKE? Is this difference significant in large tables? And in small tables?

Which is more performative, REGEXP or LIKE BINARY? Is this difference significant in large tables? And in small tables?

2 answers

9

In Mysql, BINARY has the function of forcing an exact comparison, ie a case-sensitive comparison, byte to byte.

I believe that the use of this grouping is for search optimization when looking for a specific value.

6


According to Mysql documentation (string comparation), comparisons of strings are not case-sensitive, unless one of the operators be it case-sensitive, or the collation country.

Therefore, use = or LIKE may also be a search case-sensitive depending on the collation, so for example:

 LIKE _latin1 'ABC%' COLLATE latin1_bin

Using LIKE BINARY forces a binary comparison independent of collation, ensuring comparison case-sensitive, although it may impair performance depending on table size.

About the use of REGEX vs LIKE, some considerations should be made:
REGEX does not use an index, even if present, which will impact on performance;
LIKE uses an index, if any, as long as it does not start the comparison by % or '_';

Here’s an excellent response in the English OS about this: https://stackoverflow.com/a/8431675/4730201

EDIT: by suggestion I am copying an interesting part of the answer from the link above:

SELECT * FROM t WHERE a = 'abc'      <<-- (case insensitive ) pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc'   <<-- (case insensitive como "=") pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc%'  <<-- pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'a%'    <<-- pode usar o índice se existir, dependendo da cardinalidade (1)
SELECT * FROM t WHERE a LIKE '%a%'   <<-- não vai usar um índice
SELECT * FROM t WHERE a LIKE '_agf'  <<-- não vai usar um índice

(1) According to the author of the answer, depending on the cardinality Mysql may or may not use an index: If more than +/- 20% of the lines match the criteria, Mysql will not use an index, because in this case, do a full search "full table scan" table is faster

Therefore, from the performance point of view, it is interesting to consider the presence of an index in the column, the collation and the research that will be done (initiated or not by % or _) to choose the best form of comparison.

Depending also on the frequent field comparison, starting with % and the volume of data, another option to be considered is the use of an index fulltext, that is suitable to optimize this type of research. More information (in English) here: Mysql Fulltext Index

EDIT: clarifying a doubt on the COLLATION: In the above example, from the Mysql documentation, it is specifying the collation you want to use in the comparison, regardless of the collation original.
As an example, a table with a field that was created like this texto VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german1_ci, ie the name column is with the collation in German, but you want to compare using Spanish, you could do so: WHERE texto LIKE _latin1 'ABC%' COLLATE latin1_spanish_ci

I put an example here: sqlfiddle

  • Could explain the parts of this SQL: LIKE _latin1 'ABC%' COLLATE latin1_bin? If you can add the most relevant parts to the question with an example (in Portuguese) of the links you have put, thank you

  • Hello @Guilhermecostamilam, in this example of the MySQL, he is specifying the collation who wants to compare, regardless of the collation original. Imagine you have a table with a field that was created like this texto VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german1_ci, or is the column name is with the collation in German, but you want to compare using Spanish, you could do so: where texto LIKE _latin1 'ABC%' COLLATE latin1_spanish_ci got it?

  • Good answer and example, just missing BINARY vs = and maybe develop a little more on the indices. Thank you

  • good, tomorrow I will edit with some information, thanks for the tips

Browser other questions tagged

You are not signed in. Login or sign up in order to post.