According to Mysql documentation (string comparation), comparisons of strings are not case-sensitive, unless one of the operators
be it case-sensitive, or the collation country.
Therefore, use =
or LIKE
may also be a search case-sensitive depending on the collation, so for example:
LIKE _latin1 'ABC%' COLLATE latin1_bin
Using LIKE BINARY
forces a binary comparison independent of collation, ensuring comparison case-sensitive, although it may impair performance depending on table size.
About the use of REGEX
vs LIKE
, some considerations should be made:
REGEX
does not use an index, even if present, which will impact on performance;
LIKE
uses an index, if any, as long as it does not start the comparison by %
or '_'
;
Here’s an excellent response in the English OS about this: https://stackoverflow.com/a/8431675/4730201
EDIT: by suggestion I am copying an interesting part of the answer from the link above:
SELECT * FROM t WHERE a = 'abc' <<-- (case insensitive ) pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc' <<-- (case insensitive como "=") pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'abc%' <<-- pode usar o índice se existir
SELECT * FROM t WHERE a LIKE 'a%' <<-- pode usar o índice se existir, dependendo da cardinalidade (1)
SELECT * FROM t WHERE a LIKE '%a%' <<-- não vai usar um índice
SELECT * FROM t WHERE a LIKE '_agf' <<-- não vai usar um índice
(1) According to the author of the answer, depending on the cardinality Mysql may or may not use an index:
If more than +/- 20% of the lines match the criteria, Mysql will not use an index, because in this case, do a full search "full table scan" table is faster
Therefore, from the performance point of view, it is interesting to consider the presence of an index in the column, the collation and the research that will be done (initiated or not by %
or _
) to choose the best form of comparison.
Depending also on the frequent field comparison, starting with %
and the volume of data, another option to be considered is the use of an index fulltext, that is suitable to optimize this type of research. More information (in English) here: Mysql Fulltext Index
EDIT: clarifying a doubt on the COLLATION
:
In the above example, from the Mysql documentation, it is specifying the collation you want to use in the comparison, regardless of the collation
original.
As an example, a table with a field that was created like this texto VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german1_ci
, ie the name column is with the collation in German, but you want to compare using Spanish, you could do so: WHERE texto LIKE _latin1 'ABC%' COLLATE latin1_spanish_ci
I put an example here: sqlfiddle
Could explain the parts of this SQL:
LIKE _latin1 'ABC%' COLLATE latin1_bin
? If you can add the most relevant parts to the question with an example (in Portuguese) of the links you have put, thank you– Costamilam
Hello @Guilhermecostamilam, in this example of the
MySQL
, he is specifying the collation who wants to compare, regardless of the collation original. Imagine you have a table with a field that was created like thistexto VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german1_ci
, or is the column name is with the collation in German, but you want to compare using Spanish, you could do so:where texto LIKE _latin1 'ABC%' COLLATE latin1_spanish_ci
got it?– Ricardo Pontual
Good answer and example, just missing
BINARY
vs=
and maybe develop a little more on the indices. Thank you– Costamilam
good, tomorrow I will edit with some information, thanks for the tips
– Ricardo Pontual