Regex - How to recover the first occurrence of two uppercase letters after a comma in Mysql?

Asked

Viewed 133 times

1

I have the following text::

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris sodales porta tempor. Praesent eleifend volutpat nulla, BA at aliquet libero. Vivamus est eros, convallis vitae risus vel, malesuada feugiat felis. Nunc at pharetra nulla. Nam ornare feugiat lectus, sed tincidunt est auctor eget.

I want to retrieve the two letters BA which can appear at any position in the text. They will always be uppercase letters (BA, RJ, DF, SP, ...). The default will be A-Z.

So that’s what I’ve got so far but it’s not working:

SELECT artigo from artigos WHERE artigo REGEXP ', [A-Z]{2}';
  • Missed the WHERE in the query: SELECT artigo from artigos WHERE artigo REGEXP ', [A-Z]{2}'. Or was typing error when posting the question?

  • It was a typo. The query returns the entire article. I wanted it to return, in the example only 'BA'.

1 answer

1


From Mysql 8, you can use the function REGEXP_SUBSTR, which returns only the passage corresponding to the regular expression.

However, the regex ', [A-Z]{2}' will also return the comma and the space. To return only the letters, just use SUBSTR:

SELECT SUBSTR(REGEXP_SUBSTR(artigo, ', [A-Z]{2}', 1, 1, 'c'), 2)
from artigos WHERE artigo REGEXP ', [A-Z]{2}';

With that, only the letters BA are returned by query.

The parameters 1, 1, 'c' are respectively the position where the search starts (1 is the beginning of the string), which occurrence will be returned (1, because we want the first occurrence of regex in the string) and 'c' makes the search be case sensitive *.

See working on DB Fiddle.


* At first a regex [A-Z] should already pick only capital letters, but testing on DB Fiddle he was doing the search case insensitive and ended up picking up the letters co of the stretch , consectetur. Passing the option 'c' the search became case sensitive and the BA was found correctly. Probably this happens because the default mysql is to use the charset rules and collation search parameters, and DB Fiddle must use one whose default is case insensitive. To learn more, see here and here.

For versions prior to Mysql 8, the function REGEXP_SUBSTR is not available. In this case, maybe it is possible to simulate the result (or get close to it) with substrings, using something like this. Or simply take the entire string and extract the desired chunk using a programming language - as these usually have better regex support than Mysql.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.