When you do a query like this:
select campo from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';
You are selecting the whole field (in this case, the entire value 30/09/2018 e 14/10/2018 e 21/10/2018), provided it matches the regex (that is, provided it contains digits in the given format). That’s why the query returns "everything".
To extract only part of the field, use REGEXP_SUBSTR (available from MySQL 8). Assuming the table is:
create table tabela (
campo varchar(200)
);
insert into tabela(campo) values('30/09/2018 e 14/10/2018 e 21/10/2018');
insert into tabela(campo) values('blablabla e 14/10/2018 e 21/10/2018');
insert into tabela(campo) values('nao tenho nenhuma data');
We can do:
select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}') from tabela;
And the result will be:
30/09/2018
14/10/2018
null
The third row returns null, because the field value does not match the regex. You can change the query so that it returns the extracted chunk only when the field contains a date in the desired format:
select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}')
from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';
Now the where checks whether the field contains any date, and REGEXP_SUBSTR extracts only the desired section. The result is:
30/09/2018
14/10/2018
If you want the second occurrence, just pass 2 additional parameters: the initial position and the occurrence:
-- returns the second occurrence
select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}', 1, 2)
from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';
In this case, the initial position is 1 and the occurrence is 2 (i.e., I get the second date present in the field). The result is:
14/10/2018
21/10/2018
See this example running on DB Fiddle.
But there’s a catch: if you have something like 112/10/2019, the regex will match the substring 12/10/2019. To avoid these cases, you can change the regex to:
(^|[^0-9])[0-9]{2}/[0-9]{2}/[0-9]{4}([^0-9]|$)
At the beginning there is the group (^|[^0-9]): the | means "or". So this part means "start of the string" (^) or "any character that is not a digit" (the [^ negates what comes after it, that is, I don’t want 0-9, no digit from zero to nine).
And at the end there’s ([^0-9]|$): any character other than a digit, or the end of the string ($). That guarantees you won’t match cases like 112/10/2019. See the difference here.
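The effect of the two extra groups can also be checked outside the database. The following is only a minimal Python sketch, with Python's re module standing in for MySQL's regex engine; the names PLAIN, GUARDED and first_date are made up for illustration. One detail worth noting is that the guarded pattern also consumes the neighboring non-digit character, so the sketch wraps the date itself in an extra capturing group (an addition of this example, not part of the regex above).

```python
import re

# Pattern from the beginning of the answer: no check on neighboring characters.
PLAIN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

# Guarded pattern: same date format between the two groups explained above.
# The middle group (added for this sketch) captures only the date itself.
GUARDED = re.compile(r"(^|[^0-9])([0-9]{2}/[0-9]{2}/[0-9]{4})([^0-9]|$)")


def first_date(pattern, text):
    """Return the first date-like chunk found by pattern, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # GUARDED has three groups and the date is the second; PLAIN has none.
    return match.group(2) if pattern.groups else match.group(0)


print(first_date(PLAIN, "112/10/2019"))       # 12/10/2019
print(first_date(GUARDED, "112/10/2019"))     # None
print(first_date(GUARDED, "em 12/10/2019."))  # 12/10/2019
```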
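In other languages and engines it is possible to use \b instead of (^|[^0-9]) and ([^0-9]|$). The regex engine used by MySQL before version 8 does not support it. MySQL 8 switched to the ICU library, which does accept \b, but the backslash has to be doubled inside a string literal ('\\b').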
Before MySQL 8 there was no REGEXP_SUBSTR, so it is a bit more complicated. If the date is at the very beginning of the string, you can use the solution proposed by Luiz Fernando. But if the date can be at any position in the string, I find it easier to extract the data using some programming language and manipulate it there (because languages usually have more regex features than MySQL, which is quite limited).
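As a rough idea of what that could look like, here is a minimal Python sketch. It assumes the rows have already been fetched from the database into a list of strings (the fetching itself is left out). The function name nth_date is hypothetical, and its occurrence parameter only imitates the fourth argument of REGEXP_SUBSTR.

```python
import re

DATE_LIKE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def nth_date(text, occurrence=1):
    """Return the occurrence-th date-like chunk in text (1-based), or None."""
    found = DATE_LIKE.findall(text)
    if 1 <= occurrence <= len(found):
        return found[occurrence - 1]
    return None


# Same values inserted into the table above, as if read from the database.
rows = [
    "30/09/2018 e 14/10/2018 e 21/10/2018",
    "blablabla e 14/10/2018 e 21/10/2018",
    "nao tenho nenhuma data",
]

for campo in rows:
    print(nth_date(campo), nth_date(campo, 2))
# 30/09/2018 14/10/2018
# 14/10/2018 21/10/2018
# None None
```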
Dates x Regex
Just an addendum on using regex to check dates. A regex like [0-9]{2}/[0-9]{2}/[0-9]{4} accepts values that are not valid dates, such as 99/99/9999 and 00/00/0000 (see an example here). That’s because [0-9] means "any digit from 0 to 9".
It is possible to improve it a little using something like:
(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)[0-9]{2}
This ensures that the days can only be from 01 to 31, the months 01 to 12, and the years 1900 to 2099.
But still, this regex accepts dates like 31/04/2019 (April only has 30 days) and 29/02/2019 (February 29 in a non-leap year), see. It is not clear how data gets into your database, but a typo may occur, or an invalid date may simply appear in the middle of the string.
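A quick way to see both the improvement and the remaining gap is to run the pattern against a few values. This is only an illustrative Python sketch: the names IMPROVED and accepts are made up, and fullmatch is used so that each sample has to match from start to end.

```python
import re

# Improved pattern from the text: day 01-31, month 01-12, year 1900-2099.
IMPROVED = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)[0-9]{2}")


def accepts(value):
    """True if the whole value matches the improved pattern."""
    return IMPROVED.fullmatch(value) is not None


for sample in ["14/10/2018", "99/99/9999", "00/00/0000", "31/04/2019", "29/02/2019"]:
    print(sample, accepts(sample))
# 14/10/2018 True
# 99/99/9999 False
# 00/00/0000 False
# 31/04/2019 True
# 29/02/2019 True
```

The last two lines are the problem cases: the pattern is satisfied even though those days do not exist.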
It is even possible to write a regex that checks these conditions. This question on SOen has some examples, but they are so complicated that they are not worth it. Look at one of them, just to get an idea:
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
Do you understand? Neither do I.
Keep in mind that \d is not supported by MySQL before version 8, so there you should replace it with [0-9] (in MySQL 8 it works, but it must be written as \\d inside a string literal). Even so, in my opinion this regex is not worth using. It is hard to understand and to maintain (good luck to whoever has to change it and add new cases, for example).
You need to find a balance. Using the giant regex above ensures that no invalid dates are returned, but it is a maintenance nightmare. Using the expressions suggested at the beginning of this answer, you get snippets that look like dates (they will be right in most cases, but some invalid dates will also come through, and you should evaluate whether that is acceptable). Once these snippets are extracted, an additional validation can be done afterwards, preferably using some date API (i.e., extract this data from the database and validate it using your favorite programming language).
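A minimal sketch of that two-step approach in Python, assuming the text has already been read from the database: the simple regex collects the candidates, and datetime.strptime decides which ones are real dates. The function name valid_dates is hypothetical.

```python
import re
from datetime import datetime

DATE_LIKE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def valid_dates(text):
    """Return the date-like chunks of text that are real calendar dates."""
    result = []
    for candidate in DATE_LIKE.findall(text):
        try:
            # strptime raises ValueError for impossible dates such as 31/04/2019.
            result.append(datetime.strptime(candidate, "%d/%m/%Y").date())
        except ValueError:
            pass  # looks like a date, but is not one
    return result


print(valid_dates("30/09/2018 e 31/04/2019 e 29/02/2019 e 29/02/2020"))
# [datetime.date(2018, 9, 30), datetime.date(2020, 2, 29)]
```

This keeps the regex short and leaves the calendar rules (month lengths, leap years) to the date library.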
Regex is a powerful tool, but it is not a magic solution that solves every case.
https://regex101.com/r/Gqd385/1
– Valdeir Psr
Just remove the /g. Take out the g modifier. – Marcelo Shiniti Uchimura
There are ways to do it, but the answer depends on the language you are using. Click [Edit] and add information, and the code you used
– hkotsubo
@hkotsubo edited.
– Rod
@valdeirpsr Perfect, except that in MySQL it returned no records: REGEXP '(\d{2}/\d{2}/\d{4})'; Do you know why?
– Rod
@Rod Because MySQL does not support the \d syntax, only [0-9]. See how important it is to put all the details in the question from the beginning? :-) – hkotsubo