Return only the first occurrence of a regex match

How can I get only the first date in a query?

For example:

30/09/2018 and 14/10/2018 and 21/10/2018

I would like to return only the first date. I am using [0-9]{2}/[0-9]{2}/[0-9]{4}, but it returns everything.

I only want the first date, which is 10 characters long. Is there something I can add to my regex so that it returns only the first 10 characters?

I am using MySQL:

 WHERE REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}'
  • https://regex101.com/r/Gqd385/1

  • Just remove the /g. Take out the g modifier.

  • There are ways to do it, but the answer depends on the language you are using. Click [Edit] and add that information, along with the code you used.

  • @hkotsubo edited.

  • @valdeirpsr Perfect, except that in MySQL it returned no records: REGEXP '(\d{2}/\d{2}/\d{4})'; Do you know why?

  • @Rod Because MySQL does not support the \d syntax, only [0-9]. See how important it is to put all the details in the question from the beginning? :-)


2 answers

When you do a query like this:

select campo from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';

You are returning the whole field (in this case, the entire value 30/09/2018 e 14/10/2018 e 21/10/2018), as long as it matches the regex (that is, as long as it contains digits in the given format). That is why the query returns "everything".

To extract only part of the field, use REGEXP_SUBSTR (available from MySQL 8 onward). Assuming the table is:

create table tabela (
   campo varchar(200)
);

insert into tabela(campo) values('30/09/2018 e 14/10/2018 e 21/10/2018');
insert into tabela(campo) values('blablabla e 14/10/2018 e 21/10/2018');
insert into tabela(campo) values('nao tenho nenhuma data');

We can do:

select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}') from tabela;

And the result will be:

30/09/2018
14/10/2018
null

The third row returns null, because the field value does not match the regex. You can change the query so that it only returns rows whose field contains a date in the desired format:

select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}')
from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';

Now the WHERE clause checks whether the field contains a date, and REGEXP_SUBSTR extracts only the desired part. The result is:

30/09/2018
14/10/2018

If you want the second occurrence, just pass two additional parameters: the starting position and the occurrence:

-- returns the second occurrence
select REGEXP_SUBSTR(campo, '[0-9]{2}/[0-9]{2}/[0-9]{4}', 1, 2)
from tabela where campo REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}';

In this case, the starting position is 1 and the occurrence is 2 (i.e., I get the second date present in the field). The result is:

14/10/2018
21/10/2018

See this example running on DB Fiddle.


But there is a catch: if you have something like 112/10/2019, the regex will match the substring 12/10/2019. To avoid these cases, you can change the regex to:

(^|[^0-9])[0-9]{2}/[0-9]{2}/[0-9]{4}([^0-9]|$)

The pattern starts with (^|[^0-9]): the | means "or", so this part means "start of the string" (^) or "any character that is not a digit" (inside brackets, a leading ^ negates the set, so [^0-9] matches anything except the digits 0 to 9).

And it ends with ([^0-9]|$): any character that is not a digit, or the end of the string ($). This guarantees you won't match cases like 112/10/2019. See the difference here.
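A minimal sketch of that difference, written in Python rather than MySQL only so it is easy to run (the names PLAIN, GUARDED and first_date are invented for this illustration). One thing it makes visible: the guarded pattern also consumes the neighbouring non-digit character, so the full match can be one character wider than the date on each side. The sketch adds a group around the date itself to read it back without those neighbours.

```python
import re

# Plain pattern from the question, and the guarded pattern from this answer.
# The parentheses around the date are an addition for this sketch: they let
# us read the date without the neighbouring characters.
PLAIN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")
GUARDED = re.compile(r"(^|[^0-9])([0-9]{2}/[0-9]{2}/[0-9]{4})([^0-9]|$)")


def first_date(text, guarded=True):
    """Return the first date-like substring in text, or None."""
    if guarded:
        match = GUARDED.search(text)
        return match.group(2) if match else None
    match = PLAIN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(first_date("112/10/2019", guarded=False))  # plain pattern
    print(first_date("112/10/2019"))                 # guarded pattern
    print(first_date("blablabla e 14/10/2018 e 21/10/2018"))
```

With the plain pattern the first call picks up 12/10/2019 from inside 112/10/2019, while the guarded pattern finds nothing there.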

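In other languages and engines it is possible to use \b instead of (^|[^0-9]) and ([^0-9]|$). MySQL's older regex engine (before 8.0.4) does not support this; from 8.0.4 on, \b is accepted, but the backslash must be doubled inside a string literal ('\\b').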


Before MySQL 8 there was no REGEXP_SUBSTR, so it is a bit more complicated. If the date is at the very beginning of the string, you can use the solution proposed by Luiz Fernando. But if the date can be at any position in the string, I find it easier to extract the data using some programming language and manipulate it there (because languages usually have more regex features than MySQL, which is quite limited).
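As a rough illustration of doing the extraction outside the database, here is a sketch in Python. It assumes the field values have already been fetched as plain strings; nth_date is a hypothetical helper, not part of any library, and its occurrence argument is 1-based to mirror the REGEXP_SUBSTR parameter shown earlier.

```python
import re

# Same simple pattern used in the queries above.
DATE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def nth_date(text, occurrence=1):
    """Return the nth date-like substring (1-based), or None if absent."""
    matches = DATE.findall(text)
    if occurrence < 1 or occurrence > len(matches):
        return None
    return matches[occurrence - 1]


if __name__ == "__main__":
    # Sample values copied from the example table; in practice they would
    # come from a query result.
    rows = [
        "30/09/2018 e 14/10/2018 e 21/10/2018",
        "blablabla e 14/10/2018 e 21/10/2018",
        "nao tenho nenhuma data",
    ]
    for row in rows:
        print(nth_date(row), nth_date(row, 2))
```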


Dates vs. Regex

Just an addendum on using regex to check dates. The regex [0-9]{2}/[0-9]{2}/[0-9]{4} accepts values that are not valid dates, such as 99/99/9999 and 00/00/0000 (see an example here). That is because [0-9] means "any digit from 0 to 9".

It is possible to improve a little using something like:

(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)[0-9]{2}

This ensures that the days can only be from 01 to 31, the months 01 to 12, and the years 1900 to 2099.

But even so, this regex accepts dates like 31/04/2019 (April only has 30 days) and 29/02/2019 (February 29 in a non-leap year). It is not clear how data gets into your database, but a typo may occur, or an invalid date may simply appear in the middle of the string.

It is even possible to write a regex that checks these conditions. This Stack Overflow question has some examples, but they are so complicated that they are not worth it. Look at one of them, just to get an idea:
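A small sketch of what the stricter pattern does and does not catch, again in Python for convenience (looks_like_date is a name made up for this example, and fullmatch is used so the whole string must be the date):

```python
import re

# The range-limited pattern from above: day 01-31, month 01-12, year 1900-2099.
STRICTER = re.compile(
    r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)[0-9]{2}"
)


def looks_like_date(text):
    """True if the whole string fits the stricter dd/mm/yyyy pattern."""
    return STRICTER.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("99/99/9999", "00/00/0000", "31/04/2019", "29/02/2019"):
        print(sample, looks_like_date(sample))
```

The first two samples are rejected, but the last two still pass even though they are not real calendar dates.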

^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

Do you understand? Neither do I.

Keep in mind that \d is not supported by MySQL's older regex engine (before 8.0.4), so there you should replace it with [0-9]. Even so, in my opinion this regex is not worth using. It is hard to understand and hard to maintain (good luck to whoever has to change it and add new cases, for example).

You need to find a balance. The giant regex above ensures that no invalid dates are returned, but it is a maintenance nightmare. The expressions suggested at the beginning of this answer return snippets that look like dates: they will be right in most cases, but they will also let some invalid dates through, and you should evaluate whether that is acceptable. Once these snippets are extracted, additional validation can be done afterwards, preferably with a date API (i.e., pull the data out of the database and validate it in your favorite programming language).
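One possible shape for that two-step approach, sketched in Python with the standard datetime module: the simple regex finds candidates, and strptime decides whether each one is a real date. The function name first_valid_date is invented here, and skipping invalid candidates (instead of reporting them) is just one design choice among several.

```python
import re
from datetime import datetime

# Simple pattern: finds anything shaped like dd/mm/yyyy.
DATE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def first_valid_date(text):
    """Return the first candidate that is a real calendar date, else None."""
    for candidate in DATE.findall(text):
        try:
            # strptime raises ValueError for impossible dates such as 31/04.
            return datetime.strptime(candidate, "%d/%m/%Y").date()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(first_valid_date("31/04/2019 e 14/10/2018"))
    print(first_valid_date("29/02/2019"))
```

Here the leap-year and days-per-month rules come from the date library, so the regex can stay short and readable.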

Regex is a powerful tool, but is not a magic solution that solves all cases.

  • Awesome, thank you.


  • Maybe we don’t need the regex:

Assuming we have the following table and want to get the first date of the description field:

[image]

Running the command below:

select left(descricao , 10) from tb_dados;

We will have this result:

[image]

  • Regex:
select left(descricao,10) FROM tb_dados where descricao RLIKE('^[0-9]{2}/[0-9]{2}/[0-9]{4}');
  • Very simple and functional. Thank you.

  • @Rod Just a reminder that this solution only works if the field starts with a date. If there is some text before it and the first date appears in the middle of the text, then it is better to use REGEXP_SUBSTR, as I suggested. I preferred to leave a more general answer because it was not clear that the date appears at the beginning in all your cases :-)

  • I did it like this: select id_tabela,titulo,data_gabarito,left(data_gabarito,10) as d_gabarito FROM tabela
where data_gabarito RLIKE('^[0-9]{2}/[0-9]{2}/[0-9]{4}') 
order by str_to_date(d_gabarito,'%d/%m/%Y') desc;

  • @hkotsubo However, a problem has come up: it is not possible to use the alias d_gabarito to refine the WHERE clause. I wanted to display only the records from two days before the d_gabarito date.

  • Summarizing: the data_gabarito field was trimmed with the regex, and it was also converted from string to date in the sort so the oldest records are displayed first. Now I would like to take these dates and display only those that are close to today (now). Example: if today is 02/07/2019, it would only display records with dates from 02/05/2019 onward. Complicated, right? lol

  • @Rod I understand (I think) :-) Anyway, I suggest that next time you put in the question the details that can influence the answer. Remember that other site users don't know your system, and these details can help you get more accurate answers :-)
