Forcing backreference in regex


A few days ago I asked here about a regex that validates dates in the dd/mm/yyyy format and how to force it to accept only matching separators.

Based on that, I was trying to force the separators with a backreference in a regex that validates the yyyy/mm/dd format, but I can't get it to work. Could anyone explain how to find the group numbers for the backreferences and apply them? This is the regex validating yyyy/mm/dd that needs the backreferences:

R"(^(?:\d{4}([-/.])(?:(?:(?:(?:0?[13578]|1[02]|j(?:an|ul)|[Mm]a[ry]|[Aa]ug|[Oo]ct|[Dd]ec)([-/.])(?:0?[1-9]|[1-2][0-9]|3[01]))|(?:(?:0?[469]|11|[Aa]pr|[Jj]un|[Ss]ep|[Nn]ov)([-/.])(?:0?[1-9]|[1-2][0-9]|30))|(?:(0?2|[Ff]eb)([-/.])(?:0?[1-9]|1[0-9]|2[0-8]))))|(?:(?:\d{2}(?:0[48]|[2468][048]|[13579][26]))|(?:(?:[02468][048])|[13579][26])00)([-/.])(0?2|[Ff]eb)([-/.])29)$)"
  • 4

    Regardless of the correct answer, I don’t think validating dates at this level is a cool thing to implement as a regular expression. I wrote a lot about it here: https://answall.com/a/272541/500

  • 1

    Friend, I'm sorry, but I don't understand. Can you give examples of the inputs and the expected validation results?

  • @Peace the regex validates input in the yyyy/mm/dd format, such as 2000/02/29 or 2000/Feb/29. However, since I allow different separators, I need to force them with a backreference, because the regex is currently accepting input like 2000/02-29, 2000/Feb.29 or 2000-Feb.29, when it should only accept input that uses the same separator throughout...

  • 1

    @jsbueno based on what you said there, I was trying to force the separators with a backreference, but I did not really understand how to find the values or how to read the groups to determine the backreference numbers... I am aware that this is not the best solution. I am studying regex through the exercises in a book that a teacher gave me when I was in college. The book is about C, but since I am studying C++ I decided to study std::regex.

2 answers

3

The idea of a backreference in regex is that you can reuse the text matched by a group by referring to that group's number. For example:

In the regex ([a-c])x\1x\1, group 1 is the block ([a-c]), which matches the letter "a", "b" or "c".

The words "axaxa", "bxbxb" and "cxcxc" are therefore valid, as explained in this link, while a word like "axbxa" is not, because \1 must repeat the exact letter that group 1 captured. The site https://regexr.com/ is a useful tool to identify groups and validate a regex.
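
Since the question mentions std::regex, here is a minimal C++ sketch of the same example. The function name `matches_repeated_letter` is made up for illustration; only the pattern comes from the text above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `word` matches ([a-c])x\1x\1 from start to end.
// \1 must repeat the exact letter captured by group 1, not just any of a, b, c.
bool matches_repeated_letter(const std::string& word) {
    // The custom raw-string delimiter re( ... )re keeps the backslashes literal.
    static const std::regex re(R"re(([a-c])x\1x\1)re");
    return std::regex_match(word, re);
}
```

Calling it with "axaxa" or "cxcxc" should return true, while "axbxa" should return false because the second letter differs from the captured one.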

  • That part I understood. What I did not understand is how to find the values for the backreferences and put them in the regex so that it works. I have the following regex here https://answall.com/questions/272539/regex-validar-determinateddata-format/273720#273720 and it validates the separators with a backreference, but after modifying it into the regex above, which takes year/month/day, I am not able to work out the backreference values.

2


My answer complements the answer from Paul R. F. Amorim, showing how what he said applies to the regex posted in the question.

Note that in a regex you create groups with parentheses: whatever is inside the parentheses is captured by the group, and each group has a numeric id that we can use to reference it. Id 0 corresponds to the match of the whole regex. We can find the ids of the groups in a regex by looking at the order, from left to right, in which the parentheses are opened:

In the regex a(b|c)(apartamento|ca(sa|rro)):

  1. the group with id 1 is (b|c), which will contain b or c;
  2. the group with id 2 is (apartamento|ca(sa|rro)), which will contain apartamento, casa or carro;
  3. the group with id 3 is (sa|rro), which will contain sa or rro (or will be undefined if group 2 contains apartamento).

If you want a group not to have a reference, that is, an id, you can use (?:), which creates a non-capturing group, as explained in this Stack Overflow answer (in English).
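
One way to see that non-capturing groups get no id is to count the capturing groups with `std::regex::mark_count()`. This is only a sketch; `count_capturing_groups` is a name invented here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of capturing groups (ids 1..n) in a pattern.
// Groups written as (?:...) are not counted because they receive no id.
unsigned count_capturing_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

With the earlier example, "a(b|c)(apartamento|ca(sa|rro))" should report 3 groups, and "a(?:b|c)(apartamento|ca(?:sa|rro))" should report 1.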

Your regex starts like this: (^(?:\d{4}([-/.])... Notice that the group ([-/.]) has id 2, because before it we have a non-capturing group, (?:\d{4}..., which gets no id, and a group with id 1, which is opened by the first parenthesis, (^....

The date separator can be obtained through id 2, which refers to the group "([-/.])" and will contain -, / or . (a dot). To refer to this group, it is enough to write \2, as @Paulo explained.

Currently your regex contains the group "([-/.])" several times. We can simply keep the first one (which has id 2, as I explained above) and replace the others with its reference, \2. I decided to keep the references inside groups to make things easier, so we get a before/after like this:

Before: (^(?:\d{4}([-/.])...([-/.])...([-/.])...([-/.])...
After: (^(?:\d{4}([-/.])...(\2)...(\2)...(\2)...

With this, the regex will only match if all separators are equal to the one captured by group 2, so we only validate dates that don't mix different separators. There will be a match on a date like 2000/02/28 or 2000-02-28, but no match on a date like 2000/02-28.
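
To isolate the before/after effect, here is a sketch with two deliberately simplified patterns. They are assumptions made for illustration (digits only, no month names or day ranges) and are not the regex from the question. The outer parentheses are kept so the first separator is still group 2.

```cpp
#include <regex>
#include <string>

// "Before" (simplified, hypothetical): each separator is an independent group,
// so mixed separators are accepted.
bool loose_match(const std::string& s) {
    static const std::regex re(R"re((^\d{4}([-/.])\d{2}([-/.])\d{2}$))re");
    return std::regex_match(s, re);
}

// "After" (simplified, hypothetical): the second separator is (\2), so it must
// repeat whatever group 2 captured.
bool strict_match(const std::string& s) {
    static const std::regex re(R"re((^\d{4}([-/.])\d{2}(\2)\d{2}$))re");
    return std::regex_match(s, re);
}
```

Here `loose_match("2000/02-28")` should be true while `strict_match("2000/02-28")` should be false; both should accept "2000/02/28".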

In the end, your Regex looks like this:

(^(?:\d{4}([-/.])(?:(?:(?:(?:0?[13578]|1[02]|j(?:an|ul)|[Mm]a[ry]|[Aa]ug|[Oo]ct|[Dd]ec)(\2)(?:0?[1-9]|[1-2][0-9]|3[01]))|(?:(?:0?[469]|11|[Aa]pr|[Jj]un|[Ss]ep|[Nn]ov)(\2)(?:0?[1-9]|[1-2][0-9]|30))|(?:(0?2|[Ff]eb)(\2)(?:0?[1-9]|1[0-9]|2[0-8]))))|(?:(?:\d{2}(?:0[48]|[2468][048]|[13579][26]))|(?:(?:[02468][048])|[13579][26])00)([-/.])(0?2|[Ff]eb)(\7)29)$)

Note that at the end of the expression I used a reference to group 7 instead of group 2: ...(\7)29)$). This was necessary because of a specific characteristic of your regex, which handles February 29 in a separate branch. This becomes clearer in Debuggex:

[Image: Debuggex diagram of the regex]

As the image shows, it would not work to make group 9 (Ref 4) refer to group 2, because group 2 is undefined when the date is February 29. Instead, we make group 9 refer to group 7, which is never undefined on February 29 dates.
