What is the purpose of empty parentheses "()" in a regular expression?

9

While searching the English Stack Overflow for questions about regular expressions, I came across an expression that contains an empty group (an empty pair of parentheses).

Like this:

(DE)()([0-9]{1,12})

What is the purpose of this "empty group" (()) in a regular expression?

  • From what little I know (regex is a huge topic), it doesn't make any sense in the expression you posted.

  • A lot of regular expressions don't make sense to me (because I don't understand what they are for). But it must have some purpose, otherwise it would not be possible to write it.

  • 1

    I usually test the ones I write in online "testers", and it made no difference with or without the empty group. I will look into it more carefully to help you. Study source: http://aurelio.net/regex/guide/

  • @Wallacemaxters did you forget about this one, or are you still hoping for a better answer?

  • @Guilhermelautert I think there is still room for an answer explaining in which cases it would be useful. (If you tell me that an empty group is not useful in any case, I will accept an answer immediately.)

4 answers

14

Depending on the case, it just confuses whoever is reading the regex. Parentheses indicate a capture group, so in the expression (DE)()([0-9]{1,12}) the text DE is captured in the first group, nothing is captured in the second, and the digits matched by [0-9]{1,12} are captured in the third. Each group can be referenced by $N, where N is the group number (this actually depends on the regex engine; some do not use the dollar sign), so we have three groups: $1, $2 and $3. A practical example, swapping the two parts of the text DE25324534 using the regex you posted:

var str = 'DE25324534';
// $3 = the digits, $1 = 'DE', $2 = the empty string captured by ()
var inverted = str.replace(/(DE)()([0-9]{1,12})/, '$3$1$2');
document.write(inverted); // 25324534DE

What happens here is that the original string is replaced by $3 (the digits), followed by $1 (DE), followed by $2 (which contains nothing). So you can see that the () placed in the regex serves absolutely no purpose in this case. However, it may make sense in some situations, as shown in Guilherme Lautert's answer.
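
For readers who prefer Python, here is a minimal sketch of the same replacement. It assumes the standard re module, where groups in the replacement string are written \1, \2, \3 instead of $1, $2, $3. It also prints the captured groups, so the empty second group is visible.

```python
import re

pattern = re.compile(r'(DE)()([0-9]{1,12})')
text = 'DE25324534'

# groups() returns one entry per pair of capturing parentheses
groups = pattern.match(text).groups()
print(groups)  # ('DE', '', '25324534')

# Same rearrangement as the JavaScript example: digits, then DE, then "nothing"
rearranged = pattern.sub(r'\3\1\2', text)
print(rearranged)  # 25324534DE
```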

  • One should not judge something by the first abstraction one sees. This "serves absolutely no purpose" is very relative.

  • @Guilhermelautert I agree too. In the question I saw on SOen, it seemed to be related to something like: "If I replace the regex, I will not need to change the order of the captured groups. I simply add an empty group so that the group numbering stays the same in the substitution."

  • @Wallacemaxters I put together an example like that, but it's very ugly; I'm trying to improve it '-'

7

In principle, none.

Parentheses are usually used to identify groups. Groups, in turn, are used for operations such as extracting specific information or locating the parts of the string that should be replaced.

In this case, the regular expression engine will only create an empty group, which cannot even be located.

  • 2

    This answer is consistent with what the group represents: "an empty group, which cannot even be located". How it is used depends on the programmer.

5

Complementing Gypsy's answer.

Groups are used to extract or reuse information. An empty group () references nothing, which in a compiler would be the same as a direct transition to the next state.

Example (not a great one, but it illustrates the point):

var replace = '$1$3$5';

'2016/02/02'.replace(/(\d{4})(\/)(\d{2})(\/)(\d{2})/, replace); // 20160202
'2016-02-02'.replace(/(\d{4})(-)(\d{2})(-)(\d{2})/, replace);   // 20160202
'20160202'.replace(/(\d{4})()(\d{2})()(\d{2})/, replace);       // 20160202
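
The same idea as a rough Python sketch: one replacement template is shared by three patterns, and the empty groups keep the group numbering aligned. The names TEMPLATE and PATTERNS are only illustrative.

```python
import re

TEMPLATE = r'\1\3\5'  # always reads groups 1, 3 and 5

# input text -> pattern; every pattern has exactly five groups
PATTERNS = {
    '2016/02/02': r'(\d{4})(/)(\d{2})(/)(\d{2})',
    '2016-02-02': r'(\d{4})(-)(\d{2})(-)(\d{2})',
    '20160202':   r'(\d{4})()(\d{2})()(\d{2})',  # empty groups stand in for the separators
}

results = [re.sub(pattern, TEMPLATE, text) for text, pattern in PATTERNS.items()]
print(results)  # ['20160202', '20160202', '20160202']
```
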
  • 1

    Upvoted. Great example, and it really makes sense both on the theoretical side (finite automata) and in practical use. I edited my answer so it does not generalize so much and references yours.

  • Ah, so the idea is to have a sequence whose parameterization is fixed in advance. That is, the empty group is used to force a given parameter to hold the desired value, regardless of the sequence in the regex (you could have just put this in the answer :p)

  • @Wallacemaxters yes, this example was not the best, but that is the idea.

1

I believe you saw this question (at least it has the same regex), where some possible uses for this feature are also listed (I explain them in more detail below).

It is worth remembering that, as the other answers already said, in the specific expression you posted, with no additional context, these parentheses really are useless.


But depending on the context, it can have its uses. Remember that a pair of parentheses creates a capture group, which becomes part of the match that is returned. Even if there is nothing inside it, a group is created containing "nothing" (which in most implementations, if not all, results in an empty string).
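
A short Python sketch of that point, assuming the standard re module: an empty group that took part in the match yields an empty string, while a group that did not take part (here an optional one) yields None.

```python
import re

# Group 2 is empty but participates in the match; group 3 is optional and is skipped
match = re.match(r'(a)()(b)?', 'a')
groups = match.groups()
print(groups)  # ('a', '', None)
```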

And when would that be useful? Well, imagine a case where I have a function that receives the match from a regex and uses only groups 1 and 3 (the examples are in Python, but they could be in any other language that supports capture groups):

def faz_algo(match):
    g1 = match.group(1)
    g3 = match.group(3)
    # ... does something with groups 1 and 3 (the others are ignored)

import re

match = re.match(r'([a-z]+)(\d+)([a-z]+)', 'abc123xyz')
if match:
    faz_algo(match)

In the example above, I create three groups: one with letters, one with digits, and another with letters. The function faz_algo takes only groups 1 and 3 (in this case, abc and xyz respectively).

Now let's say I want to reuse the function faz_algo, but this time I need abc and 123. In this case, I would change the regex to:

match = re.match(r'([a-z]+)()(\d+)', 'abc123xyz')

That is, group 1 is now the letters (abc), group 2 is empty, and group 3 is the digits (123). This way, I don't need to change the function faz_algo, because it keeps reading groups 1 and 3. I just need to adjust the regex so that these groups contain what I need.

Of course, I could create another function that inspects the groups of the match and refactor faz_algo to receive the values as parameters, instead of always reading fixed groups. But I may not be able to modify faz_algo (for example, because it comes from an external dependency). In short, this is a situation in which an empty group would be useful (although I admit it is a solution of questionable elegance).
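
The refactoring mentioned above could look roughly like the sketch below. The name faz_algo_valores and its return value are assumptions made for illustration only; the original faz_algo does not say what it does with the groups.

```python
import re

def faz_algo_valores(first, second):
    # Hypothetical variant of faz_algo: it receives the values directly
    # instead of reading fixed group numbers from a match object.
    return f'{first}:{second}'

m1 = re.match(r'([a-z]+)(\d+)([a-z]+)', 'abc123xyz')
m2 = re.match(r'([a-z]+)(\d+)', 'abc123xyz')  # no empty group needed here

# The caller decides which groups to pass
letters_pair = faz_algo_valores(m1.group(1), m1.group(3))
letters_digits = faz_algo_valores(m2.group(1), m2.group(2))
print(letters_pair)    # abc:xyz
print(letters_digits)  # abc:123
```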


Another "esoteric" use is that an empty group can serve as a "marker": an indication that the regex "passed" through a certain point.

Let's assume that I have several numbers being read from somewhere, and I want to check which ones have exactly 4 digits, using only the digits 1 to 4, with all digits different (i.e., 1234 and 3241 are ok, but 1123, 3333 and 1235 are not). I know this is easier to solve without regex, but with regex one option would be:

import re

r = re.compile(r'^(?:1()|2()|3()|4()){4}\1\2\3\4$')
for n in range(1000000):
    if r.match(str(n)):
        print(n)

It's quite confusing and complex, but basically: (?:1()|2()|3()|4()) is a non-capturing group (indicated by (?:), that is, the first parenthesis does not create a group. Inside it there is an alternation with four options: the digits 1 to 4.

Note that after each digit there is an empty group. These groups only serve to indicate that the regex has passed through that point. That is, if it finds a digit 1, group 1 will be set (it will contain the empty string, but what matters is that it is set in the match). The same goes for the other digits: if all four groups are set, it means the string contains at least one of each digit.

Next, I add the quantifier {4} to indicate that I want 4 digits. Note that at this point uniqueness is not yet guaranteed (the string could be, for example, 1122, in which case only groups 1 and 2 would be set).
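
To see this intermediate state, the sketch below drops the backreferences and inspects the groups for 1122. It relies on Python's re keeping a group set once any repetition has set it; other engines may behave differently.

```python
import re

# Same pattern as above, but without the trailing \1\2\3\4
without_backrefs = re.compile(r'^(?:1()|2()|3()|4()){4}$')

groups = without_backrefs.match('1122').groups()
print(groups)  # ('', '', None, None): only groups 1 and 2 were set
```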

And how do I make sure all 4 groups are set? I use backreferences: \1 refers to group 1, \2 to group 2, and so on. Since the groups are empty, the backreferences will match empty strings, but only if the respective groups are set.

That is, if any group is not set, its backreference will not match either, and the regex will not find a match. For example, if the number is 1124, group 3 will not be set, and therefore \3 will not match. So the regex fails, because it tries to match \3.

If all groups are set, all digits from 1 to 4 are present in the string, and the quantifier {4} ensures that the string has exactly 4 digits. Since the groups are empty strings, \1\2\3\4 does not affect the length of the string being matched; remember that the groups are only there to check whether the respective digit was found. In fact, because the groups are empty strings, the order of the backreferences does not matter (again: they are only there to check whether each group was set).
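
A quick check of these claims, sketched in Python with the sample numbers from the text. The second pattern lists the backreferences in a different order to illustrate that the order is irrelevant; the names in_order and shuffled exist only for this example.

```python
import re

in_order = re.compile(r'^(?:1()|2()|3()|4()){4}\1\2\3\4$')
shuffled = re.compile(r'^(?:1()|2()|3()|4()){4}\4\2\1\3$')

samples = ['1234', '3241', '1123', '3333', '1235', '1124']

# Keep only the samples each pattern accepts
accepted = [s for s in samples if in_order.match(s)]
accepted_shuffled = [s for s in samples if shuffled.match(s)]
print(accepted)           # ['1234', '3241']
print(accepted_shuffled)  # ['1234', '3241']
```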

Yes, it's overkill. Yes, there are easier ways to check this without regex. But anyway, this would be a use for empty capture groups.
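
One of the simpler alternatives, as a sketch: compare the sorted digits with 1, 2, 3, 4. The helper name is_permutation_of_1234 is made up for this example.

```python
def is_permutation_of_1234(number):
    # True when the decimal digits are exactly 1, 2, 3 and 4, in any order
    return sorted(str(number)) == ['1', '2', '3', '4']

# All candidates have 4 digits, so checking below 10000 is enough
found = [n for n in range(10000) if is_permutation_of_1234(n)]
print(len(found))  # 24
print(found[:3])   # [1234, 1243, 1324]
```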
