I believe you’ve seen this question (at least it is the same regex), and there are also listed some possible uses for this resource (which I explain in more detail below).
It is worth remembering that, as already said in the other answers, in the specific expression you put, with no additional context, These parentheses are really useless.
But depending on the context, it can have its uses. Remembering that a pair of parentheses creates a capture group, that will be part of the match returned. Even if there is nothing inside it, a group will be created containing "nothing" (which in most implementations - if not all - results in an empty string).
And when would that be useful? Well, imagine a case where I have a function that gets the match from a regex and uses only groups 1 and 3 (examples in Python, but could be in any other language that supports capture groups):
def faz_algo(match):
g1 = match.group(1)
g3 = match.group(3)
# ... faz algo com os grupos 1 e 3 (os demais são ignorados)
import re
match = re.match(r'([a-z]+)(\d+)([a-z]+)', 'abc123xyz')
if match:
faz_algo(match)
In the example above, I am creating three groups: one with letters, the other with digits, and the other with letters. In the function faz_algo
, caught only groups 1 and 3 (in case they will be respectively abc
and xyz
).
Now let’s say I have a case where I want to repurpose the function faz_algo
, but now I need to take abc
and 123
. In this case, I would change the regex to:
match = re.match(r'([a-z]+)()(\d+)', 'abc123xyz')
I mean, now group 1 is the letters (abc
), group 2 is empty, and group 3 is the numbers (123
). This way, I don’t need to change the function faz_algo
, because she keeps picking up groups 1 and 3. I just need to adjust the regex so that these groups contain what I need.
Of course I could create another function that checks the groups of the match and refactor faz_algo
to receive values as parameters, instead of always looking for fixed groups. But I may not be able to modify the function faz_algo
(for example, it is from an external dependency, etc.). Finally, this is a situation in which an empty group would be useful (although I admit it is a "questionable elegance solution").
Another "esoteric" use is that an empty group can be used as a "marker": an indication that the regex "passed" may be a certain point.
Let’s assume that I have several numbers being read from somewhere, and I want to check which ones contain 4 digits, but only the digits from 1 to 4, and all digits are different (ie, 1234
and 3241
are ok, but 1123
, 3333
and 1235
nay). I know you can solve it easier without regex, but with regex an option would be:
import re
r = re.compile(r'^(?:1()|2()|3()|4()){4}\1\2\3\4$')
for n in range(1000000):
if r.match(str(n)):
print(n)
She’s very confused and complex, but basically: (?:1()|2()|3()|4())
is a catch group (indicated by (?:
), that is, the first parenthesis does not create a group. Within it we have a alternation with four options: digits 1 to 4.
Note that after each digit there is an empty group. And they only serve to indicate that regex has passed through there. That is, if it finds a digit 1
, group 1 will be set (it will contain the empty string, but what matters is that it will be set in the match). The same goes for the other digits: if the four groups are set, it means that you have at least one digit of each in the string.
Next, I put the quantifier {4}
to indicate that I want 4 digits. Note that here I have not yet guaranteed the uniqueness (the string could be, for example, 1122
- and in this case only groups 1 and 2 would be set).
And how do I make sure the 4 groups are set? I use the back-References: in the case, \1
refers to group 1, \2
group 2, etc. Since the groups are empty, then the values of the back-References will be empty strings, but only if the respective groups are set.
That is, if any group is not set, the back-Ference you won’t be either, and the regex won’t find a match. For example, if the number is 1124
, group 3 will not be set, and therefore \3
not either. So the regex fails, because it tries to look for \3
.
If all groups are set, indicates that all digits from 1 to 4 are present in the string - and the quantifier {4}
ensures that the string has only 4 digits (and as the groups are empty strings, so \1\2\3\4
does not interfere with the size of the string being searched; remember that the groups are only there to check if the respective digit was found). In fact, since the groups are empty strings, it doesn’t matter the order I put the back-References (remember: they are only there to check if the group was set).
Yeah, it’s Overkill. Yes, there are easier ways to check this without regex. But anyway, this would be a use for empty capture groups.
From what little I know, (regex is huge) it doesn’t make any sense in the expression q vc posted.
– user3010128
A lot of regular expression doesn’t make sense to me (because I don’t understand what it’s for). But that has a purpose has, otherwise it would not be possible to do it.
– Wallace Maxters
I usually test the ones I do in "testers" online, it made no difference with or without. I will look more carefully to help you. study source: http://aurelio.net/regex/guide/
– user3010128
@Wallacemaxters forgot this one? or still wish for a better answer?
– Guilherme Lautert
@Guilhermelautert I think it might fit just one issue explaining in which case it would be useful. (If you tell me that using an empty group is not useful in any case, I mark immediately).
– Wallace Maxters