How to create a regular expression to search for numbers followed by a parenthesis?

4

I’m trying to create a regex to match the numbers that are followed by a closing parenthesis. For example:

1) Pergunta 1  
2) Pergunta 2  
3) Pergunta 3  
4) Pergunta 4  
.  
.  
10) Pergunta 10

So far I’ve managed to get to this: (^[0-9].* )

But it doesn’t get the 10).

I need to capture the number, the closing parenthesis, and the space after it.

2 answers

6

One option is to use:

^(\d+\) )

Parentheses have a special meaning in regex, so to match the literal ) character you must write it as \). Also notice that there is a space between the \) and the final ), so that the regex also matches the space after the parenthesis.

I also used the quantifier + instead of *, because * means "zero or more occurrences" (so a line with no digit at all would also match), while + means "one or more occurrences", which guarantees at least one digit. With *, the regex can also match lines that have no digits at the beginning, such as a line starting with ") ".
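To make the difference between + and * concrete, here is a minimal sketch in Python (the language, the sample lines, and the variable names are my own assumptions, not part of the question):

```python
import re

# Hypothetical sample lines; the second one has no digit before the ")".
lines = ["1) Pergunta 1", ") no number", "10) Pergunta 10"]

plus_pattern = re.compile(r"^(\d+\) )")  # at least one digit required
star_pattern = re.compile(r"^(\d*\) )")  # zero digits is also accepted

# re.match only tries to match at the start of each string.
plus_hits = [line for line in lines if plus_pattern.match(line)]
star_hits = [line for line in lines if star_pattern.match(line)]

print(plus_hits)
print(star_hits)
```

With +, only the two numbered lines are kept; with *, the line that starts with ") " is accepted as well.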

I put the part I want in parentheses (the digits, the \) and the space) so that they form a capture group containing that whole stretch, which makes it easy to retrieve later. On regex101.com you can see the groups highlighted in green (and that, thanks to + instead of *, the regex no longer matches lines that have no digits).

And the ^ anchor ensures that this pattern is only matched at the beginning of the string (or of each line, if the engine's multiline mode is enabled).
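As a rough sketch of how the whole pattern could be used, assuming Python and assuming the list arrives as a single multi-line string (the sample text and names below are illustrative only):

```python
import re

# Hypothetical input: several lines in one string, plus a line where
# "3) " appears in the middle and therefore must not be captured.
text = "1) Pergunta 1\n2) Pergunta 2\n10) Pergunta 10\nSee item 3) above"

# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the whole string.
prefix_pattern = re.compile(r"^(\d+\) )", re.MULTILINE)

# With a single capture group, findall returns the group contents.
prefixes = prefix_pattern.findall(text)
print(prefixes)  # ['1) ', '2) ', '10) ']
```

Each captured item contains the number, the closing parenthesis and the space, and the "3) " in the last line is ignored because it is not at the start of the line.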


You can also replace \d with [0-9]. Depending on the language/engine it makes no difference, but in some of them \d can match any character in the Unicode category "Number, Decimal Digit", which includes characters such as ٠١٢٣٤٥٦٧٨٩, among others (see this answer for more details). If you know that your texts do not contain such characters, you can use either one. But if you want to restrict the match to the digits 0 to 9, use [0-9] instead of \d.
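Python 3 is one of the engines where this difference shows up for text (str) patterns. A small sketch, where the sample string is my own assumption:

```python
import re

# Hypothetical line starting with U+0663 (ARABIC-INDIC DIGIT THREE).
sample = "\u0663) Pergunta 3"

# In Python 3, \d in a str pattern matches any Unicode decimal digit.
unicode_digit = re.match(r"^(\d+\) )", sample)

# [0-9] only matches the ten ASCII digits.
ascii_class = re.match(r"^([0-9]+\) )", sample)

# The re.ASCII flag restricts \d to [0-9] as well.
ascii_flag = re.match(r"^(\d+\) )", sample, re.ASCII)

print(unicode_digit is not None, ascii_class is not None, ascii_flag is not None)
```

Here only the first pattern matches, so in Python either [0-9] or the re.ASCII flag can be used to limit the match to the digits 0 to 9.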

Another answer suggested using \s instead of a literal space (which also works). But this shorthand also matches other characters, such as line breaks (\n, \r) and TAB, among others (the exact set varies by language/engine). If you want the regex to match only the space character, use what I suggested above.
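A short Python sketch of that difference, using made-up sample lines where the second one has a TAB after the parenthesis:

```python
import re

# Hypothetical samples: a regular space vs. a TAB after the ")".
samples = ["1) Pergunta 1", "2)\tPergunta 2"]

space_pattern = re.compile(r"^(\d+\) )")         # literal space only
whitespace_pattern = re.compile(r"^(\d+\)\s)")   # any whitespace character

space_hits = [s for s in samples if space_pattern.match(s)]
whitespace_hits = [s for s in samples if whitespace_pattern.match(s)]

print(space_hits)
print(whitespace_hits)
```

The literal space keeps only the first line, while \s accepts the TAB-separated line too. Which one is right depends on whether such lines should count as matches.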

  • Very good explanation. Thank you. It will help a lot of people.

5


You need to escape the parenthesis with a backslash so that the regex does not treat it as part of the expression syntax, and the expression "[0-9]" can be replaced with "\d":

^(\d+\)\s)
