Brackets have a special meaning in regex: they create a character class. For example, [abc] means "the letter a or the letter b or the letter c" (any one of them).
So the expression [Ticket: (\d+)] is a character class meaning "the letter T or the letter i or the letter c, etc." The key detail is that this whole expression matches only a single character (any one of the options inside the brackets).
Moreover, many metacharacters (those that have a special meaning in regex) "lose their powers" when they are inside brackets. So in this regex, the parentheses and the plus sign mean the literal characters (, ) and +, which means this regex will also find a match in a string such as +(), for example.
Anyway, for the regex to treat the characters [ and ] literally, just escape them with \. The regex then becomes \[Ticket: (\d+)\].
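To make the difference concrete, here is a minimal sketch comparing the unescaped and the escaped pattern. The variable names (char_class, escaped) and the sample strings are only illustrative assumptions, not part of the original question.

```python
import re

# Unescaped: the brackets form a character class, so each match is ONE character.
char_class = re.compile(r'[Ticket: (\d+)]')
# Escaped: the brackets are literal and the parentheses form a capture group.
escaped = re.compile(r'\[Ticket: (\d+)\]')

print(char_class.findall('+()'))  # ['+', '(', ')']
print(escaped.findall('+()'))     # []

sample = '[Ticket: 42]'  # illustrative input
print(char_class.search(sample).group())  # T  (the first character that belongs to the class)
print(escaped.search(sample).group(1))    # 42
```

Note that the unescaped version does not even match the opening [ of the sample, because [ is not one of the characters listed inside the class.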
Options for finding the matches
It was unclear whether the string has multiple occurrences of "[Ticket: (numbers)]" and you want to find them all, or whether this snippet occurs only once. Either way, let's look at some options.
If there are several occurrences of this text and you want to capture them all, you can use findall, which returns a list of all occurrences:
import re
texto = 'lorem ipsum [Ticket: 20021501280806] blablabla [Ticket: 123456789] etc [Ticket: 987654] xyz.'
r = re.compile(r'\[Ticket: (\d+)\]')
matches = r.findall(texto)
print(matches) # ['20021501280806', '123456789', '987654']
See the code running on Ideone.com
In this case, the part that matches the numbers (\d+) is in parentheses, which forms a capture group. When a regex has capture groups, findall returns only them, so the list already contains just the numbers.
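As a small illustration of how capture groups change what findall returns, the sketch below runs three variants of the pattern over the same text. The sample text and the variable names (whole, single, pairs) are assumptions made for the example.

```python
import re

text = '[Ticket: 111] and [Ticket: 222]'  # illustrative input

# No capture group: findall returns each whole match.
whole = re.findall(r'\[Ticket: \d+\]', text)
print(whole)   # ['[Ticket: 111]', '[Ticket: 222]']

# One capture group: findall returns only the text captured by the group.
single = re.findall(r'\[Ticket: (\d+)\]', text)
print(single)  # ['111', '222']

# Two capture groups: findall returns one tuple per match.
pairs = re.findall(r'\[(Ticket): (\d+)\]', text)
print(pairs)   # [('Ticket', '111'), ('Ticket', '222')]
```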
If you prefer, you don't have to call compile and can use the regex directly:
matches = re.findall(r'\[Ticket: (\d+)\]', texto)
According to the documentation, using compile is more efficient if the same regex is used several times in the same program. It is up to you to choose which one to use.
Another option is to use finditer, which returns an iterator over the matches:
import re
texto = 'lorem ipsum [Ticket: 20021501280806] blablabla [Ticket: 123456789] etc [Ticket: 987654] xyz.'
r = re.compile(r'\[Ticket: (\d+)\]')
for match in r.finditer(texto):
    print(match.group(1))
See the code running on Ideone.com
On each iteration of the for loop, a Match object is returned with information about the section that was found, one occurrence per iteration. Since the information I care about is in the capture group, I use the group method to get it. And because (\d+) is the first pair of parentheses in the regex, it is the first capture group (group 1), so I call match.group(1) to get the captured text. The output is:
20021501280806
123456789
987654
The difference between the two approaches above is that findall returns a list of all occurrences found, while finditer returns an iterator that yields one match per iteration. If there are many occurrences, finditer uses much less memory (it does not hold all matches at once), and it does not search for the remaining occurrences if the loop is interrupted, for example (whereas findall always has to find every occurrence before it can return the list).
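The sketch below illustrates this one-at-a-time behavior by pulling a single match out of the iterator with next and only afterwards consuming the rest. The sample text and the names (pattern, iterator, first, remaining) are assumptions for the example.

```python
import re

text = '[Ticket: 1] [Ticket: 2] [Ticket: 3]'  # illustrative input
pattern = re.compile(r'\[Ticket: (\d+)\]')

iterator = pattern.finditer(text)

# Ask for a single match; the other occurrences have not been requested yet.
first = next(iterator)
print(first.group(1))  # 1

# The iterator resumes from where it stopped.
remaining = [m.group(1) for m in iterator]
print(remaining)       # ['2', '3']
```

If the loop stops after the first match (with break, for example), the remaining occurrences are simply never requested from the iterator.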
And just as with findall, you can also use finditer without calling compile first:
for match in re.finditer(r'\[Ticket: (\d+)\]', texto):
    print(match.group(1))
If the text occurs only once, or if it occurs several times but you only want the first occurrence, you can use search:
import re
texto = 'lorem ipsum [Ticket: 20021501280806] blablabla [Ticket: 123456789] etc [Ticket: 987654] xyz.'
r = re.compile(r'\[Ticket: (\d+)\]')
match = r.search(texto)
if match:
    print(match.group(1)) # 20021501280806
See the code running on Ideone.com
In this case it finds the first occurrence of the regex in the text and ignores the others. As with the options above, you can also use search directly, without calling compile first:
match = re.search(r'\[Ticket: (\d+)\]', texto)
if match:
    print(match.group(1))
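The if match check matters because search returns None when nothing is found. One possible way to package this is a small helper; first_ticket and TICKET_RE are hypothetical names introduced only for this sketch.

```python
import re
from typing import Optional

# Hypothetical module-level pattern, compiled once and reused.
TICKET_RE = re.compile(r'\[Ticket: (\d+)\]')

def first_ticket(text: str) -> Optional[str]:
    """Return the digits of the first [Ticket: ...] occurrence, or None if there is none."""
    match = TICKET_RE.search(text)
    return match.group(1) if match else None

print(first_ticket('lorem [Ticket: 987654] xyz'))  # 987654
print(first_ticket('no ticket here'))              # None
```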