The problem is that findall traverses the string from left to right, and each time it finds a match, the next search starts right after the stretch that was just matched.
In this case, in the string 0420202198, after finding 2020, the search resumes from the third 2 (which begins the stretch 2198), and therefore 2021 is not found.
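A minimal sketch of that behavior, using the same string and regex as in the question (the variable name non_overlapping is just for illustration):

```python
import re

text = '0420202198'

# findall returns non-overlapping matches: after '2020' (indices 2 to 5),
# scanning resumes at index 6, so the '2021' starting at index 4 is skipped.
non_overlapping = re.findall('20[1-2][0-9]', text)
print(non_overlapping)

# The overlapping candidate is there if we look at the slice directly.
print(text[4:8])
```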
To do what you need, one option is to use the match method of the compiled regex, passing the position where the search should begin:
import re
text = '0420202198'
r = re.compile('20[1-2][0-9]')
years = []
# try a match starting at every position of the string
for pos in range(0, len(text)):
    m = r.match(text, pos)
    if m:
        years.append(m.group())
print(years)
In this case you do not need to include the parentheses in the regex (they serve to create capture groups, which are what findall returns), because the group method, when called without parameters, already returns the whole stretch that was matched.
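To illustrate the difference, here is a small sketch. The regex with parentheses around only the 20 is a made-up variation, used just to make the effect of the capture group visible:

```python
import re

text = '0420202198'

# With a capture group, findall returns the group's content, not the whole match.
with_group = re.findall('(20)[1-2][0-9]', text)
print(with_group)

# Without groups, findall returns the whole match.
without_group = re.findall('20[1-2][0-9]', text)
print(without_group)

# group() with no arguments returns the whole match, groups or not.
m = re.compile('(20)[1-2][0-9]').match(text, 2)
whole = m.group()
first_group = m.group(1)
print(whole, first_group)
```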
Besides, I didn't need to use a raw string (the r before the quotation marks). This syntax is useful when the regex contains characters such as \ (so you don't have to write it as \\), but in this case it is not necessary.
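A quick sketch of when the r prefix matters. The \d{4} pattern is only an example of a regex that contains a backslash; it is not part of the original question:

```python
import re

# Without backslashes, the raw and the normal string are identical.
same_without_backslash = '20[1-2][0-9]' == r'20[1-2][0-9]'

# With a backslash, the normal string needs it doubled to mean the same thing.
plain = '\\d{4}'
raw = r'\d{4}'
same_with_backslash = plain == raw

print(same_without_backslash, same_with_backslash)
print(re.findall(raw, '0420202198'))
```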
I also put the results in a list, so that the result is the same kind of value that findall returns (a list of the matches).
The output is:
['2020', '2021']
Since your regex necessarily needs 4 digits to find a match, I could optimize the loop a little and use range(0, len(text) - 3) in the for (that way I avoid iterating over the last 3 positions, because I know that from there on there are not enough characters left to satisfy the regex).
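As a sketch, the loop with the reduced range could be wrapped in a function. The name find_overlapping and the width parameter are hypothetical, and the code assumes every match has a fixed length (4 for these years):

```python
import re

def find_overlapping(pattern, text, width):
    """Return every match of pattern in text, overlapping ones included.

    width is assumed to be the fixed length of any match (4 for a year).
    """
    r = re.compile(pattern)
    years = []
    # No match can start in the last (width - 1) positions, so skip them.
    for pos in range(0, len(text) - width + 1):
        m = r.match(text, pos)
        if m:
            years.append(m.group())
    return years

print(find_overlapping('20[1-2][0-9]', '0420202198', 4))
print(find_overlapping('20[1-2][0-9]', '041202021982012', 4))
```

With width=4 the range is the same as range(0, len(text) - 3), and for a text shorter than 4 characters it is simply empty.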
You can also use the list comprehension syntax, which is considered more pythonic (and is usually more succinct, although in this particular case I'm not sure it really is):
import re
text = '0420202198'
r = re.compile('20[1-2][0-9]')
years = [m.group() for pos in range(0, len(text) - 3) for m in [r.match(text, pos)] if m]
print(years)
Or else:
years = [m.group() for m in (r.match(text, pos) for pos in range(0, len(text) - 3)) if m]
Both options above were taken from here.
Okay, @Sam. I tested it here and it worked. Thank you very much.
– Antonio Braz Finizola
@Antoniobrazfinizola findall returns a list of all the matches, so I understood that you wanted them all, not just the last one. And the regex above only gets the last occurrence (if there are three years, for example text = '041202021982012', which has 2020, 2021 and 2012, do you want just the last one or a list with all 3? Using .* you only get the last one; using a for as I suggested, you get a list with all of them). I'm sorry if I misunderstood what you needed, I just wanted to clear up the doubt...
– hkotsubo
Hi, @hkotsubo. My intention really was to get the years furthest to the right, and then take the last one from that list. I didn't know that it was possible to get the last one directly, like in the answer above.
– Antonio Braz Finizola
@Antoniobrazfinizola It's just that, the way it was asked, I understood that you needed to get them all. But all right, glad it's solved :-)
– hkotsubo