Regex in Python: delete everything after the second occurrence of whitespace


This should be very simple, but I can't find an answer. I have several strings of different lengths that follow more or less the same pattern:

'Art. 1° E'
'Art. 15. As'

I want to delete everything from the second occurrence of whitespace onward, so that the results are 'Art. 1°' and 'Art. 15.'.

import re

teste = "Art. 1° E"

print(re.sub(r'((.*?){2})', '', teste))
#Art.

Could someone help me with the regex?

  • See if this works: print(re.findall(r'Art.\s\S+', teste)). I tested it against the text of the Constitution and it seems to return what you need. If it works, let me know and I'll post it as an answer after lunch.
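A quick check of the comment's suggestion on the question's two sample strings, as a sketch (the variable names are only for illustration). Note that the dot in Art. is unescaped in the comment's pattern, so it matches any character; escaping it as \. restricts the match to a literal "Art.".

```python
import re

# The two sample strings from the question.
samples = ["Art. 1° E", "Art. 15. As"]

# Pattern as posted in the comment: the unescaped "." matches any character.
loose = [re.findall(r'Art.\s\S+', s) for s in samples]

# Same idea with the dot escaped, so only a literal "Art." matches.
strict = [re.findall(r'Art\.\s\S+', s) for s in samples]

print(loose)   # [['Art. 1°'], ['Art. 15.']]
print(strict)  # [['Art. 1°'], ['Art. 15.']]

# Where the difference shows: "Arte" is accepted by the loose pattern only.
print(re.findall(r'Art.\s\S+', "Arte 2 x"))   # ['Arte 2']
print(re.findall(r'Art\.\s\S+', "Arte 2 x"))  # []
```

Since findall returns a list, you would take the first element (or use re.match/re.search) when you only expect one result per string.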

1 answer



Just do:

print(re.sub(r' [^ ]+$', '', teste))

The regex starts with a space (note the space right after the opening quote), followed by one or more characters that are not spaces ([^ ]+), and then the end of the string ($). This ensures the match runs from the last space to the end of the string.
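Here is that regex applied to the question's two samples, plus a longer string that shows what happens when there are more than two spaces. drop_last_word is a hypothetical helper name used only for this sketch.

```python
import re

def drop_last_word(text):
    # Remove one space plus the trailing run of non-space characters
    # that ends the string.
    return re.sub(r' [^ ]+$', '', text)

print(drop_last_word("Art. 1° E"))          # Art. 1°
print(drop_last_word("Art. 15. As"))        # Art. 15.
# Only the last word is removed, not everything after the second space.
print(drop_last_word("Art. 1° E abc xyz"))  # Art. 1° E abc
```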

But this only works if the second space is also the last one. If the idea is to discard everything from the second space onward, it may be better to write a regex that matches the initial part and ignores the rest:

import re
teste = "Art. 1° E abc xyz"
match = re.match(r'^Art\. \d+[.°]', teste)
if match:
    print(match.group(0)) # Art. 1°

That is, I search for "Art. " (with the dot escaped as \.) at the beginning of the string (indicated by ^), followed by one or more digits (\d+), followed by either a dot or the ° character ([.°]). If a match is found, I take only that portion.
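If the strings do not always start with "Art.", a more literal reading of the question (keep the first two whitespace-separated tokens, drop the rest) could be sketched like this. This is an alternative pattern, not the one from the answer, and keep_first_two_tokens is a hypothetical name.

```python
import re

def keep_first_two_tokens(text):
    # \S+ is a run of non-whitespace characters and \s+ the whitespace between them.
    # re.match anchors at the start, so this captures "token, whitespace, token".
    match = re.match(r'\S+\s+\S+', text)
    # With fewer than two tokens there is nothing to cut, so return the text as-is.
    return match.group(0) if match else text

print(keep_first_two_tokens("Art. 1° E"))          # Art. 1°
print(keep_first_two_tokens("Art. 15. As"))        # Art. 15.
print(keep_first_two_tokens("Art. 1° E abc xyz"))  # Art. 1°
print(keep_first_two_tokens("Art."))               # Art.
```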


Another alternative is not to use regex:

print(teste[0:teste.rfind(' ')])

rfind returns the index of the last occurrence of a space. I then use slicing to take everything from the beginning of the string (index zero) up to, but not including, that last space.
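Like the first regex, rfind only cuts at the last space. Also, when a string has no space at all, rfind returns -1 and the slice silently drops the last character. A non-regex sketch that cuts at the second space instead, assuming words are separated by single spaces (cut_at_second_space is a hypothetical name):

```python
def cut_at_second_space(text):
    # split(' ', 2) splits on at most the first two spaces, so a third
    # element, if present, holds everything after the second space.
    parts = text.split(' ', 2)
    return ' '.join(parts[:2])

print(cut_at_second_space("Art. 1° E"))          # Art. 1°
print(cut_at_second_space("Art. 15. As"))        # Art. 15.
print(cut_at_second_space("Art. 1° E abc xyz"))  # Art. 1°

# The rfind pitfall when there is no space: rfind returns -1.
text = "Art."
print(text[0:text.rfind(' ')])    # Art
print(cut_at_second_space(text))  # Art.
```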
