First of all, you don’t need two sets of parentheses in a row.
Parentheses form a capturing group: if the regex finds a match, whatever matched inside them is available as a group. This is the standard behavior of parentheses.
But when you don't want to capture anything and only need to group sub-expressions, you use the syntax (?: which defines a non-capturing group. It says that this pair of parentheses should not create a group when the regex finds a match.
So writing ((?:...)) is rather odd. It's a non-capturing group inside a capturing group, which means you said you want to capture something, and that something is "something I don't want to capture". This is contradictory and unnecessary, so the first step is to remove the nesting (and, depending on the case, keep only one of the two, if any).
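To make the difference concrete, here is a minimal sketch comparing the three forms. The sample string and the variable names are assumptions chosen for illustration, not taken from the question.

```python
import re

sample = "EXMO. SR. DR."  # hypothetical sample input

# Capturing group: the alternative that matched is stored as group 1.
capturing = re.search(r'(exmo|excelentissimo)', sample, re.IGNORECASE)
# Non-capturing group: the alternatives are grouped, but nothing is stored.
non_capturing = re.search(r'(?:exmo|excelentissimo)', sample, re.IGNORECASE)
# Nested form: the outer parentheses capture whatever the inner group matched,
# so it behaves just like the plain capturing version.
nested = re.search(r'((?:exmo|excelentissimo))', sample, re.IGNORECASE)

print(capturing.groups())      # ('EXMO',)
print(non_capturing.groups())  # ()
print(nested.groups())         # ('EXMO',)
```

The nested form gives the same result as a single pair of capturing parentheses, which is why one of the two pairs can simply be dropped.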
Anyway, what's inside the parentheses is exmo|excelentissimo, which means "the string exmo or the string excelentissimo". This regex matches only that snippet and ignores the rest of the string.
To get the whole line, just use a regex that picks up all the text:
import re
text = """
TEXTO ACIMA
EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO ESPECIAL CÍVEL DA COMARCA DA CAPITAL
TEXTO ABAIXO
EXCELENTISSIMO. SR. DR. JUÍZ DE DIREITO DA VARA ESPECIAL CÍVEL DA COMARCA DA CAPITAL
MAIS TEXTO"""
results = re.findall(r'^(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ DE DIREITO (?:DO JUIZADO|DA VARA) ESPECIAL CÍVEL DA COMARCA DA CAPITAL$',
                     text, re.IGNORECASE | re.MULTILINE)
# findall returns the full match here because the regex has no capturing groups
for result in results:
    print(result)
Notice that I used (?:EXMO|EXCELENTISSIMO) and (?:DO JUIZADO|DA VARA) for the parts that can be one or the other. I used non-capturing groups (note the (?: syntax) because, according to the documentation of findall, when the regex contains capturing groups, only the groups are returned. Since you want the entire string, I used non-capturing groups.
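A small sketch of this findall behavior, using a shortened, hypothetical version of the line so the two results are easy to compare:

```python
import re

line = "EXMO. SR. DR. JUÍZ"  # hypothetical shortened input

# One capturing group: findall returns only what the group matched.
with_capture = re.findall(r'(EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ', line)
# Non-capturing group: findall returns the whole match.
without_capture = re.findall(r'(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ', line)

print(with_capture)     # ['EXMO']
print(without_capture)  # ['EXMO. SR. DR. JUÍZ']
```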
The rest of the text, since it does not change, can be written exactly as you want it to appear ("SR. DR. JUÍZ etc"). The only exception is the dot: in a regex it has a special meaning ("any character", except a line break by default), so it must be escaped and written as \.
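To illustrate why the escape matters, here is a sketch with two hypothetical inputs, one of them deliberately wrong. The unescaped dot accepts both, while the escaped dot accepts only the real one.

```python
import re

# The second entry is a deliberately wrong input (an "X" where the dot should be).
candidates = ["EXMO. SR.", "EXMOX SR."]

# Unescaped dot after EXMO: matches any character, so both inputs pass.
unescaped = [c for c in candidates if re.fullmatch(r'EXMO. SR\.', c)]
# Escaped dot: only a literal "." is accepted.
escaped = [c for c in candidates if re.fullmatch(r'EXMO\. SR\.', c)]

print(unescaped)  # ['EXMO. SR.', 'EXMOX SR.']
print(escaped)    # ['EXMO. SR.']
```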
Another detail is the anchors ^ and $, which usually mean the beginning and end of the string. Thanks to the MULTILINE flag, they also match at the beginning and end of each line. This ensures that the line contains only what is specified in the regex.
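The effect of the flag can be seen in this sketch, where the same pattern is run with and without MULTILINE on a hypothetical three-line string:

```python
import re

sample = "TEXTO ACIMA\nEXMO. SR. DR.\nTEXTO ABAIXO"  # hypothetical input
pattern = r'^EXMO\. SR\. DR\.$'

# Without the flag, ^ only matches at the very start of the string,
# so the line in the middle is not found.
without_flag = re.findall(pattern, sample)
# With the flag, ^ and $ also match at the start and end of each line.
with_flag = re.findall(pattern, sample, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['EXMO. SR. DR.']
```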
The code above prints:
EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO ESPECIAL CÍVEL DA COMARCA DA CAPITAL
EXCELENTISSIMO. SR. DR. JUÍZ DE DIREITO DA VARA ESPECIAL CÍVEL DA COMARCA DA CAPITAL
Since I used the IGNORECASE flag, the regex will also match passages in lower case ("Exmo. sr. dr. etc...") and even in mixed case ("Exmo. Sr. Dr ..."). If you want to match upper case only, remove this flag (but don't forget to keep MULTILINE):
results = re.findall(r'^(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ DE DIREITO (?:DO JUIZADO|DA VARA) ESPECIAL CÍVEL DA COMARCA DA CAPITAL$',
text, re.MULTILINE)
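As a quick check of the difference, this sketch runs a shortened, hypothetical pattern against three spellings of the same line:

```python
import re

lines = "EXMO. SR. DR.\nExmo. Sr. Dr.\nexmo. sr. dr."  # hypothetical input
pattern = r'^EXMO\. SR\. DR\.$'

# With IGNORECASE, all three spellings are found.
case_insensitive = re.findall(pattern, lines, re.IGNORECASE | re.MULTILINE)
# Without it, only the upper-case line is found.
case_sensitive = re.findall(pattern, lines, re.MULTILINE)

print(case_insensitive)  # ['EXMO. SR. DR.', 'Exmo. Sr. Dr.', 'exmo. sr. dr.']
print(case_sensitive)    # ['EXMO. SR. DR.']
```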
When I was editing your question, I saw that the text was occupying 3 lines:
EXMO. SR. DR. JUÍZ DE DIREITO
DO JUIZADO ESPECIAL
CÍVEL DA COMARCA DA CAPITAL
I don't know if it really is like that or if it should all be on one line. In any case, we can replace the space after "DIREITO" and after "ESPECIAL" with \s, which matches both a space and a line break:
results = re.findall(r'^(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ DE DIREITO\s(?:DO JUIZADO|DA VARA) ESPECIAL\sCÍVEL DA COMARCA DA CAPITAL$',
text, re.MULTILINE)
If you prefer, you can use [ \n] (a space or \n; notice that there is a space after the [), since \s also matches other characters, such as TAB and others mentioned in the documentation.
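The sketch below compares the two options. The variant with TABs is a hypothetical input added only to show where \s and [ \n] differ.

```python
import re

header = ("EXMO. SR. DR. JUÍZ DE DIREITO\n"
          "DO JUIZADO ESPECIAL\n"
          "CÍVEL DA COMARCA DA CAPITAL")
# Hypothetical variant: TABs where the line breaks were.
tabbed = header.replace("\n", "\t")

with_s = (r'^EXMO\. SR\. DR\. JUÍZ DE DIREITO\sDO JUIZADO ESPECIAL'
          r'\sCÍVEL DA COMARCA DA CAPITAL$')
with_class = (r'^EXMO\. SR\. DR\. JUÍZ DE DIREITO[ \n]DO JUIZADO ESPECIAL'
              r'[ \n]CÍVEL DA COMARCA DA CAPITAL$')

# Both patterns accept the three-line version.
s_matches_lines = re.search(with_s, header, re.MULTILINE) is not None
class_matches_lines = re.search(with_class, header, re.MULTILINE) is not None
# Only \s accepts the TAB-separated version.
s_matches_tabs = re.search(with_s, tabbed, re.MULTILINE) is not None
class_matches_tabs = re.search(with_class, tabbed, re.MULTILINE) is not None

print(s_matches_lines, class_matches_lines)  # True True
print(s_matches_tabs, class_matches_tabs)    # True False
```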
Got it, my friend. Thanks for the explanations. I have one more question: there are cases where ESPECIAL CIVEL appears and others with only CIVEL. How do I handle these cases?
– Antonio Braz Finizola
@Antoniobrazfinizola If "ESPECIAL" is optional, use (?: ESPECIAL)? CÍVEL (note the space before "ESPECIAL"). The ? after the parentheses makes everything inside them optional.
– hkotsubo
(?: ESPECIAL)? didn't work, but ?(?: ESPECIAL)? with an extra '?' in front did. Thanks!
– Antonio Braz Finizola
@Antoniobrazfinizola Maybe you added an extra space (or some other detail), because it worked for me: https://ideone.com/UubJ61
– hkotsubo
Cool, I'll check it out. Thanks again for the detailed explanation. You've helped a lot.
– Antonio Braz Finizola
Another question came up. If I want to capture only up to the word CIVEL, and other things can appear after that word, how can I do it? For example, from "EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO ESPECIAL CÍVEL DA COMARCA DA CAPITAL" I want to take only up to "EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO ESPECIAL CÍVEL".
– Antonio Braz Finizola
@Antoniobrazfinizola Just take out the part that doesn’t matter: https://ideone.com/s1i7IJ
– hkotsubo
@Antoniobrazfinizola Since I'm about to turn off the computer, I suggest that for different doubts you ask a new question, so that you don't depend only on me (a new question is more visible to anyone who can answer, because it appears on the main page, unlike comments, which only appear here) :-) I also say this because the idea of the site is to have one specific problem per question, which helps keep the site organized (don't take this as "laziness to help" on my part) :-)
– hkotsubo
No problem, all right. Thanks again!
– Antonio Braz Finizola
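The two suggestions from the comments (making "ESPECIAL" optional, and stopping the match at "CÍVEL") can be sketched as follows. This is not the code behind the ideone links, only an illustration under the assumption that each header is on a single line; the sample text and variable names are hypothetical.

```python
import re

# Hypothetical sample: one header with "ESPECIAL" and one without.
text = """EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO ESPECIAL CÍVEL DA COMARCA DA CAPITAL
EXMO. SR. DR. JUÍZ DE DIREITO DO JUIZADO CÍVEL DA COMARCA DA CAPITAL"""

# (?: ESPECIAL)? makes " ESPECIAL" (with its leading space) optional.
optional_especial = re.findall(
    r'^(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ DE DIREITO '
    r'(?:DO JUIZADO|DA VARA)(?: ESPECIAL)? CÍVEL DA COMARCA DA CAPITAL$',
    text, re.MULTILINE)

# To stop at "CÍVEL", drop the trailing part of the pattern and the $ anchor.
up_to_civel = re.findall(
    r'^(?:EXMO|EXCELENTISSIMO)\. SR\. DR\. JUÍZ DE DIREITO '
    r'(?:DO JUIZADO|DA VARA)(?: ESPECIAL)? CÍVEL',
    text, re.MULTILINE)

print(len(optional_especial))  # 2
for match in up_to_civel:
    print(match)
```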