Escaping a special character in a Java String that holds a regular expression involves two steps:
- Escape the characters that are special to Java.
- Escape the characters that are special to the regular expression, which may mean escaping the escape character itself.
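To see the two layers separately, it can help to print the string that the regular expression engine actually receives. The following is only a minimal sketch; the class and method names (EscapeLayers, seenByRegexEngine) are made up for illustration.

```java
// Hypothetical demo class; the names here are illustrative only.
class EscapeLayers {
    // Returns the string exactly as the regex engine will receive it.
    static String seenByRegexEngine() {
        // In source code: backslash, backslash, parenthesis.
        // At run time: one backslash followed by one parenthesis.
        return "\\(";
    }

    public static void main(String[] args) {
        String regex = seenByRegexEngine();
        System.out.println(regex);          // prints \(
        System.out.println(regex.length()); // prints 2
    }
}
```

The Java compiler consumes one backslash, and the remaining one is what the regex engine interprets.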
Example: escaping parentheses
A parenthesis is not a special character for Java, but it is for the regular expression, so it needs the escape character \ in front of it (Reason #2).
Since the character \ is itself special in Java, it must also be escaped, becoming \\ (Reason #1).
Result:
String regex1 = "\\(";
String regex2 = "\\)";
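As a quick check of the reasoning above, the sketch below compares an escaped and an unescaped parenthesis. The helper compiles is a hypothetical name used only here. Writing "\(" directly in Java source would not even compile (illegal escape character), so it is not shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative only.
class ParenEscapeDemo {
    // Returns true if the regex engine accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("(".matches("\\(")); // true: the regex \( matches a literal parenthesis
        System.out.println(compiles("\\("));    // true
        System.out.println(compiles("("));      // false: the regex sees an unclosed group
    }
}
```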
Example: escaping quotes
Double quotes are special for Java, so they need to be escaped with \ (Reason #1), but they are not special for regular expressions.
So the result is:
String regex3 = "\"";
Single quotes are not special inside a Java string literal, and as far as I recall they have no special meaning in regular expressions either (I can't say for certain that no regex implementation treats them specially), so they don't need escaping, at least for the most common uses.
String regex4 = "'";
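The two quote cases can be tried out with a small sketch like the one below. The class and the helper containsMatch are hypothetical names. The last call shows that, in java.util.regex, a backslash before a non-alphabetic character such as a quote is tolerated, so the extra escape is harmless but unnecessary.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class QuoteEscapeDemo {
    // Returns true if the regex matches anywhere in the text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "He said \"it's fine\"";
        System.out.println(containsMatch("\"", text));     // true: escaped for Java only
        System.out.println(containsMatch("'", text));      // true: no escape needed at all
        System.out.println(containsMatch("\\\"", text));   // true: regex \" also matches a quote
        System.out.println("\"".length());                 // 1: the regex is a single character
    }
}
```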
Putting it all together
To capture text in parentheses, you need the following elements:
- A character class that matches everything that can appear between the parentheses. In this case:
[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ"'!?$%:;,º°ª ]
- A quantifier: +
- Delimiters for the capture group (they go inside the literal parentheses if you do not want the parentheses of the original text included in the captured group):
(
and )
- The literal characters that bound the group, the parentheses in this case:
\(
and \)
Converting each one to Java strings, we can build the final expression step by step:
Character class, with the double quote escaped:
"[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]"
Quantifier:
"[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+"
Delimiters:
"([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)"
Literal parentheses, escaped:
"\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)"
Sample code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
// Same expression as built above; group 1 is the text between the parentheses.
Matcher matcher = Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Result:
que é Rubinho
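The step-by-step construction in this section can also be mirrored in code by concatenating the pieces, which keeps each escaping decision visible. This is only a sketch: the class, constant and method names (BuildPatternDemo, CHAR_CLASS, buildRegex, firstCapture) are made up for illustration, and the sample input is plain ASCII to avoid depending on the source file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class BuildPatternDemo {
    // Character class: only the double quote needs a Java escape.
    static final String CHAR_CLASS =
            "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]";

    static String buildRegex() {
        String repeated = CHAR_CLASS + "+";   // quantifier
        String group = "(" + repeated + ")";  // capture group delimiters
        return "\\(" + group + "\\)";         // literal parentheses, escaped twice
    }

    // Returns the first text found between parentheses, or null if there is none.
    static String firstCapture(String text) {
        Matcher m = Pattern.compile(buildRegex()).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstCapture("abc (hello world) def")); // hello world
    }
}
```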
Alternative
Instead of trying to list every character that may appear inside the parentheses, how about excluding only those that cannot?
For example, the class [^()]
negates the parentheses and matches every character except them.
Applying all the steps of the previous section and changing only the character class (item #1), we get the following example, which produces the same result:
String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
Matcher matcher = Pattern.compile("\\(([^()]+)\\)").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
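If the text may contain more than one parenthesized passage, the same negated class can be used in a loop. The sketch below is one possible way to do it; the class and method names (AllParenthesized, extractAll) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class AllParenthesized {
    // Literal "(", a group of one or more non-parenthesis characters, literal ")".
    private static final Pattern PAREN = Pattern.compile("\\(([^()]+)\\)");

    // Collects the content of every non-empty parenthesized passage.
    static List<String> extractAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PAREN.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("a (b) c (d e) f ()")); // [b, d e]
    }
}
```

Because the quantifier is +, an empty pair () is not matched; with nested parentheses only the innermost passage is captured.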
It really is \\(, and it does have to translate to \( so that the regex escapes it correctly. These quotes need two backslashes as well, by the way: one to escape for Java and, consequently, one to escape for the regex.
– Bacco
@Bacco when I put " or ' as suggested, it indicates an error, just as it does with (. It only compiles with ". So what is " is actually followed by quotation marks and gives an error; ( would be followed by a parenthesis.
– JNMarcos
Does it indicate a build error, or is it the source editor that complains? You need to find out whether it's a syntax error or an editor error. Anyway, you need to [edit] the question and put the code snippet in it; otherwise it gets complicated to analyze the whole context and see whether the problem is really the single backslash.
– Bacco
@Bacco you were right, but strangely it was not accepting the construction with (: it ran, but indicated an error in the regex, and with only one ( the editor indicated an error. But as utluiz confirmed, it is just one escape when it is quotes. Thank you.
– JNMarcos
As he said, it is one backslash when the escape is for Java. If the backslash is part of the regex, there are always two; if the backslash is not part of the regex (which seems to be your case, from what you said), there is one. It doesn't matter whether they are quotes or any other characters. Don't get used to "when it's X, use Y", or you might get confused. The important thing is to understand which of the two layers (regex or Java) is doing the escaping, and adapt to your context, not which character is being escaped.
– Bacco