How do I indicate in a Java regex that the parentheses '(' and ')' are among the alternatives in a list of symbols?

2

I am writing code that captures text using regular expressions (regex). The text contains parentheses.

The problem is that parentheses are used in regular expressions as group delimiters, and I want to use them as literals.

I tried using \( as the escape, but Eclipse rejects it outright, saying that only a few other symbols can be escaped (the traditional Java escape characters).

I then tried \\(, which does run, but it soon fails with an error, and on checking it turns out that it "translates" to \( instead of a literal (.

"First, who is 1st placed, second who is second (which is Rubinho)"

([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"\'!?$%:;,º°ª]+)

I want to add the parentheses to this character list.

  • 1

    It really is \\(, and yes, it has to translate to \(, which is what the regex needs in order to escape correctly. Those quotes need two backslashes as well: one to escape for Java, and consequently one to escape for the regex.

  • @Bacco when I put " or ' as suggested indicates error as well as (. Only with " compiles. Then what is " is actually followed by quotation marks and gives error. ( would be followed by parentheses.

  • Does it report a build error, or is it the source editor complaining? You need to check whether it is a syntax error or an editor error. Either way, you need to [edit] the question and include the code snippet; otherwise it gets rather complicated to analyze the whole context and see whether the problem is even the single backslash.

  • @Bacco you were right, but strangely it was not accepting the construction with (: it ran, but reported an error in the regex, whereas with only one ( the editor reported an error. But as utluiz confirmed, it takes just one escape when it is quotes. Thank you.

  • 1

    As he said, it is one escape if you are escaping for Java. If the backslash is part of the regex, there are always two; if the backslash is not part of the regex (which seems to be your case, from what you said), there is one. It doesn't matter whether they are quotes or any other characters. Don't get used to "when it's X, use Y", or you may get confused. The important thing is to understand which layer (regex or Java) is doing the escaping, not which character, and to adapt that to your context.

1 answer

3


The process to "escape" a special character in a String in Java has two steps:

  1. "Escape" the characters that are special to Java.
  2. "Escape" the characters that are special to the regular expression, which may include "escaping" the "escape" character itself.

Example: escaping parentheses

The parenthesis is not a special character in Java, but it is in a regular expression, so it must be preceded by the escape character \ (the regular-expression escape).

Because the character \ is itself special in Java, it must be escaped in turn and becomes \\ (the Java escape).

Result:

String regex1 = "\\(";
String regex2 = "\\)";
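
As a rough check of the two layers, the sketch below shows that the Java literal "\\(" holds only two characters (a backslash and a parenthesis), and that the regex engine then reads that pair as a literal parenthesis. The class name EscapeLayersDemo and its member names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class EscapeLayersDemo {
    // Java source "\\(" -> String value \( -> regex meaning: a literal '('
    static final String OPEN = "\\(";
    static final String CLOSE = "\\)";

    // True when the whole text is exactly one opening parenthesis.
    static boolean isLiteralOpen(String text) {
        return Pattern.matches(OPEN, text);
    }

    public static void main(String[] args) {
        System.out.println(OPEN.length());      // 2
        System.out.println(OPEN);               // \(
        System.out.println(isLiteralOpen("(")); // true
        // Remove both kinds of parenthesis from a string.
        System.out.println("a(b)c".replaceAll(OPEN + "|" + CLOSE, "")); // abc
    }
}
```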

Example: escaping quotes

Double quotes are special in Java, so they need an escape with \ (the Java escape), but they are not special in regular expressions.

So the result is:

String regex3 = "\"";

Single quotes are not special most of the time (I can't remember at the moment whether they have a special meaning in some regular expression implementation), so they don't need to be escaped, at least for the most common uses.

String regex4 = "'";
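
A small sketch of the quote case, assuming a made-up class QuoteEscapeDemo with a hypothetical helper countQuotes: the double quote carries one backslash purely for the Java compiler, and both quote characters reach the regex engine as ordinary single characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuoteEscapeDemo {
    static final String DOUBLE_QUOTE = "\""; // one character: "
    static final String SINGLE_QUOTE = "'";  // one character: '

    // Counts how many quote characters (single or double) occur in the text.
    static int countQuotes(String text) {
        Matcher m = Pattern.compile("[\"']").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(DOUBLE_QUOTE.length()); // 1
        System.out.println(SINGLE_QUOTE.length()); // 1
        System.out.println(countQuotes("say \"hi\" and 'bye'")); // 4
    }
}
```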

Putting it all together

To capture text in parentheses, you need the following elements:

  1. A character class matching everything that can appear between the parentheses. In this case:

    [A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ"'!?$%:;,º°ª ]
    
  2. A quantifier: +

  3. Delimiters for the group of characters to be captured (they go inside the literal parentheses if you do not want the parentheses from the original text included in the captured group): ( and )
  4. The literal characters that surround the group, parentheses in this case: \( and \)

Converting each element to a Java string, we can build the final expression step by step:

  1. Character class, with the double quote escaped for Java:

    "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]"
    
  2. Quantifier:

    "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+"
    
  3. Delimiters:

    "([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)"
    
  4. Literal parentheses, with their escapes:

    "\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)"
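
The four steps above can also be written as string concatenation, which keeps each piece visible. This is only a sketch: the class BuildPatternDemo, its constant names, and the helper firstParenthesized are invented here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BuildPatternDemo {
    // Step 1: character class (the double quote is escaped for Java only).
    static final String CHAR_CLASS = "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]";
    // Step 2: quantifier, one or more characters of the class.
    static final String QUANTIFIED = CHAR_CLASS + "+";
    // Step 3: capture-group delimiters (unescaped, so they group).
    static final String GROUPED = "(" + QUANTIFIED + ")";
    // Step 4: literal parentheses around the group (escaped for regex, then for Java).
    static final String FULL = "\\(" + GROUPED + "\\)";

    // Returns the text inside the first parenthesized run, or null if none matches.
    static String firstParenthesized(String text) {
        Matcher m = Pattern.compile(FULL).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstParenthesized("abc (hello world) def")); // hello world
        System.out.println(firstParenthesized("no parentheses here"));   // null
    }
}
```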
    

Sample code:

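// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
// \\( and \\) match the literal parentheses; the inner ( ) form capture group 1
Matcher matcher = Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)").matcher(s);
if (matcher.find()) {
    // Group 1 holds the text between the parentheses
    System.out.println(matcher.group(1));
}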

Result:

que é Rubinho

Alternative

Instead of trying to specify every character that may appear within the parentheses, how about excluding only those that cannot?

For example, the class [^()] negates the parentheses and matches any character except them.
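
On the original question of listing parentheses among other symbols: inside a character class, java.util.regex treats ( and ) as ordinary characters, so they can be listed without a backslash, and the escaped form is accepted too. A minimal sketch, with the class ParenInClassDemo and the helper stripParens being hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ParenInClassDemo {
    static final Pattern UNESCAPED = Pattern.compile("[()]");     // parentheses listed as-is
    static final Pattern ESCAPED = Pattern.compile("[\\(\\)]");   // same class, escaped form
    static final Pattern NEGATED = Pattern.compile("[^()]+");     // anything but parentheses

    // Removes every parenthesis from the text.
    static String stripParens(String text) {
        return UNESCAPED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripParens("f(x)"));                      // fx
        System.out.println(ESCAPED.matcher("f(x)").replaceAll(""));   // fx
        System.out.println(NEGATED.matcher("abc def").matches());     // true
        System.out.println(NEGATED.matcher("a(b").matches());         // false
    }
}
```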

Applying all the steps of the previous section and changing only the character class from item #1, we arrive at the following example, which produces the same result:

String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
Matcher matcher = Pattern.compile("\\(([^()]+)\\)").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
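
If the text may contain more than one parenthesized passage, the same pattern can be applied in a loop. The sketch below assumes that is wanted; the class AllParenthesizedDemo and the method allParenthesized are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AllParenthesizedDemo {
    // Literal '(', then one or more non-parenthesis characters (group 1), then literal ')'.
    private static final Pattern PAREN = Pattern.compile("\\(([^()]+)\\)");

    // Collects the contents of every non-empty parenthesized passage, in order.
    static List<String> allParenthesized(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PAREN.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allParenthesized("a (b) c (d e) f ()")); // [b, d e]
    }
}
```

Because the class requires at least one character, empty parentheses are skipped rather than reported as an empty string.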
  • 1

    +1. I was about to post an example on IDEONE just as a quick fix, but nothing beats a complete answer ;)

  • 1

    Thank you, that was a very complete answer. It helped me a lot, @utluiz
