Error replacing occurrence in string using replaceAll

6

I extract the text from several lines of a PDF. At the beginning of each line there is a tag with the font family and size used on that line, and I need to remove that tag.

First I used replace, as follows:

String myText = line.replace(fontConfiguration, "");

With strings like these:

String line = "[ABCDEE+Georgia,BoldItalic-9.0]Relação de poemas";
String fontConfiguration = "[ABCDEE+Georgia,BoldItalic-9.0]";

This works perfectly, but there are still occurrences of fontConfiguration in the text, so I switched to replaceAll.

My question is: why do I get this exception when using replaceAll?

This is an example that triggers the error:

String line = "[ABCDEE+Calibri-11.04]1 ";
String fontConfiguration = "[ABCDEE+Calibri-11.04]";
String myText = line.replaceAll(fontConfiguration, "");

Exception:

Method threw 'java.util.regex.PatternSyntaxException' exception.
java.util.regex.PatternSyntaxException: Illegal character range near index 16
[ABCDEE+Calibri-11.04]
                ^

  • I thought replaceAll took a plain string like replace does. I never stopped to think that even in a call like String newStr = str.replaceAll("This", "That"); the first argument is a regex. Thank you all.

  • Why are you disqualifying this question? Our friend needs help with regular expressions!

  • @Danielamorais the problem is that replaceAll(), unlike replace(), uses regular expressions, so your string should be "\\[ABCDEE\\+Georgia,BoldItalic-9.0\\]"

  • @Davidschrammel both use regular expressions; the difference is that replace uses a literal Pattern. If the question is reopened (it was closed, who knows why), I will try to post an answer covering this difference in behavior.

2 answers

1


Both replace(CharSequence target, CharSequence replacement) and replaceAll(String regex, String replacement) perform the substitution by matching patterns built with regular expressions. The question that remains is: if both use regular expressions, why does only replaceAll(String regex, String replacement) fail for the same input? Note how the two methods are implemented (as in the Java 8 sources):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

The difference, as the code shows, is how the Pattern is created. replace(CharSequence target, CharSequence replacement) uses Pattern.LITERAL, so its input is treated as a sequence of ordinary characters and not as a regular expression, while replaceAll(String regex, String replacement) compiles its argument as a regular expression. For example, if replace(CharSequence target, CharSequence replacement) were written like this:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString()).matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

We would also have problems with the input [ABCDEE+Calibri-11.04], because it is not a valid regular expression and it would no longer be compiled as a literal string but as a normal regular expression pattern.

Remember that neither method is wrong in how it treats its input or uses regular expressions; they simply serve different purposes.

The suggestion, then, is to pass a valid expression to replaceAll(String regex, String replacement), such as \[.+\], which replaces everything that starts with [, ends with ] and has at least one character in between. Note that .+ is greedy, so on a line with several bracketed blocks it matches from the first [ to the last ]. Something like this:
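To make the contrast concrete, here is a minimal sketch that compiles the same text both ways. The class and method names (LiteralVsRegex, removeLiteral, isValidRegex) are hypothetical and exist only for this illustration; only Pattern.compile, Pattern.LITERAL and PatternSyntaxException come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralVsRegex {
    // Hypothetical helper: removes every occurrence of target, treating it as plain text.
    static String removeLiteral(String input, String target) {
        return Pattern.compile(target, Pattern.LITERAL).matcher(input).replaceAll("");
    }

    // Hypothetical helper: reports whether the text compiles as a regular expression.
    static boolean isValidRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String font = "[ABCDEE+Calibri-11.04]";
        // Compiled without LITERAL, the brackets open a character class and compilation fails.
        System.out.println("valid as regex: " + isValidRegex(font));
        // Compiled with LITERAL, the same text is matched character by character.
        System.out.println("after literal removal: '" + removeLiteral(font + "1 ", font) + "'");
    }
}
```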

final String[] lines = new String[] {"[ABCDEE+Calibri-11.04]1 ", "[ABCDEE+Georgia,BoldItalic-9.0]Relação de poemas"};
Arrays.stream(lines).forEach(line -> System.out.println(line.replaceAll("\\[.+\\]", "")));

Would print this:

1 
Relação de poemas
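One caveat with \[.+\] is greediness: if a line holds more than one bracketed block, the match runs from the first [ to the last ] and swallows the text in between. The sketch below compares it with a negated character class, \[[^\]]+\], which stops at the first closing bracket. The sample line with two font tags is made up for illustration, as are the class and method names.

```java
class BracketStripper {
    // Greedy version from the answer: matches from the first '[' to the LAST ']'.
    static String stripGreedy(String line) {
        return line.replaceAll("\\[.+\\]", "");
    }

    // Alternative: a negated class cannot cross a ']', so each block is matched separately.
    static String stripEachBlock(String line) {
        return line.replaceAll("\\[[^\\]]+\\]", "");
    }

    public static void main(String[] args) {
        // Hypothetical line containing two font tags.
        String line = "[ABCDEE+Calibri-11.04]first [ABCDEE+Georgia,BoldItalic-9.0]second";
        System.out.println(stripGreedy(line));     // prints "second"
        System.out.println(stripEachBlock(line));  // prints "first second"
    }
}
```

Both versions still remove any bracketed text, not only font tags, so they are only appropriate if the real content never contains square brackets.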

0

Looking at the signature of String.replaceAll():

public String replaceAll(String regex, String replacement)

The Illegal character range error occurs because the method interprets its first parameter as a regular expression (documentation). Brackets define a character class, inside a class the hyphen defines a character range, and the range i-1 is invalid because i comes after 1 in character order.
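To see where the regex parser gives up, the exception itself can be inspected. This is a rough sketch: describeProblem and RegexDiagnosis are hypothetical names, while getDescription() and getIndex() are the accessors PatternSyntaxException provides.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexDiagnosis {
    // Hypothetical helper: returns a short explanation if the text is not a valid regex, else null.
    static String describeProblem(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            int index = e.getIndex(); // -1 when the position is unknown
            String where = (index >= 0 && index < regex.length())
                    ? " at character '" + regex.charAt(index) + "'"
                    : "";
            return e.getDescription() + " (index " + index + ")" + where;
        }
    }

    public static void main(String[] args) {
        // The raw font tag is parsed as a character class containing the range i-1.
        System.out.println(describeProblem("[ABCDEE+Calibri-11.04]"));
        // With the metacharacters escaped, the pattern compiles and null is returned.
        System.out.println(describeProblem("\\[ABCDEE\\+Calibri-11\\.04\\]"));
    }
}
```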

To solve it, escape the brackets with a backslash (\). Other characters with special meaning, such as . and +, also need to be escaped. In a Java string literal each backslash must itself be doubled:

String line = "[ABCDEE+Calibri-11.04]1 ";
String fontConfiguration = "\\[ABCDEE\\+Calibri-11\\.04\\]";
String myText = line.replaceAll(fontConfiguration, "");
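When the font tag comes from the PDF at run time, escaping by hand is not practical. A possible alternative, sketched below, is to let Pattern.quote build a literal pattern from the raw tag. The class and method names are hypothetical. It is also worth noting that String.replace(CharSequence, CharSequence) already replaces every occurrence of its target, so it would work here as well.

```java
import java.util.regex.Pattern;

class QuoteFontTag {
    // Hypothetical helper: removes every occurrence of the raw tag using replaceAll.
    static String stripTag(String line, String fontConfiguration) {
        // Pattern.quote wraps the text so that no character is treated as a metacharacter.
        return line.replaceAll(Pattern.quote(fontConfiguration), "");
    }

    public static void main(String[] args) {
        String fontConfiguration = "[ABCDEE+Calibri-11.04]";
        String line = "[ABCDEE+Calibri-11.04]1 ";
        System.out.println("'" + stripTag(line, fontConfiguration) + "'");      // prints '1 '
        // replace treats its target literally and also replaces all occurrences.
        System.out.println("'" + line.replace(fontConfiguration, "") + "'");    // prints '1 '
    }
}
```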
