String.split() function with separator containing brackets and asterisk

3

I have the following code:

String teste = "meta[[*]]etapa[[*]]especif[[*]]unid[[*]]qtd[[*]]01/01/2000[[*]]02/02/2000[[**]]";
String[] split = teste.split("[[*]]");
for (String string : split) {
    System.out.println(string);
}

I cannot understand why the output comes out like this:

meta[[
]]etapa[[
]]especif[[
]]unid[[
]]qtd[[
]]01/01/2000[[
]]02/02/2000[[

]]
  • Did you want [[*]] to be treated literally? That is, split on it and have it removed entirely?

  • I wanted to build a list of Strings using [[*]] as the separator.

2 answers

2


The split method takes a regular expression as its parameter.

And some characters have special meaning in regex. The brackets create a character class, and the asterisk is a quantifier (means "zero or more occurrences").

For these characters to "lose their powers" and be interpreted as ordinary characters with no special meaning, you must escape them with \. But since the regex is passed inside a String literal, the backslash itself must be written as \\. That is, the [ should be written as \\[, and the same goes for the * and the ]. It then looks like this:

String[] split = teste.split("\\[\\[\\*\\]\\]");

With this change, the output is:

meta
etapa
especif
unid
qtd
01/01/2000
02/02/2000[[**]]
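As an alternative to escaping each character by hand, the separator can be quoted as a whole with Pattern.quote, which returns a regex that matches the given text literally. The sketch below assumes the separator is exactly [[*]]; the class and method names (QuoteSplitDemo, splitLiteral) are only illustrative.

```java
import java.util.regex.Pattern;

public class QuoteSplitDemo {
    // Splits input on the separator taken literally (no regex meaning).
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String teste = "meta[[*]]etapa[[*]]especif[[*]]unid[[*]]qtd[[*]]01/01/2000[[*]]02/02/2000[[**]]";
        // The last element is still "02/02/2000[[**]]", because [[**]] is not the quoted separator.
        for (String part : splitLiteral(teste, "[[*]]")) {
            System.out.println(part);
        }
    }
}
```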

It was unclear whether the [[**]] is also a separator. If so, just switch to:

String[] split = teste.split("\\[\\[\\*{1,2}\\]\\]");

The quantifier {1,2} means "at least 1, at most 2". Since it comes right after the asterisk, the regex accepts either one or two asterisks, so split treats both [[*]] and [[**]] as separators. The output becomes:

meta
etapa
especif
unid
qtd
01/01/2000
02/02/2000
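Since the goal mentioned in the comments was a list of Strings, here is a minimal sketch that wraps the result in a List. It assumes both [[*]] and [[**]] count as separators; SeparatorList and toList are hypothetical names, and the pattern is compiled once only to avoid recompiling it on every call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorList {
    // Matches [[*]] or [[**]] literally; compiled once and reused.
    private static final Pattern SEPARATOR = Pattern.compile("\\[\\[\\*{1,2}\\]\\]");

    // Returns the pieces between separators as a fixed-size list.
    static List<String> toList(String input) {
        return Arrays.asList(SEPARATOR.split(input));
    }

    public static void main(String[] args) {
        String teste = "meta[[*]]etapa[[*]]especif[[*]]unid[[*]]qtd[[*]]01/01/2000[[*]]02/02/2000[[**]]";
        System.out.println(toList(teste));
    }
}
```

Note that split drops trailing empty strings, so the separator at the very end of the input does not produce an empty last element.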

For the record, your regex was matching only the asterisk. This is because inside a character class (within the brackets) the asterisk is an ordinary character and does not need to be escaped with \.

And according to the documentation, it is possible to have one character class inside another. For example, [a[b]] is the same as [ab]. So, [[*]] ends up being the same as [*], which is a character class that only has the *. That is, this regex only takes the asterisk.
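The equivalence described above can be checked directly. This is just a small sketch; NestedClassDemo and matchesWhole are names made up for the example.

```java
import java.util.regex.Pattern;

public class NestedClassDemo {
    // True if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [[*]] behaves like [*]: it matches an asterisk...
        System.out.println(matchesWhole("[[*]]", "*"));  // true
        // ...but not a bracket, since the brackets only delimit the classes.
        System.out.println(matchesWhole("[[*]]", "["));  // false
        // [a[b]] behaves like [ab].
        System.out.println(matchesWhole("[a[b]]", "b")); // true
    }
}
```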

To know what a regex picks up, you can do this test:

import java.util.regex.*;

...
Matcher m = Pattern.compile("[[*]]").matcher(teste);
while (m.find())
    System.out.println(m.group());

In your case, you will see that the code prints only asterisks, one per match. That is why split broke the String only at the asterisks and left the brackets in place.
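For anyone who wants to run that test as a complete program, a self-contained version might look like the sketch below. The helper findAll and the class name MatchInspector are assumptions made for this example, not part of the original snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchInspector {
    // Collects every substring of input that the regex matches, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String teste = "meta[[*]]etapa[[*]]especif[[*]]unid[[*]]qtd[[*]]01/01/2000[[*]]02/02/2000[[**]]";
        // Unescaped: every match is a single asterisk.
        System.out.println(findAll("[[*]]", teste));
        // Escaped: every match is a whole separator, brackets included.
        System.out.println(findAll("\\[\\[\\*{1,2}\\]\\]", teste));
    }
}
```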

1

This happens because split() expects a regex, so the special characters need to be escaped. Your code will look like this:

       String[] split = teste.split("\\[\\[\\*{1,2}\\]\\]");
