How to split a String that contains white spaces at the beginning?

10

For the problem I am solving, I need to strip all special characters and spaces from a string and count the resulting tokens. My plan is to separate the string with the split() method. Based on another expression I saw, I came up with this:

String[] d = s.split("[,.!?'@_] *| +");

It works, except that when the string begins with whitespace, the leading whitespace is not discarded and I get an extra empty element. I tried adding a space at the start of the expression, but it didn't work. Can someone help me? Here is the full code:

        String s = "           YES      leading spaces        are valid,    problemsetters are         evillllll";
        String[] d = s.split("[,.!?'@_] *| +");
        int i, c = d.length;
        System.out.println(c);
        for(i = 0; i < d.length; i++){
            System.out.println(d[i]);
        }

The output produced is this:

9

YES
leading
spaces
are
valid
problemsetters
are
evillllll

But it should be this:

8
YES
leading
spaces
are
valid
problemsetters
are
evillllll

2 answers

13


This is a known quirk of split when the String begins with spaces, and it has already been discussed on the English-language Stack Overflow.

The simplest solution is to call trim(), which removes whitespace from the beginning and end of the String, and then do the split:

String[] d = s.trim().split("[,.!?'@_] *| +");

With this you will have the desired output (the array with 8 elements).
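
To make the difference concrete, here is a small sketch that runs the question's pattern with and without trim(). The class name TrimSplitDemo and the helper tokens are just names I made up for the illustration.

```java
import java.util.Arrays;

class TrimSplitDemo {
    // Same delimiter pattern as in the question.
    static final String DELIMITERS = "[,.!?'@_] *| +";

    // Hypothetical helper: trim first, then split, so no empty first token appears.
    static String[] tokens(String s) {
        return s.trim().split(DELIMITERS);
    }

    public static void main(String[] args) {
        String s = "           YES      leading spaces        are valid,    problemsetters are         evillllll";
        // Without trim(): the leading run of spaces produces an empty first element.
        System.out.println(s.split(DELIMITERS).length); // 9
        // With trim(): only the real words remain.
        String[] d = tokens(s);
        System.out.println(d.length);                   // 8
        System.out.println(Arrays.toString(d));
    }
}
```

One caveat: if the input is empty or contains only whitespace, trim() leaves an empty String, and splitting an empty String still returns an array with one empty element, so that case would need a separate check.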

  • 1

    Thank you very much! ^_^

7

Let’s look at this simpler case:

class TesteRegex {
    public static void main(String[] args) {
        String s = " A B C ";
        String[] d = s.split(" ", 5);
        System.out.println(d.length);
        for (int i = 0; i < d.length; i++) {
            System.out.println("[" + d[i] + "]");
        } 
    }
}

It produces as output:

5
[]    
[A]
[B]
[C]
[]

See it running on ideone.

The issue is that every space is treated as a separator. The first space separates the beginning of the String from A, the second separates A from B, the third separates B from C, and the fourth separates C from the end of the String. That gives 5 resulting pieces: the (empty) beginning, A, B, C and the (empty) end.

However, if you remove the ", 5" argument from the code above, only the first four elements come back. The reason can be found in the javadoc of the split method, which says:

Trailing empty strings are therefore not included in the resulting array.

However, there is no equivalent rule for empty strings at the beginning (leading empty strings); the rule only covers empty strings at the end.
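
A quick way to see how the limit parameter affects this is to split the same String with a limit of zero, a negative limit and a positive limit. This is only an illustrative sketch; the class SplitLimitDemo and the helper count are names I invented.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: number of pieces when splitting on a single space.
    static int count(String s, int limit) {
        return s.split(" ", limit).length;
    }

    public static void main(String[] args) {
        String s = " A B C ";
        // Limit 0 (same as split(" ")): trailing empty strings dropped, leading one kept.
        System.out.println(Arrays.toString(s.split(" ")));     // [, A, B, C]
        // Negative limit: nothing is dropped at either end.
        System.out.println(Arrays.toString(s.split(" ", -1))); // [, A, B, C, ]
        // Positive limit of 5: at most 5 pieces, trailing empty string kept.
        System.out.println(Arrays.toString(s.split(" ", 5)));  // [, A, B, C, ]
    }
}
```

In all three cases the first element is the empty String, which is exactly what the question ran into.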

Looking at the source of split(String, int), the method takes care to remove the empty strings at the end when the limit (the second parameter of split) is zero:

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, length()));

The analogous method in the class java.util.regex.Pattern does the same:

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

But it makes no such effort for the empty strings at the beginning.
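
To illustrate what that trailing cleanup amounts to, here is a minimal sketch of the idea. It is not the JDK source: the class TrailingEmptySketch and the helper dropTrailingEmpty are hypothetical names, and the loop only mimics the effect that a limit of zero has on the end of the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrailingEmptySketch {
    // Hypothetical helper: walks back from the end while the last piece is empty.
    // Nothing comparable is done for the front of the list.
    static List<String> dropTrailingEmpty(List<String> parts) {
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return new ArrayList<>(parts.subList(0, size));
    }

    public static void main(String[] args) {
        // A negative limit keeps every piece: ["", "A", "B", "C", ""]
        List<String> all = Arrays.asList(" A B C ".split(" ", -1));
        System.out.println(dropTrailingEmpty(all)); // [, A, B, C]
    }
}
```

Applying this to the unlimited split gives the same four elements as split(" ") with no limit, leading empty String included.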

I'm not sure what the reason for this behavior is. Whatever the motivation, though, the behavior is deliberate and not accidental. Moreover, it could not be changed now for compatibility reasons.

So the solution is to use trim(), or else to check whether the first element is empty and, if so, skip it or remove it.
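
The second option could look roughly like the sketch below. The class LeadingEmptyDemo and the helper dropLeadingEmpty are names I made up; the only library call involved is Arrays.copyOfRange.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Hypothetical helper: returns the array without its first element
    // when that element is an empty String, otherwise returns it unchanged.
    static String[] dropLeadingEmpty(String[] parts) {
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "   YES  leading, spaces";
        String[] raw = s.split("[,.!?'@_] *| +");
        System.out.println(raw.length);                   // 4, first element is ""
        System.out.println(dropLeadingEmpty(raw).length); // 3
    }
}
```

Unlike trim(), this also covers a String that starts with one of the punctuation delimiters, since that produces an empty first element too.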

  • Thank you very much! ^_^
