Eliminate consecutive duplicate string letters

Viewed 200 times

3

How do I scroll through an array of strings and delete consecutive repeat letters?

Input: String[] x = {"lleonardo", "joaoo"}

Output: String[] x = {"leonardo", "joao"}

I wrote the function below, but the more I tried to find a solution, the more complicated it got. Now I'm stuck and can't get the expected result.

    static String[] Palavras(String array[]){
        
        String[] resultado = new String[array.length];
        String y = "";
        
        for(int i=0; i < array.length; i++) {
            String x = array[i];
            
            for(int j=0; j<=x.length(); j++) {
                
                if(x.charAt(j) != x.charAt(j+1)) {
                    System.out.println(x.charAt(j));
                    y += x.charAt(j);
                }
                
            }
        
        }
        
        return resultado;
    }
}

I'm just starting out, so if anyone can point me in the right direction, I'd appreciate it.

2 answers

3


String indexes start at zero, so they go from zero to length - 1. That makes j<=x.length() wrong: the loop reaches one extra index at the end, and trying to access that non-existent position throws an error. The correct comparison is < instead of <=.
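To make the off-by-one concrete, here is a minimal sketch. The class name IndexBounds and the helper isOutOfRange are made up for illustration; only String.charAt comes from the standard library.

```java
class IndexBounds {
    // Returns true if reading s.charAt(index) throws, i.e. the index is out of range.
    static boolean isOutOfRange(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String x = "joao"; // length 4, valid indexes are 0..3
        System.out.println(isOutOfRange(x, x.length() - 1)); // last valid index
        System.out.println(isOutOfRange(x, x.length()));     // one past the end
    }
}
```

Note that the loop in the question also reads x.charAt(j + 1), so j + 1 has to stay below x.length() as well.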

As for the algorithm, just keep track of the previous character and append the current character to the new string only if it differs from the previous one. Like this:

String[] array = {"lleonardo", "joaoo"};
String[] result = new String[array.length];
for (int i = 0; i < array.length; i++) {
    String atual = array[i];
    StringBuilder sb = new StringBuilder();
    char anterior = 0;
    for (int j = 0; j < atual.length(); j++) {
        char c = atual.charAt(j);
        if (c != anterior) {
            sb.append(c);
        }
        anterior = c;
    }
    result[i] = sb.toString();
}

To build the new string I used a StringBuilder, which is more efficient than concatenating strings directly when there are many successive concatenations in a loop.

At the end, the result array will hold the strings without consecutive repeated characters.
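If you want to keep the method shape from the question, the same idea could be wrapped as below. This is only a sketch: the names ConsecutiveDuplicates, dedupe and dedupeAll are hypothetical, and it compares against the previous index instead of keeping a separate variable for the previous character.

```java
import java.util.Arrays;

class ConsecutiveDuplicates {
    // Removes consecutive repeated chars from a single string (char-based).
    static String dedupe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < s.length(); j++) {
            char c = s.charAt(j);
            // keep the char if it is the first one or differs from the one before it
            if (j == 0 || c != s.charAt(j - 1)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Applies dedupe to every element and returns a new array.
    static String[] dedupeAll(String[] array) {
        String[] result = new String[array.length];
        for (int i = 0; i < array.length; i++) {
            result[i] = dedupe(array[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] x = {"lleonardo", "joaoo"};
        System.out.println(Arrays.toString(dedupeAll(x))); // [leonardo, joao]
    }
}
```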


But this solution has limits. If your strings only contain Portuguese text, you probably won't have any problems. But if you have something like this:

// yes, an emoji directly in the code
String[] array = { "" };

It no longer works. The short explanation is that Java stores strings internally in UTF-16 (the documentation itself says: "A String represents a string in the UTF-16 format"), and some characters end up occupying 2 chars. Print the length() of the string above and you will see that the result is 4: each emoji needs 2 chars to be stored, and length() returns the number of chars used internally. The long explanation covering all these details is here.
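A small sketch of that effect, under one assumption: the emoji is U+1F600, picked only for illustration and written as escapes so the source stays ASCII. The class and constant names are made up.

```java
class Utf16Length {
    // U+1F600 twice; each one is stored as a surrogate pair (2 chars).
    static final String TWO_EMOJIS = "\uD83D\uDE00\uD83D\uDE00";

    public static void main(String[] args) {
        String s = TWO_EMOJIS;
        System.out.println(s.length());                      // 4 (chars)
        System.out.println(s.codePointCount(0, s.length())); // 2 (code points)
        // The chars alternate high surrogate, low surrogate, high, low,
        // so no two neighbouring chars are equal.
        System.out.println(s.charAt(0) == s.charAt(1));      // false
        System.out.println(s.charAt(1) == s.charAt(2));      // false
    }
}
```

Since no two neighbouring chars are equal, a loop that compares char by char never sees a repetition and leaves both emojis in place.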

Anyway, if you want to delete the repeated characters for this case, then we have to iterate through the code points of the string:

String[] array = { "aaxybb" };
String[] result = new String[array.length];
for (int i = 0; i < array.length; i++) {
    String atual = array[i];
    StringBuilder sb = new StringBuilder();
    int anterior = -1, cp;
    for (int j = 0; j < atual.length(); j += Character.charCount(cp)) {
        cp = atual.codePointAt(j);
        if (cp != anterior) {
            sb.appendCodePoint(cp);
        }
        anterior = cp;
    }
    result[i] = sb.toString();
}

It will still fail if the string has grapheme clusters or accents normalized in NFD, but if you want to delve into these cases, I suggest reading here.
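To illustrate the NFD case only, here is a hedged sketch. The class NfdExample and its dedupe method are hypothetical and repeat the code point loop above; java.text.Normalizer is the standard API. Normalizing to NFC first is an assumption about what you might want, and it only helps when a precomposed character exists. It does not solve grapheme clusters in general.

```java
import java.text.Normalizer;

class NfdExample {
    // Code point based removal of consecutive duplicates.
    static String dedupe(String s) {
        StringBuilder sb = new StringBuilder();
        int previous = -1;
        for (int j = 0; j < s.length(); ) {
            int cp = s.codePointAt(j);
            if (cp != previous) {
                sb.appendCodePoint(cp);
            }
            previous = cp;
            j += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "e" + combining acute accent, twice: looks like two accented letters
        String nfd = "e\u0301e\u0301";
        // Code points alternate (e, accent, e, accent), so nothing is removed.
        System.out.println(dedupe(nfd).length()); // 4

        // In NFC each accented letter becomes the single code point U+00E9.
        String nfc = Normalizer.normalize(nfd, Normalizer.Form.NFC);
        System.out.println(dedupe(nfc).length()); // 1
    }
}
```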

1

A slightly different approach is to remove the repeated characters instead of concatenating the distinct ones. Since the String class has no method to remove a character, this can be done with the substring method, or by converting the String to a StringBuilder, removing the repeated character, and converting back to a String:

static String[] Palavras(String array[]){
    
    String[] resultado = new String[array.length];
    
    for(int i = 0; i < array.length; i++) {
        String palavra = array[i];
        for(int j = 0; j < palavra.length() - 1; j++) {
            while ((j+1) < palavra.length() 
                    && palavra.charAt(j) == palavra.charAt(j+1)) {
                palavra = new StringBuilder(palavra).deleteCharAt(j).toString();
            }
        }
        resultado[i] = palavra;
    }
    return resultado;
}
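As a variation on the code above, the deletion could also be done on a single StringBuilder per word, instead of creating a new one for every removed character. This is just a sketch, and the names InPlaceRemoval and removeRepeats are made up:

```java
class InPlaceRemoval {
    // Deletes repeated neighbours directly in one StringBuilder.
    static String removeRepeats(String palavra) {
        StringBuilder sb = new StringBuilder(palavra);
        int j = 0;
        while (j + 1 < sb.length()) {
            if (sb.charAt(j) == sb.charAt(j + 1)) {
                // drop the repeat and stay at j to check the next neighbour
                sb.deleteCharAt(j + 1);
            } else {
                j++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeRepeats("lleonardo")); // leonardo
        System.out.println(removeRepeats("joaoo"));     // joao
    }
}
```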

If you have no impediment to using regular expressions, I would suggest this more streamlined solution:

// requires: import java.util.regex.Pattern;
static String[] Palavras2(String array[]){
    Pattern padrao = Pattern.compile("([A-Za-z])\\1+");
    String[] result = new String[array.length];
    for (int i = 0; i < array.length; i++) {
        String atual = padrao.matcher(array[i]).replaceAll("$1");
        result[i] = atual;
    }
    return result;
}
  • 1

    Another option is to use Pattern.compile("(.)\\1+"), which already matches one or more occurrences of the repeated character (instead of only one), and to use replaceAll("$1") in the substitution. This also avoids the lookahead, which makes the regex a little more efficient (not that regex is the most efficient thing in the world, but anyway, compare the number of steps here and here)

  • Good suggestion @hkotsubo ... I improved the regex.
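For reference, a minimal sketch of the "(.)\\1+" variant from the comment above. The class name RegexDedupe and the method dedupe are hypothetical. Unlike [A-Za-z], the dot also collapses digits, punctuation and accented letters, but by default it does not match line terminators.

```java
import java.util.regex.Pattern;

class RegexDedupe {
    // One captured character followed by one or more copies of itself.
    private static final Pattern REPEATED = Pattern.compile("(.)\\1+");

    // Replaces every run of a repeated character with a single occurrence.
    static String dedupe(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("lleonardo")); // leonardo
        System.out.println(dedupe("11--aa"));    // 1-a
    }
}
```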
