Alternative to String.replace()

Something has been bugging me.

I have a method that removes or replaces certain characters in a string. It looks something like this:

public static String replaceCharSet(String texto) {
    texto = texto.replace("&", "E");
    texto = texto.replace("$", "S");
    texto = texto.replace("á", "a");
    // ... (the same pattern repeats for many more characters)
    return texto;
}

This pattern repeats for many, many lines. Besides the performance cost, I suspect it may be causing a memory leak.

Is there a more elegant or functional way to do this?

Here is the full list of characters I need to replace:

"&", "E"
"$", "S"
"ç", "c"
"Ç", "C"
"á", "a"
"Á", "A"
"à", "a"
"À", "A"
"ã", "a"
"Ã", "A"
"â", "a"
"Â", "A"
"ä", "a"
"Ä", "A"
"é", "e"
"É", "E"
"è", "e"
"È", "E"
"ê", "e"
"Ê", "E"
"ë", "e"
"Ë", "E"
"í", "i"
"Í", "I"
"ì", "i"
"Ì", "I"
"î", "i"
"Î", "I"
"ï", "i"
"Ï", "I"
"ó", "o"
"Ó", "O"
"ò", "o"
"Ò", "O"
"õ", "o"
"Õ", "O"
"ô", "o"
"Ô", "O"
"ö", "o"
"Ö", "O"
"ú", "u"
"Ú", "U"
"ù", "u"
"Ù", "U"
"û", "u"
"Û", "U"
"ü", "u"
"Ü", "U"
"º", "o"
"ª", "a"
"-", " "
".", " "

I use Java 8 and cannot migrate to another version at the moment. This is old company code that I want to improve.

  • Have you tried a regex? What other characters do you need to replace besides the 3 mentioned?

  • I haven't tried or thought about it yet. I need to replace several other "special" characters, such as ö > o and ª > a. I can never do something like á > empty, because I need the complete string with only those characters modified.

1 answer


Basically, you need to replace accented characters with their unaccented equivalents. The Normalizer class looks like a good option: it decomposes characters based on their Unicode code points, and the result varies according to the chosen normalization form.

Since there are four exceptions, I added an explicit replace for each one: normalization will not convert $ to S or & to E (nor - and . to spaces). You can organize them as an Enum in your class.

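import java.text.Normalizer;

public class t {

    public static void main(String[] args) {
        String entrada = "olá mundo? é ª º 123 ? $ & * ., x";

        // Characters that normalization does not convert are replaced explicitly.
        entrada = entrada.replace('$', 'S')
                         .replace('&', 'E')
                         .replace('-', ' ')
                         .replace('.', ' ');

        // NFKD decomposes accented letters into a base letter plus combining marks.
        String saida = Normalizer.normalize(entrada, Normalizer.Form.NFKD);
        // Strip the combining marks, leaving only the base letters.
        System.out.println(saida.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
    }
}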

Output:

ola mundo? e a o 123 ? S E *  , x
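To connect this back to the method in the question, here is a minimal sketch of how replaceCharSet might look with the same approach. The class name CharSetCleaner and the enum Special are hypothetical names chosen for illustration, and precompiling the pattern is an assumption about what helps when the method is called often. It has not been checked against the full character list in the question.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class CharSetCleaner {

    // Hypothetical enum holding the four replacements that normalization does not cover.
    enum Special {
        AMPERSAND('&', 'E'),
        DOLLAR('$', 'S'),
        HYPHEN('-', ' '),
        DOT('.', ' ');

        final char from;
        final char to;

        Special(char from, char to) {
            this.from = from;
            this.to = to;
        }
    }

    // Compiled once and reused, instead of recompiling the regex on every call.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String replaceCharSet(String texto) {
        // Apply the explicit one-to-one replacements first.
        for (Special s : Special.values()) {
            texto = texto.replace(s.from, s.to);
        }
        // NFKD splits an accented letter into base letter plus combining mark,
        // and maps the ordinal indicators "ª" and "º" to "a" and "o".
        String decomposed = Normalizer.normalize(texto, Normalizer.Form.NFKD);
        // Remove the combining marks, keeping only the base letters.
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(replaceCharSet("olá mundo? é ª º 123 ? $ & * ., x"));
    }
}
```

One thing to keep in mind: NFKD also changes characters that are not in the question's list (for example, ñ becomes n), so the result may differ from the original method for such input.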

Based on:

Easy way to remove UTF-8 Accents from a string?

Unicode Normalization Forms

Unicode Normalization

  • I found a problem: some characters are left untouched, for example $, - and &, and I need to replace them as well.

  • @Mateusveloso Do , and . also need to be removed?

  • Only "." replaced by " "; at least that is what we have in the code here. I need something that returns the same result, because it is used throughout the system.

  • @Mateusveloso I changed the answer.
