How to remove accents and other diacritical marks from a Java String?

Viewed 39,510 times

55

How can I remove accents and other diacritical marks from a Java String? For example:

String s = "maçã";
String semAcento = ???; // result: "maca"

2 answers

75


I usually use a regex together with the Normalizer class, like this:

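import java.text.Normalizer;

// Decompose accented characters (NFD), then drop everything that is not ASCII
public static String removerAcentos(String str) {
    return Normalizer.normalize(str, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
}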
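
To make the two steps of this approach visible, here is a minimal sketch that separates the decomposition from the ASCII filter. The class name NfdAsciiSketch and the helper decompose are illustrative names, not part of the answer, and the strings use Unicode escapes only so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;

final class NfdAsciiSketch {
    // Step 1: canonical decomposition splits each accented letter into
    // a base letter followed by one or more combining marks.
    static String decompose(String str) {
        return Normalizer.normalize(str, Normalizer.Form.NFD);
    }

    // Step 2: drop every character outside ASCII, which includes the marks.
    static String removerAcentos(String str) {
        return decompose(str).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "maca" with a cedilla and a tilde, written with Unicode escapes
        String s = "ma\u00e7\u00e3";
        System.out.println(s.length());            // 4
        System.out.println(decompose(s).length()); // 6
        System.out.println(removerAcentos(s));     // maca
        // U+00F8 has no decomposition, so the whole letter is removed
        System.out.println(removerAcentos("sm\u00f8rrebr\u00f8d")); // smrrebrd
    }
}
```

Note the last line: a letter that has no canonical decomposition is not reduced to a base letter, it is deleted outright by the ASCII filter.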

7

If you are on Java 7+ you can use this solution, found on Stack Overflow in English: https://stackoverflow.com/a/1215117/1518921

First import this:

import java.text.Normalizer;
import java.util.regex.Pattern;

Then add this method to your main class, or to whichever class you use:

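public static String deAccent(String str) {
    // NFD splits each accented letter into a base letter plus combining marks
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    // Match runs of combining diacritical marks (U+0300 to U+036F)
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
}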

Using it would look something like this:

System.out.print(deAccent("Olá, mundo!"));

It uses the regular expression \p{InCombiningDiacriticalMarks}+ to match the combining marks and replace them with an empty string.

See it working on Ideone: https://ideone.com/MtgLAC
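
Since the method above compiles the pattern on every call, one possible refinement is to compile it once and reuse it. This is only a sketch of that variant: the class name DeAccentSketch and the constant COMBINING_MARKS are illustrative names, and the behavior is meant to be the same as deAccent above.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class DeAccentSketch {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String deAccent(String str) {
        // Decompose first, so each accent becomes a separate combining mark
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Then delete only those marks, leaving every other character alone
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "Ol\u00e1" is "Ola" with an acute accent on the "a"
        System.out.println(deAccent("Ol\u00e1, mundo!")); // Ola, mundo!
    }
}
```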

  • This is the best answer because it only removes accents instead of all non-ASCII characters

  • @Mariano thanks!! I'm putting together some details to explain how both behave with ASCII and Unicode. The differences are quite significant; I'll elaborate soon ;)

  • You can use emoticons, Greek or Japanese letters as an example ;-)

  • @Mariano then the behavior will be different; those characters will probably just be ignored. I haven't had time to put together a decent example, and I don't remember the "internal" behavior either, since what does the dirty work is actually Normalizer.normalize, while replaceAll only removes the invalid characters. There is an answer about InCombiningDiacriticalMarks on the site, but I found it rather weak.
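
The difference discussed in these comments can be checked with a small comparison of the two answers on Greek letters, kanji and an emoji. This is a sketch, not a full explanation: the class and method names are illustrative, and the inputs are deliberately limited to characters that NFD leaves unchanged (unaccented Greek, CJK ideographs, one emoji), written as Unicode escapes.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class CompareSketch {
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // First answer: decompose, then remove everything outside ASCII.
    static String stripNonAscii(String str) {
        return Normalizer.normalize(str, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    // Second answer: decompose, then remove only the combining marks.
    static String stripMarks(String str) {
        return MARKS.matcher(Normalizer.normalize(str, Normalizer.Form.NFD))
                .replaceAll("");
    }

    public static void main(String[] args) {
        String greek = "\u03b1\u03b2\u03b3 ma\u00e7\u00e3"; // alpha beta gamma + accented "maca"
        String kanji = "\u65e5\u672c\u8a9e";                // three CJK ideographs
        String emoji = "\uD83D\uDE00 ol\u00e1";             // U+1F600 + accented "ola"

        for (String s : new String[] {greek, kanji, emoji}) {
            System.out.println("[" + stripNonAscii(s) + "] vs [" + stripMarks(s) + "]");
        }
    }
}
```

With these inputs, stripNonAscii discards the Greek letters, the kanji and the emoji along with the accents, while stripMarks keeps them and only removes the accents from the Latin letters.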
