How do I remove accents in a string?

71

I have a string

áéíóú

That I want to convert to

aeiou

How do I remove the accents? I need to save the result to the database as a URL.

8 answers

75


You can use this function:

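// Requires: using System.Globalization; using System.Text;
// Must be declared inside a static class to work as an extension method.
public static string RemoveAccents(this string text)
{
    StringBuilder sbReturn = new StringBuilder();
    // FormD splits each accented letter into a base letter plus combining marks.
    var arrayText = text.Normalize(NormalizationForm.FormD).ToCharArray();
    foreach (char letter in arrayText)
    {
        // Keep every character except the combining marks (the accents).
        if (CharUnicodeInfo.GetUnicodeCategory(letter) != UnicodeCategory.NonSpacingMark)
            sbReturn.Append(letter);
    }
    return sbReturn.ToString();
}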

Source: http://www.ninjacode.com.br/post/2011/08/10/Retirar-acentos-de-strings-C.aspx

  • 2

    Doesn't this also erase currency symbols, digits, and various punctuation symbols?

34

You can also loop over every character in the variable comAcentos and call Replace on the text passed to the function, swapping each letter in comAcentos for the letter at the same position in semAcentos, and then return the new text.

public static string removerAcentos(string texto)
{
    string comAcentos = "ÄÅÁÂÀÃäáâàãÉÊËÈéêëèÍÎÏÌíîïìÖÓÔÒÕöóôòõÜÚÛüúûùÇç";
    string semAcentos = "AAAAAAaaaaaEEEEeeeeIIIIiiiiOOOOOoooooUUUuuuuCc";

    for (int i = 0; i < comAcentos.Length; i++)
    {
        texto = texto.Replace(comAcentos[i].ToString(), semAcentos[i].ToString());
    }
    return texto;
}
  • 2

    Some words in Spanish and other languages, and also text on the web, use ñ and Ñ. You could add these letters too, to make the list more complete.
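
Following up on the comment above, here is a minimal sketch of the same lookup-table idea with ñ and Ñ appended. The class name AccentTable, the Map dictionary and the single-pass loop are illustrative choices, not part of the original answer; the sketch assumes the input uses precomposed characters (a letter typed as base letter plus combining mark would not match the table).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AccentTable
{
    // The answer's original pairs, with Ñ and ñ appended (hypothetical extension).
    private const string WithAccents    = "ÄÅÁÂÀÃäáâàãÉÊËÈéêëèÍÎÏÌíîïìÖÓÔÒÕöóôòõÜÚÛüúûùÇçÑñ";
    private const string WithoutAccents = "AAAAAAaaaaaEEEEeeeeIIIIiiiiOOOOOoooooUUUuuuuCcNn";

    private static readonly Dictionary<char, char> Map = BuildMap();

    // Pairs each accented character with the plain character at the same index.
    private static Dictionary<char, char> BuildMap()
    {
        var map = new Dictionary<char, char>();
        for (int i = 0; i < WithAccents.Length; i++)
            map[WithAccents[i]] = WithoutAccents[i];
        return map;
    }

    // Walks the text once, replacing only the characters found in the table.
    public static string RemoveAccents(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char ch in text)
            sb.Append(Map.TryGetValue(ch, out char plain) ? plain : ch);
        return sb.ToString();
    }
}
```

For example, AccentTable.RemoveAccents("España") returns "Espana". Walking the text once with a dictionary avoids creating a new string for every entry in the table, which is what the repeated Replace calls do.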

27

Using LINQ is very practical:

public static string RemoverAcentuacao(this string text)
{
    return new string(text
        .Normalize(NormalizationForm.FormD)
        .Where(ch => char.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
        .ToArray());
}

What are NormalizationForm.FormD and UnicodeCategory.NonSpacingMark?

NormalizationForm.FormD is a way of representing the original string in which marks such as accents and the cedilla are split off into separate characters: the base character, which is the letter, and the mark character. The accent character in this case falls in the NonSpacingMark category, that is, a mark that occupies no space of its own and is applied to the preceding character.

Using LINQ we can filter out these marks, keeping only the base characters, and build a new string from them.
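
To make the decomposition concrete, here is a small sketch that prints what FormD produces for a single accented letter. The class name DecompositionDemo is only illustrative.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DecompositionDemo
{
    public static void Main()
    {
        // Escape sequences avoid depending on the source file's encoding.
        string composed = "\u00E9"; // é as a single precomposed character
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(composed.Length);   // 1
        Console.WriteLine(decomposed.Length); // 2

        foreach (char ch in decomposed)
        {
            // Prints each code point together with its Unicode category.
            Console.WriteLine("U+{0:X4} {1}", (int)ch, CharUnicodeInfo.GetUnicodeCategory(ch));
        }
        // U+0065 LowercaseLetter
        // U+0301 NonSpacingMark
    }
}
```

Digits, currency symbols and punctuation belong to other categories (DecimalDigitNumber, CurrencySymbol and so on), so a filter on NonSpacingMark leaves them in place.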

26

public static string RemoverAcentos(this string texto)
{
   if (string.IsNullOrEmpty(texto))
       return String.Empty;

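   // Relies on the encoder's best-fit fallback: iso-8859-8 (Hebrew) has no accented
   // Latin letters, so each one is replaced by its closest unaccented equivalent.
   byte[] bytes = System.Text.Encoding.GetEncoding("iso-8859-8").GetBytes(texto);
   return System.Text.Encoding.UTF8.GetString(bytes);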
}

string nome = "João Felipe Portela";
string nomeSemAcentos = nome.RemoverAcentos();
  • 1

    Just one suggestion to make the code easier to read: the else branch is not required, since we already return an empty string when the text is null or empty.

  • 2

    Better late than never!

19

Here is the method I use to remove accents:

public static string RemoverAcentos(string texto){

    string s = texto.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < s.Length; k++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(s[k]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(s[k]);
        }
    }
    return sb.ToString();
}

17

13

The Greek (ISO) code page can do this

Information about this code page can be obtained from the return value of the method System.Text.Encoding.GetEncodings(). See more here.

Greek (ISO) has codepage 28597 and name iso-8859-7.

Let's get to the code... o/

string text = "Você está numa situação lamentável";

string textEncode = System.Web.HttpUtility.UrlEncode(text, Encoding.GetEncoding("iso-8859-7"));
//result: "Voce+esta+numa+situacao+lamentavel"

string textDecode = System.Web.HttpUtility.UrlDecode(textEncode);
//result: "Voce esta numa situacao lamentavel"

So write this function...

public string RemoverAcentuacao(string text)
{
    return
        System.Web.HttpUtility.UrlDecode(
            System.Web.HttpUtility.UrlEncode(
                text, Encoding.GetEncoding("iso-8859-7")));
}

Note that Encoding.GetEncoding("iso-8859-7") is equivalent to Encoding.GetEncoding(28597). The first looks the encoding up by name, the second by its code page number.
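
The two lookups can be compared with a short sketch like the one below. It targets modern .NET (.NET Core / .NET 5+), where, as far as I know, the legacy code pages have to be registered through CodePagesEncodingProvider before GetEncoding can find them; on .NET Framework they are available by default and the registration line is not needed. The class name EncodingLookupDemo is only illustrative.

```csharp
using System;
using System.Text;

public static class EncodingLookupDemo
{
    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as iso-8859-7 must be registered before they can be looked up.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding byName = Encoding.GetEncoding("iso-8859-7"); // lookup by name
        Encoding byCodePage = Encoding.GetEncoding(28597);    // lookup by code page

        Console.WriteLine(byName.CodePage);    // 28597
        Console.WriteLine(byCodePage.WebName); // iso-8859-7
    }
}
```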

Other options can be seen on Stack Overflow in English:

1

In Qt Creator:

QString GerCorrida::removeAccentuation(QString text)
{
    QString with = "ÄÅÁÂÀÃäáâàãÉÊËÈéêëèÍÎÏÌíîïìÖÓÔÒÕöóôòõÜÚÛüúûùÇç";
    QString withOut = "AAAAAAaaaaaEEEEeeeeIIIIiiiiOOOOOoooooUUUuuuuCc";
    for (int i = 0; i < with.size(); i++)
    {
        text = text.replace( with[i], withOut[i] );
    }
    return text;
}
