Regex - Special Character Removal C#

Viewed 51,028 times

5

Regex.Replace is a great solution for removing accents.

What I can't manage now is removing one particular kind of character. I have a string that receives the text "1° General Place", which contains the character '°'. Is there a list of these kinds of characters? How would you go about removing them?

3 answers

19

I would use a simple regular expression that keeps only letters and digits:

Regex.Replace(minhaString, "[^0-9a-zA-Z]+", "");
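
As a minimal sketch of how this pattern behaves on the string from the question, the example below applies it as written and then with a space added to the character class so that spaces survive. The class and method names are only illustrative and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeepAlphanumericsDemo
{
    // Removes every run of characters that are not ASCII letters or digits.
    public static string KeepLettersAndDigits(string text) =>
        Regex.Replace(text, "[^0-9a-zA-Z]+", "");

    // Same pattern, with a space added to the class so spaces are kept.
    public static string KeepLettersDigitsAndSpaces(string text) =>
        Regex.Replace(text, "[^0-9a-zA-Z ]+", "");

    public static void Main()
    {
        string input = "1° General Place";
        Console.WriteLine(KeepLettersAndDigits(input));       // 1GeneralPlace
        Console.WriteLine(KeepLettersDigitsAndSpaces(input)); // 1 General Place
    }
}
```

Note that both variants also strip accented letters, because those fall outside the ASCII ranges.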
  • 1

    Just to point out: there is also the class \W (any non-word character). Source: MSDN

  • 1

    @Cigano's answer is also what I would do; just be sure to include a " " in the class so that spaces are kept as well.

  • 1

    \W will match . The range 0-9a-zA-Z does not cover accented characters. The only regex I can see for that would be a negated list that includes the accented characters.

  • It works very well, thank you all!

9


I don't know of a specific list of these characters.

One approach you can use is the reverse: replace every character that does not belong to a given set, using a negated character class [^ ].

Example

(?i) - Makes the regex case-insensitive

[^0-9a-záéíóúàèìòùâêîôûãõç\s] - Matches every character that is not a letter from a to z (a-z), a digit from 0 to 9 (0-9), whitespace (\s), or one of the accented letters (áéíóúàèìòùâêîôûãõç)

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = "Você chegou em 1º lugar, Parabéns!";
      string pattern = @"(?i)[^0-9a-záéíóúàèìòùâêîôûãõç\s]";
      string replacement = "";
      Regex rgx = new Regex(pattern);
      string result = rgx.Replace(input, replacement);

      Console.WriteLine("String Original: {0}", input);
      Console.WriteLine("String tratada : {0}", result);                             
   }
}

Edit

The nice thing about regular expressions is that the same problem can be solved in several ways. While running some tests I remembered the "or" operator |, so I was able to use [^\w\s] (a negated class of word characters, which includes accented letters, and whitespace) together with [ºª], which gives the expected result in a cleaner way: [^\w\s]|[ºª]
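
The sketch below applies that pattern to the sample sentence from this answer. It relies on the fact that in .NET \w matches Unicode letters, so accented letters are kept, and so are º and ª, which is why they need the separate [ºª] alternative. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrdinalDemo
{
    // Removes anything that is neither a word character nor whitespace,
    // plus the ordinal indicators º and ª, which \w would otherwise keep.
    public static string Clean(string text) =>
        Regex.Replace(text, @"[^\w\s]|[ºª]", "");

    public static void Main()
    {
        Console.WriteLine(Clean("Você chegou em 1º lugar, Parabéns!"));
        // Você chegou em 1 lugar Parabéns
        Console.WriteLine(Clean("1° General Place"));
        // 1 General Place
    }
}
```

Because \w also matches the underscore, a string such as "a_b" passes through unchanged; add _ to the second alternative if that is not wanted.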

  • 1

    Your answer is better explained; I liked it and it works well! Thank you!

  • 1

    @gmsantos the circumflex on the word "Você" and the acute accent on "Parabéns" are still there; how do I remove these accents?

  • @Adrianosuv I believe that is a separate question

3

Below is a simple method I wrote that keeps only digits and unaccented letters, and removes all the special characters I could think of.

 public static string ObterStringSemAcentosECaracteresEspeciais(string str)
        {
            /** Replace accented characters with their unaccented equivalents **/
            string[] acentos = new string[] { "ç", "Ç", "á", "é", "í", "ó", "ú", "ý", "Á", "É", "Í", "Ó", "Ú", "Ý", "à", "è", "ì", "ò", "ù", "À", "È", "Ì", "Ò", "Ù", "ã", "õ", "ñ", "ä", "ë", "ï", "ö", "ü", "ÿ", "Ä", "Ë", "Ï", "Ö", "Ü", "Ã", "Õ", "Ñ", "â", "ê", "î", "ô", "û", "Â", "Ê", "Î", "Ô", "Û" };
            string[] semAcento = new string[] { "c", "C", "a", "e", "i", "o", "u", "y", "A", "E", "I", "O", "U", "Y", "a", "e", "i", "o", "u", "A", "E", "I", "O", "U", "a", "o", "n", "a", "e", "i", "o", "u", "y", "A", "E", "I", "O", "U", "A", "O", "N", "a", "e", "i", "o", "u", "A", "E", "I", "O", "U" };

            for (int i = 0; i < acentos.Length; i++)
            {
                str = str.Replace(acentos[i], semAcento[i]);
            }
            /** Replace the special characters in the string with "" **/
            string[] caracteresEspeciais = { "¹", "²", "³", "£", "¢", "¬", "º", "¨", "\"", "'", ".", ",", "-", ":", "(", ")", "ª", "|", "\\", "°", "_", "@", "#", "!", "$", "%", "&", "*", ";", "/", "<", ">", "?","[", "]", "{", "}", "=", "+", "§" ,"´", "`", "^", "~" };

            for (int i = 0; i < caracteresEspeciais.Length; i++)
            {
                str = str.Replace(caracteresEspeciais[i], "");
            }

            /** Replace any remaining special characters with " " **/
            str = Regex.Replace(str, @"[^\w\.@-]", " ",
                                RegexOptions.None, TimeSpan.FromSeconds(1.5));

            return str.Trim();
        }
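
As an alternative to maintaining the replacement tables by hand, a different technique is Unicode normalization: decompose each accented letter into its base letter plus a combining mark, drop the marks, and then filter with a regex. The sketch below is not equivalent to the method above (it keeps spaces and removes everything else that is not an ASCII letter or digit), and the names AccentStripper, RemoveAccents and Clean are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class AccentStripper
{
    // Splits accented letters into base letter + combining mark (FormD)
    // and keeps only the characters that are not combining marks.
    public static string RemoveAccents(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Removes accents first, then drops everything except ASCII letters,
    // digits and spaces.
    public static string Clean(string text)
    {
        string withoutAccents = RemoveAccents(text);
        return Regex.Replace(withoutAccents, "[^0-9a-zA-Z ]+", "").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Clean("Você chegou em 1º lugar, Parabéns!"));
        // Voce chegou em 1 lugar Parabens
    }
}
```

The ordinal indicator º has no canonical decomposition, so it survives RemoveAccents and is removed by the regex step instead.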
