How to validate a field to accept only letters/numbers using regex

I am creating validations for every property of my classes, using a component called AbstractValidator. Example:

protected void ValidarChamadaTipo()
{
    RuleFor(p => p.ChamadaTipo)
        .NotNull().WithMessage("Certifique-se de ter informado o Tipo de Chamada");
}

I need to create two functions, one that accepts letters only and one that accepts numbers only, using regex in the lambda expression.

protected void ValidarChamadaTipoSomenteLetras()
{

}

protected void ValidarChamadaTipoSomenteNumeros()
{

}

Is it possible to do this? Does anyone know how?

  • Why not validate with Data Annotations?

  • It’s just that this validation is on the server side. I was able to validate on the client side, but I don’t know how to use regular expressions.... :(

  • Numbers only is ^\d+$, and letters only is ^\w+$. I'm not posting an answer only because I don't know the details of validations and properties in C#, but if it's just a matter of using regex, it goes like this: https://ideone.com/KyOqEX

  • In fact only letters would be ^[a-zA-Z]+$

  • Thank you @hkotsubo!!! That’s right!!! I managed to put something together from what I found online: protected static bool SomenteNumeros(string character) { var rg = new Regex("^[0-9]*$"); return rg.Match(character).Success; } protected static bool SomenteLetras(string character) { var rg = new Regex("^[a-zA-Z]+$"); return rg.Match(character).Success; }

  • Post it as an answer and I will mark it as accepted!!! :)

  • @Masterjr Answer added

1 answer


The regex classes are in the System.Text.RegularExpressions namespace. For your cases, I recommend using the anchors ^ and $, which mean, respectively, the beginning and the end of the string. This ensures that the entire string contains only what the expression specifies.


Numbers only

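For numbers, use the shorthand \d, which matches a digit. Since the string can have several digits, add the quantifier +, which means "one or more occurrences". Combining both, we have \d+, meaning "one or more digits". Note that in .NET \d matches any Unicode decimal digit by default, not only 0 to 9; if only ASCII digits should pass, use [0-9] or the RegexOptions.ECMAScript option.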

The full expression to check whether the string has only digits is ^\d+$ (one or more digits, from start to end of the string). If you just want to know whether the string matches the regex (only True or False), you can use the IsMatch method:

Console.WriteLine(Regex.IsMatch("0184784983324", @"^\d+$")); // True
Console.WriteLine(Regex.IsMatch("a184784983324", @"^\d+$")); // False
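
How strict this check is depends on two defaults of the .NET engine: \d covers every Unicode decimal digit, and $ also matches just before a final newline. The sketch below contrasts the pattern above with a stricter variant built from [0-9] and \z; the class name DigitChecks, the helper IsAsciiDigits, and the sample inputs are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitChecks
{
    // Hypothetical helper: ASCII digits only; \z anchors at the very end of the string.
    public static bool IsAsciiDigits(string s) => Regex.IsMatch(s, @"^[0-9]+\z");

    public static void Main()
    {
        string arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits 1, 2, 3

        // Default behaviour: \d accepts any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(arabicIndic, @"^\d+$")); // True
        // With ECMAScript semantics, \d is restricted to 0-9.
        Console.WriteLine(Regex.IsMatch(arabicIndic, @"^\d+$", RegexOptions.ECMAScript)); // False
        // $ also matches right before a trailing newline.
        Console.WriteLine(Regex.IsMatch("123\n", @"^\d+$")); // True

        // The stricter variant rejects both inputs and still accepts plain digits.
        Console.WriteLine(IsAsciiDigits(arabicIndic));     // False
        Console.WriteLine(IsAsciiDigits("123\n"));         // False
        Console.WriteLine(IsAsciiDigits("0184784983324")); // True
    }
}
```

Whether the stricter form is needed depends on where the input comes from; for values typed into a form field, the simpler pattern may be enough.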

You said in the comments that you used the Match method. It is useful when you want more information, such as the part of the string that was matched, the position where it occurs, etc. Since in this case you only want to know whether the string matches the regex, IsMatch is more direct.
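
To illustrate the difference, here is a small sketch of what Match returns beyond a yes/no result. The pattern is deliberately unanchored so that it finds a run of digits inside a longer string; the class name MatchDetails, the helper FirstDigitRun, and the input are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchDetails
{
    // Hypothetical helper: returns the first run of digits found in the input.
    public static Match FirstDigitRun(string s) => Regex.Match(s, @"\d+");

    public static void Main()
    {
        Match m = FirstDigitRun("abc123def");
        Console.WriteLine(m.Success); // True
        Console.WriteLine(m.Value);   // 123 (the text that was matched)
        Console.WriteLine(m.Index);   // 3   (zero-based position where it starts)
        Console.WriteLine(m.Length);  // 3   (number of characters matched)
    }
}
```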


Letters only

For letters, one option is the character class [a-zA-Z]. Brackets define a set of characters; here, [a-zA-Z] means "any letter from a to z or from A to Z". So the expression is ^[a-zA-Z]+$ (one or more letters, from the beginning to the end of the string):

Console.WriteLine(Regex.IsMatch("abcdeFGHI", @"^[a-zA-Z]+$")); // True
Console.WriteLine(Regex.IsMatch("abcde9GHI", @"^[a-zA-Z]+$")); // False

There is only one detail: this regex does not accept accented characters. An option to solve this is to simply put all the desired characters inside the brackets, something like this:

Console.WriteLine(Regex.IsMatch("AçãojáJÁ", @"(?i)^[a-záéíóúõãçàâêô]+$")); // True

Note that I used (?i) at the beginning, which enables case-insensitive mode (that is, the regex does not differentiate between upper and lower case letters). This makes the rest of the expression a little shorter, since I only need to list the characters once (without the (?i), I would have to list both uppercase and lowercase, giving something like [a-záéíóúõãçàâêôA-ZÁÉÍÓÚÕÃÇÀÂÊÔ]).

Another alternative is to create the regex with the corresponding RegexOptions value:

// IgnoreCase so that uppercase and lowercase are not distinguished
Regex r = new Regex(@"^[a-záéíóúõãçàâêô]+$", RegexOptions.IgnoreCase);
Console.WriteLine(r.IsMatch("AçãojáJÁ")); // True

Alternatively, use Unicode normalization to decompose accented characters.

In Unicode, each character has a unique numeric code, called a code point. Some characters, however, can be represented in more than one way, as defined by the normalization forms. Without going into too much detail, this means that the same character can be encoded by different sequences of code points.

An example is the character Á (the letter A uppercase with acute accent), which can be represented in two ways:

  1. as the single code point U+00C1 (LATIN CAPITAL LETTER A WITH ACUTE); in Unicode, a code point's value is written as "U+xxxx", where "xxxx" is the value in hexadecimal
  2. as two code points:
    • U+0041 (LATIN CAPITAL LETTER A)
    • U+0301 (COMBINING ACUTE ACCENT)

One option, then, is to decompose the string to NFD form, converting accented characters to format 2 described above (a letter followed by one or more combining characters). For this I use the Normalize method, passing a NormalizationForm as the parameter (available in the System.Text namespace).
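
The two representations of Á can be inspected directly. This is a minimal sketch; the class name NormalizationDemo and the helper CodePoints are hypothetical, and the helper prints UTF-16 code units, which coincide with code points for the characters used here:

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX.
    // This equals the code point for characters in the Basic Multilingual Plane.
    public static string CodePoints(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        string composed = "\u00C1"; // Á as a single code point
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(CodePoints(composed));   // U+00C1
        Console.WriteLine(CodePoints(decomposed)); // U+0041 U+0301

        // The == operator compares strings ordinally, so the two forms differ...
        Console.WriteLine(composed == decomposed); // False
        // ...until they are brought back to the same normalization form.
        Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == composed); // True
    }
}
```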

Then I use the regex ^([a-zA-Z]\p{M}*)+$:

  • [a-zA-Z]: a letter of a to z (upper and lower case)
  • \p{M}*: zero or more (*) characters that are in one of the 3 Unicode categories:

    • Mn (Mark, Nonspacing)
    • Mc (Mark, Spacing Combining)
    • Me (Mark, Enclosing)

    Basically, accents (such as the COMBINING ACUTE ACCENT above) and the cedilla, which appear as separate characters once the string is normalized to NFD, fall into one of these categories.

Therefore, this part of the regex matches any letter from A to Z, optionally followed by one or more combining marks. Normalizing to NFD ensures the string is in this format.

Finally, I put all this in parentheses and add the + to indicate that it can be repeated several times (the string can have several letters, accented or not):

Console.WriteLine(Regex.IsMatch("AçãojáJÁ".Normalize(NormalizationForm.FormD), @"^([a-zA-Z]\p{M}*)+$")); // True
Console.WriteLine(Regex.IsMatch("Ação8jáJÁ".Normalize(NormalizationForm.FormD), @"^([a-zA-Z]\p{M}*)+$")); // False

A common recommendation online is to use \p{L} instead of [a-zA-Z]. However, \p{L} accepts letters from any script (Japanese, Korean, Arabic, Cyrillic, etc.), so choose whichever best suits your use case.
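
To connect this back to the validator in the question: assuming AbstractValidator comes from the FluentValidation library and that ChamadaTipo is a string (the question shows neither), the patterns could be plugged in through the library's Matches rule. This is only a sketch; the Chamada class, the ChamadaValidator class, and the messages are hypothetical:

```csharp
using FluentValidation;

// Hypothetical model: the question does not show the class, and
// ChamadaTipo is assumed here to be a string.
public class Chamada
{
    public string ChamadaTipo { get; set; }
}

public class ChamadaValidator : AbstractValidator<Chamada>
{
    public ChamadaValidator()
    {
        // Register only one of the two rules; applying both to the
        // same property would reject every non-empty value.
        ValidarChamadaTipoSomenteLetras();
    }

    protected void ValidarChamadaTipoSomenteLetras()
    {
        RuleFor(p => p.ChamadaTipo)
            .NotNull().WithMessage("ChamadaTipo is required")
            .Matches(@"^[a-zA-Z]+$").WithMessage("ChamadaTipo must contain letters only");
    }

    protected void ValidarChamadaTipoSomenteNumeros()
    {
        RuleFor(p => p.ChamadaTipo)
            .NotNull().WithMessage("ChamadaTipo is required")
            .Matches(@"^[0-9]+$").WithMessage("ChamadaTipo must contain digits only");
    }
}
```

With this in place, new ChamadaValidator().Validate(new Chamada { ChamadaTipo = "abc" }).IsValid would be expected to be true, and false for a value such as "abc1". For the NFD-based check, which needs to call Normalize before matching, a Must rule with a predicate is probably a better fit than Matches.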
