How do I validate names with a regular expression?

I have tried many ways to write a regular expression that validates:

  1. a maximum length of 60 characters
  2. no numbers
  3. no accents, cedillas, or punctuation
  4. the first letter of each name uppercase and the remaining letters lowercase.

Examples of valid names:

  • Jose da Silva
  • Nycolas Merino
  • Antonio Ferreira Pacheco

Examples of invalid names:

  • Jose da silva
  • Nycolas merino
  • antonio Ferreira Pacheco

What I’ve managed to create so far is this:

[A-Z][a-z]+[[ ][A-Z][a-z]+]*

However, it only validates the first and second names. If the person has three names, it does not check that the third one starts with an uppercase letter, and it does not enforce the 60-character limit either. And yes, I need to do this with a regular expression! If you want to test the expression, you can do it on this site: http://ferramentas.lymas.com.br/regexp/regexp_br.php#

  • What language?

  • @Gmsantos: C# / VB / .NET! Here is an expression that already works for another purpose: ([0-9a-zA-Z]+([_. -]? [0-9a-zA-Z]+)@[0-9a-zA-Z]+[0-9,a-z,A-Z,.,-](.){1}[a-zA-Z]{2,4})+$ This expression is placed inside an XML file to validate the answer to a question.

  • Can’t you perform any operation on the resulting string? A single line of code could limit the number of characters or collapse multiple spaces.

2 answers

Your general idea is fine (match the first name, then match a space followed by another name zero or more times). The problem is the use of brackets ([]) in the second part of the expression: brackets match exactly one character from the set of options they contain. Switching to parentheses should solve the problem:

[A-Z][a-z]+([ ][A-Z][a-z]+)*
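To see the difference between brackets and parentheses concretely, here is a small sketch using Python's re module. This is an assumption: the asker works in .NET, but character classes and groups behave the same way there. The asker's pattern is written with its brackets escaped ([\[ ] and \]*) only so that Python does not warn about a possible nested set. The meaning is unchanged. fullmatch is used so that only whole-string matches count.

```python
import re

# The asker's pattern. "[[ ]" is ONE character class matching a single "["
# or a single space, and the trailing "]*" matches literal "]" characters.
# Escaped here to avoid Python's FutureWarning; the meaning is the same.
ASKER = re.compile(r"[A-Z][a-z]+[\[ ][A-Z][a-z]+\]*")

# The corrected pattern: parentheses group " Name" so the whole group repeats.
FIXED = re.compile(r"[A-Z][a-z]+([ ][A-Z][a-z]+)*")

for name in ["Nycolas Merino", "Antonio Ferreira Pacheco", "Antonio Ferreira pacheco"]:
    # fullmatch succeeds only if the pattern covers the entire string.
    print(f"{name!r}: asker={bool(ASKER.fullmatch(name))}, "
          f"fixed={bool(FIXED.fullmatch(name))}")
```

The asker's version accepts exactly two names, so a three-name string fails as a whole. It only seemed to "work" in an online tester because an unanchored search finds the first two names.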

Note that, depending on how this expression is used, it may match only part of a string (e.g. in 123Fulano Beltrano456 the "middle" part would match). If you want to ensure that the expression matches only the entire string, one way is to use the start (^) and end ($) anchors:

^[A-Z][a-z]+([ ][A-Z][a-z]+)*$
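Here is a quick check of the 123Fulano Beltrano456 example. Python's re stands in for the asker's .NET engine here, and anchors behave the same way in both:

```python
import re

# No anchors: re.search may find a match anywhere inside the string.
UNANCHORED = re.compile(r"[A-Z][a-z]+([ ][A-Z][a-z]+)*")
# With ^ and $, the match must run from the start to the end of the string.
ANCHORED = re.compile(r"^[A-Z][a-z]+([ ][A-Z][a-z]+)*$")

text = "123Fulano Beltrano456"

print(UNANCHORED.search(text).group(0))  # Fulano Beltrano
print(ANCHORED.search(text))             # None
```

One caveat applies to both engines. $ also matches just before a final newline, so "Nycolas Merino\n" passes the anchored pattern. Python's re.fullmatch (or \z in .NET) rejects it.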

Finally, if capture groups cause you problems, mark the group inside the parentheses as non-capturing:

^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$
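The following Python sketch (standing in for .NET) illustrates what "do not capture" changes. Both patterns accept and reject the same names, but only the first one stores a group:

```python
import re

CAPTURING = re.compile(r"^[A-Z][a-z]+([ ][A-Z][a-z]+)*$")
NON_CAPTURING = re.compile(r"^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$")

name = "Antonio Ferreira Pacheco"

# A repeated capturing group remembers only its last repetition.
print(CAPTURING.fullmatch(name).groups())      # (' Pacheco',)
# The non-capturing version matches the same text but stores no group.
print(NON_CAPTURING.fullmatch(name).groups())  # ()

# Validation results are identical for both patterns.
for n in ["Nycolas Merino", "Nycolas merino", "Jose"]:
    print(n, bool(CAPTURING.fullmatch(n)), bool(NON_CAPTURING.fullmatch(n)))
```

In .NET, Group.Value likewise holds the last repetition, and every repetition is also available through Group.Captures.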

As for validating a specific length, my answer to a related question ("2 regular expressions in 1") shows a way to do this using lookarounds (i.e. test the string against the first regex without consuming it, then test it again against the second regex):

(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$

Example on Rubular. P.S. If you are using this regex inside an XML file, lookaheads may not be available. I don’t think that’s the case, but make sure the engine you use supports this feature. Otherwise, there is little I can suggest for validating the length; ideally you would do that in a separate step (as Guill suggested in the comments).
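Here are both length strategies side by side, as a sketch in Python's re rather than .NET. The first is the single regex with the lookahead. The second is the separate-step alternative for engines without lookaheads. The helper name is_valid_name and its min_len/max_len parameters are hypothetical, chosen to mirror the {2,60} bounds.

```python
import re

# Option 1: one regex. The lookahead (?=^.{2,60}$) checks the length without
# consuming anything, then the rest of the pattern checks the content.
NAME_MAX_60 = re.compile(r"(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$")

# Option 2 (for engines without lookaheads): content-only regex plus an
# ordinary length check in code.
NAME_CONTENT = re.compile(r"[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*")

def is_valid_name(name: str, min_len: int = 2, max_len: int = 60) -> bool:
    """Hypothetical helper: length checked in code, content by the regex."""
    if not (min_len <= len(name) <= max_len):
        return False
    # fullmatch anchors at both ends, so ^ and $ are not needed here.
    return NAME_CONTENT.fullmatch(name) is not None

samples = [
    "Antonio Ferreira Pacheco",
    "Nycolas merino",
    "A" + "b" * 59,   # exactly 60 characters
    "A" + "b" * 60,   # 61 characters
]
for s in samples:
    print(len(s), bool(NAME_MAX_60.match(s)), is_valid_name(s))
```

The second form keeps the regex simple and makes the limit easy to change. The first keeps everything in one expression, which matters when only a regex can be supplied, as in the asker's XML.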

Note that some of your "valid" names are rejected by this regex: those that have "da" in the middle (which starts with a lowercase letter). If you want to make an exception for "da" (and maybe also for "do", "de" and "e"), you can do something like:

(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$

Updated example.
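Here is a Python sketch (standing in for .NET) that checks the particle exception against the names from the question:

```python
import re

NAME_WITH_PARTICLES = re.compile(
    r"(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$"
)

names = [
    "Jose da Silva",             # valid: "da" is an allowed particle
    "Jose da silva",             # invalid: "silva" is neither a particle nor capitalized
    "Nycolas Merino",            # valid
    "antonio Ferreira Pacheco",  # invalid: first name is lowercase
]
for name in names:
    print(name, bool(NAME_WITH_PARTICLES.match(name)))
```

Each particle is simply one more allowed word, so the pattern also accepts a name that ends in one, such as "Jose da".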

  • @mgibsonbr, thank you for your help! It certainly met my needs. I made some modifications, adding some accented characters that are now allowed!

  • @mgibsonbr, why do we have (?=^.{2,60}$) at the beginning? What does it mean? And why are there two start-of-line anchors?

  • @Nycolasmerino To avoid repeating the explanation here, see the linked question. Briefly, I’m combining two regular expressions into one, since the two can’t be "mixed" (i.e. the length and the content are validated separately). That’s also why there are two start-of-line anchors (and two end-of-line anchors): each regex acts independently. By the way, .{2,60} means "any character, 2 to 60 times".
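To see that the lookahead consumes nothing, run it on its own. It succeeds with an empty match at position 0, so the content regex that follows starts again from the beginning of the string, where its own ^ still holds. This is a sketch in Python's re, assumed to behave like .NET for this construct.

```python
import re

name = "Nycolas Merino"

# The lookahead alone: it succeeds, but the match is empty (zero width).
look = re.match(r"(?=^.{2,60}$)", name)
print(look.span())   # (0, 0)

# Nothing was consumed, so the content part still sees the whole name and
# its ^ matches at position 0 again.
both = re.match(r"(?=^.{2,60}$)^[A-Z][a-z]+(?:[ ][A-Z][a-z]+)*$", name)
print(both.span())   # (0, 14)
```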

For Portuguese names, building on what @mgibsonbr said and adding a few more things I found online, I arrived at an almost perfect regex:

/(?=^.{2,60}$)^[A-ZÀÁÂĖÈÉÊÌÍÒÓÔÕÙÚÛÇ][a-zàáâãèéêìíóôõùúç]+(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$/
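Here is a quick check of this pattern in Python's re. The /.../ delimiters are JavaScript-style and are dropped. The check shows one reason the pattern is only "almost" perfect: the accented classes apply to the first name only, while later names still use [A-Z][a-z]+. The PT_NAME_ALL variant reuses the same classes for every name. It is a hypothetical adjustment, not part of the answer.

```python
import re

# The answer's pattern, split over two string literals for readability.
PT_NAME = re.compile(
    r"(?=^.{2,60}$)^[A-ZÀÁÂĖÈÉÊÌÍÒÓÔÕÙÚÛÇ][a-zàáâãèéêìíóôõùúç]+"
    r"(?:[ ](?:das?|dos?|de|e|[A-Z][a-z]+))*$"
)

# Hypothetical variant: the answer's own character classes, used for every name.
UPPER = "A-ZÀÁÂĖÈÉÊÌÍÒÓÔÕÙÚÛÇ"
LOWER = "a-zàáâãèéêìíóôõùúç"
WORD = f"[{UPPER}][{LOWER}]+"
# Doubled braces produce a literal {2,60} inside the f-string.
PT_NAME_ALL = re.compile(
    rf"(?=^.{{2,60}}$)^{WORD}(?:[ ](?:das?|dos?|de|e|{WORD}))*$"
)

for name in ["José da Silva", "João Conceição", "joão Silva"]:
    print(name, bool(PT_NAME.match(name)), bool(PT_NAME_ALL.match(name)))
```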
