Just one detail: several of the other answers use a character class that contains a space (such as /^[a-záàâãéèêíïóôõöúçñ ]+$/i, for example; notice the space inside the brackets). That is not wrong, but the problem is that this regex also accepts strings that contain only spaces:
console.log(/^[a-záàâãéèêíïóôõöúçñ ]+$/i.test(' ')); // true
In this case, it might be better to split the string on spaces and then check whether each part is a valid name (i.e., whether it contains only letters):
let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
    'Cabaço da silva', 'Cabaço da Silva', 'Fulano A123', ' '];
// removed the space from the regex (now it only matches letters)
let regex = /^[a-záàâãéèêíïóôõöúçñ]+$/i;
nomes.forEach(nome => {
    let valido = nome.split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'válido': 'inválido'}`);
});
The split uses one or more spaces as the separator (/ +/; notice the space before the +). It would also be possible to use /\s+/, but the shortcut \s also matches TAB and line breaks (among other characters; see the full list in the documentation, and remember that this list varies from one language to another). Whether that makes a difference depends on the strings you are checking.
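To make the difference between the two separators concrete, here is a minimal sketch. The function names and the sample string (a name whose parts are separated by a TAB) are just assumptions for illustration:

```javascript
// Hypothetical helpers, only to compare the two separators.
function splitOnSpaces(text) {
    return text.split(/ +/);   // only the space character separates parts
}

function splitOnWhitespace(text) {
    return text.split(/\s+/);  // TAB, line breaks, etc. also separate parts
}

// Made-up sample: the two parts are separated by a TAB, not a space.
const comTab = 'Leandro\tMoreira';

console.log(splitOnSpaces(comTab).length);     // the TAB stays inside a single part
console.log(splitOnWhitespace(comTab).length); // the TAB is treated as a separator
```

With / +/ the TAB ends up inside one of the parts, so a letters-only regex would reject the name; with /\s+/ the same string would be accepted.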
The split returns an array containing the parts of the name. Then I use the every method, which checks whether all parts match the regex (which in turn checks whether the part contains only letters). If any of them does not match, the result is false.
Another detail is that the quantifier + means "one or more occurrences", so strings like 'a b c' would also be considered valid (since, after the split, each part of the "name" has one letter). If you want each part of the name to have a minimum (and/or maximum) number of letters, you can replace the + with {}. Examples:
[a-záàâãéèêíïóôõöúçñ]{2,}: at least 2 characters (no maximum limit)
[a-záàâãéèêíïóôõöúçñ]{2,20}: at least 2 and at most 20 characters
[a-záàâãéèêíïóôõöúçñ]{10}: exactly 10 characters
Use whatever makes the most sense in your case.
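As a rough sketch of how one of these quantifiers fits into the split-based check, assuming (arbitrarily) a limit of 2 to 20 letters per part; the function name is hypothetical:

```javascript
// Each part must have between 2 and 20 letters (limits chosen only for illustration).
const parteRegex = /^[a-záàâãéèêíïóôõöúçñ]{2,20}$/i;

// Hypothetical helper: same split + every approach, with the stricter quantifier.
function nomeValido(nome) {
    return nome.split(/ +/).every(parte => parteRegex.test(parte));
}

console.log(nomeValido('a b c'));           // each part has a single letter
console.log(nomeValido('Cabaço da Silva')); // every part has at least 2 letters
```

Keep in mind that a minimum of 2 also rejects legitimate one-letter parts (such as the Portuguese particle "e"), so pick the limits according to the names you expect.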
Another option is to use the normalize method (which already has good browser support), together with another regex, to remove the accents, and then just check for the letters a to z:
let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
    'Cabaço da silva', 'Cabaço da Silva', 'Tomas Müller', 'Fulano A123', ' '];
// the accented letters are no longer needed
let regex = /^[a-z]+$/i;
nomes.forEach(nome => {
    let valido = nome
        // remove the accents
        .normalize("NFD").replace(/[\u0300-\u036f]/g, "")
        // from here on it is the same as the previous code
        .split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'válido': 'inválido'}`);
});
In short, normalizing to the NFD form "breaks" an accented character in two. For example, ã is broken (or decomposed) into two characters: a lowercase a (without any accent) and the tilde (~), which is stored as the combining character U+0303. (For more details on normalization, read here, here and here.)
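A small sketch that makes the decomposition visible by listing the code points after normalization. The helper name is made up for this example:

```javascript
// Hypothetical helper: lists the code points of a string after NFD normalization.
function codePointsNFD(text) {
    return Array.from(text.normalize('NFD'), ch =>
        'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// '\u00e3' is the precomposed "ã" (a single character before normalization).
console.log('\u00e3'.length);          // 1
console.log(codePointsNFD('\u00e3'));  // [ 'U+0061', 'U+0303' ]
```

U+0061 is the plain letter a and U+0303 is the combining tilde, which falls inside the \u0300-\u036f range.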
Then I remove the characters in the range \u0300-\u036f, which corresponds to the Unicode block "Combining Diacritical Marks", where the accent characters (such as the tilde, among others) are located. Thus only the unaccented letters remain, and I can check them with the regex [a-z] (together with the i flag, which makes the regex case insensitive, so it matches both uppercase and lowercase letters).
Note, in the example above, the case of Müller, which the first regex did not accept (but that is easily solved by adding ü to the list: [a-záàâãéèêíïóôõöúçñü ]). In any case, it is up to you to choose whether to keep a fixed list or use something more generic (it all depends on the nationalities of the names you will be dealing with and the kinds of characters that might appear).
Note: there are other ranges that also contain "accent" characters (strictly speaking, diacritical marks), such as "Combining Diacritical Marks Supplement" and "Combining Diacritical Marks Extended", among others. These blocks contain diacritical marks that are not used in English, so depending on the names you want to validate, you may not need to include them in the replace. But if you want to cover these characters as well, it would be replace(/[\u0300-\u036f\u1dc0-\u1dff\u1ab0-\u1abe]/g, "").
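To illustrate what the wider range changes, here is a sketch with two hypothetical helpers. The second sample uses U+1DC4, a mark from the "Combining Diacritical Marks Supplement" block, picked only as an example:

```javascript
// Hypothetical helpers: one with the basic range, one with the extended ranges.
function removeBasicMarks(text) {
    return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

function removeExtendedMarks(text) {
    return text.normalize('NFD')
        .replace(/[\u0300-\u036f\u1dc0-\u1dff\u1ab0-\u1abe]/g, '');
}

console.log(removeBasicMarks('Müller'));            // the diaeresis is in the basic range
console.log(removeBasicMarks('a\u1dc4').length);    // the U+1DC4 mark is not removed
console.log(removeExtendedMarks('a\u1dc4').length); // here it is removed
```

For names like the ones in the examples above, both functions should give the same result; the difference only shows up with the less common marks.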
Another alternative (which does not yet work in all browsers) is to use Unicode property escapes:
let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
    'Cabaço da silva', 'Cabaço da Silva', 'Tomas Müller', 'Fulano A123', ' '];
// the accented letters are no longer needed
let regex = /^[a-z]+$/i;
nomes.forEach(nome => {
    let valido = nome
        // remove the accents
        .normalize("NFD").replace(/\p{M}/ug, "")
        // from here on it is the same as the previous code
        .split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'válido': 'inválido'}`);
});
Now I use \p{M}, which matches all combining characters. Remember that in this case the regex must have the u flag for the Unicode properties to work. It is also worth noting that, at the time of writing, Firefox and IE did not support this feature.
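Since support for \p{M} varies, one possible approach (just a sketch, with made-up function names) is to try building the regex at runtime and fall back to the fixed range when the engine rejects it:

```javascript
// Hypothetical helper: prefers \p{M}, falls back to the basic range if unsupported.
function buildMarkRegex() {
    try {
        // The RegExp constructor throws a SyntaxError in engines that do not
        // support the u flag or Unicode property escapes.
        return new RegExp('\\p{M}', 'gu');
    } catch (e) {
        return /[\u0300-\u036f]/g;
    }
}

const markRegex = buildMarkRegex();

// Hypothetical helper: strips the marks, then applies the same split + every check.
function nomeSemAcentoValido(nome) {
    return nome
        .normalize('NFD').replace(markRegex, '')
        .split(/ +/).every(parte => /^[a-z]+$/i.test(parte));
}

console.log(nomeSemAcentoValido('Tomas Müller'));
console.log(nomeSemAcentoValido('Fulano A123'));
```

The regex is built with the RegExp constructor instead of a literal because an unsupported literal would be a syntax error for the whole script, which a try/catch cannot intercept.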
An alternative to defining all the characters by hand (for those who need universal characters) is XRegExp Unicode. With it you can use Unicode categories such as \p{L}. But honestly, name validation sounds like something that can cause problems. – Anthony Accioly
@bfavaretto I tested the regex here, but it still accepts numbers. What can I do to disallow that?
– Leandro Curioso
@LeandroCurioso A partial match of the name was occurring. I added ^ at the beginning and $ at the end of the expression so that it is compared against the full name. See http://jsfiddle.net/C2Xcp/ – bfavaretto
Perfect! I solved it with this regex: [a-záàââéèêêíïóõõõúçñALLIGIOENCYOONIOYOY ]+$
– Leandro Curioso