How can I use a regex to validate a string containing only letters, whitespace and accented letters?

34

I want to validate a full-name text input so that it accepts only letters, blank spaces and accented letters.

Valid formats:

Leandro Moreira  
leandro moreira  
Kézia Maria  
kezia maria  
Cabaço da silva  
Cabaço da Silva

The regex must not accept any special characters other than accented letters.

7 answers

55


If the target is just the common Portuguese accents, it’s easy to list them one by one. I would build a regex from the following blocks:

  • A-Za-z: upper and lower case letters without accents
  • áàâãéèêíïóôõöúçñ: accented Portuguese vowels, cedilla and a few extras as a bonus, lowercase
  • ÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑ: accented Portuguese vowels, cedilla and a few extras as a bonus, uppercase
  • spaces

Therefore:

/^[A-Za-záàâãéèêíïóôõöúçñÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑ ]+$/

Or, leaving the upper/lower case distinction to the regex engine:

/^[a-záàâãéèêíïóôõöúçñ ]+$/i

Demo (of this second option)
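As a rough sketch of how the second regex could be used, here is a small check against some of the names from the question. The helper name isValidFullName is made up for this example; only the pattern comes from the answer.

```javascript
// Second regex from the answer: explicit accented letters plus space,
// anchored with ^ and $, and case-insensitive via the i flag.
const NAME_PATTERN = /^[a-záàâãéèêíïóôõöúçñ ]+$/i;

// isValidFullName is a hypothetical helper name used only in this sketch.
function isValidFullName(value) {
  return NAME_PATTERN.test(value);
}

const samples = ['Leandro Moreira', 'Kézia Maria', 'Cabaço da Silva', 'Fulano A123'];
for (const sample of samples) {
  console.log(`${sample}: ${isValidFullName(sample)}`);
}
```

The first three names pass and 'Fulano A123' fails, because digits are not in the character class.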

  • 6

    An alternative to listing all the characters by hand (for those who need universal characters) is the Unicode support in XRegExp. With it you can use Unicode categories such as \p{L}. But honestly, name validation sounds like something that can cause problems.

  • 1

    @bfavaretto I tested the regex here, but it still accepts numbers. What can I do to disallow them?

  • 2

    @Leandrocurious It was matching only part of the name. I added ^ at the beginning and $ at the end of the expression so that it is compared against the full name. See http://jsfiddle.net/C2Xcp/

  • 2

    Perfect! I solved it with this regex: [a-záàâãéèêíïóôõöúçñÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑ ]+$

24

I like bfavaretto’s solution better, but I’ll leave a shorter alternative here:

var reg = /[a-zA-Z\u00C0-\u00FF ]+/i

Explanation: this matches the characters 'a' to 'z', 'A' to 'Z', and finally all the Unicode characters from 'À' (U+00C0) to 'ÿ' (U+00FF), so I think it should include all accented characters.

The complete list of characters in the Unicode Latin-1 Supplement block can be seen here: http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm

The complete list contains about 128 characters, of which we use only a part. The expression I used includes some strange and unlikely characters, such as Æ and at least two mathematical signs. You might want to use a more precise expression like bfavaretto’s, or use narrower Unicode ranges.

  • 4

    Good solution! The only problem is that this Unicode range includes × and ÷.
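One possible way to keep the short range while leaving out × (U+00D7) and ÷ (U+00F7) is to split it around those two code points. The sketch below also adds the ^ and $ anchors; the names LATIN1_NAME and isLatin1Name are made up for this example.

```javascript
// The range \u00C0-\u00FF split into three pieces so that
// U+00D7 (×) and U+00F7 (÷) are no longer part of the class.
const LATIN1_NAME = /^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF ]+$/;

// isLatin1Name is a hypothetical helper name used only in this sketch.
function isLatin1Name(value) {
  return LATIN1_NAME.test(value);
}

console.log(isLatin1Name('Tomas Müller')); // true
console.log(isLatin1Name('a × b'));        // false
console.log(isLatin1Name('a ÷ b'));        // false
```

Other unlikely letters such as Æ are still accepted, so an explicit list remains the stricter choice.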

9

Adding to bfavaretto’s answer: when writing a regular expression for names, remember that there can be cases like Antonio D'Ávila, so include the apostrophe as well. It can also be convenient to leave whitespace to the regex engine, using \s:

/^[A-Za-záàâãéèêíïóôõöúçñÁÀÂÃÉÈÊÍÏÓÔÕÖÚÇÑ'\s]+$/

  • 1

    \s matches [ \n\r\t\f]

  • Very good. This is the most complete solution! The apostrophe really does occur in Brazilian Portuguese.
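To illustrate the effect of adding the apostrophe and \s, here is a small sketch using a case-insensitive variation of the pattern above. The helper name acceptsName is made up for this example.

```javascript
// Case-insensitive variation of the pattern above: letters, accented letters,
// the straight apostrophe (') and any whitespace matched by \s.
const NAME_WITH_APOSTROPHE = /^[a-záàâãéèêíïóôõöúçñ'\s]+$/i;

// acceptsName is a hypothetical helper name used only in this sketch.
function acceptsName(value) {
  return NAME_WITH_APOSTROPHE.test(value);
}

console.log(acceptsName("Antonio D'Ávila"));      // true
console.log(acceptsName('Ana\tSilva'));           // true: \s also matches TAB
console.log(acceptsName('Antonio D\u2019Ávila')); // false: typographic apostrophe
```

Because \s also matches TAB and line breaks, a literal space may be preferable when only regular spaces should be accepted.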

7

If the last name is required and there is only one field to be validated, you can use the following regex:

/\b[A-Za-zÀ-ú][A-Za-zÀ-ú]+,?\s[A-Za-zÀ-ú][A-Za-zÀ-ú]{2,19}\b/gi

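It matches only letters (accented or not) and whitespace. The first name must have at least 2 letters and the last name between 3 and 20. Note that the pattern has no ^ and $ anchors, so it looks for a valid name anywhere in the string; digits elsewhere in the input do not prevent a match.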

Demo: jsfiddle
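If the whole input has to be a first name followed by at least one more name, an anchored variation is one option. The sketch below is not the regex from this answer: the 2-letter minimum per part, the split ranges that skip × and ÷, and the helper name hasFirstAndLastName are all assumptions made for illustration.

```javascript
// Anchored variation: a first part of 2+ letters, then one or more
// whitespace-separated parts of 2+ letters each. The ranges À-Ö, Ø-ö and ø-ÿ
// cover the accented Latin-1 letters while skipping × and ÷.
const FULL_NAME = /^[A-Za-zÀ-ÖØ-öø-ÿ]{2,}(?:\s[A-Za-zÀ-ÖØ-öø-ÿ]{2,})+$/;

// hasFirstAndLastName is a hypothetical helper name used only in this sketch.
function hasFirstAndLastName(value) {
  return FULL_NAME.test(value);
}

console.log(hasFirstAndLastName('Leandro Moreira')); // true
console.log(hasFirstAndLastName('Leandro'));         // false: no last name
console.log(hasFirstAndLastName('Fulano A123'));     // false: digits
```

The g flag is left out on purpose: with g, repeated calls to test() on the same regex object start from lastIndex and can give surprising results.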

5

Just one detail: several of the other answers use a character class containing the space (such as /^[a-záàâãéèêíïóôõöúçñ ]+$/i, for example; notice that there is a space inside the brackets). It’s not wrong, but the problem is that this regex also considers valid strings that only have spaces:

console.log(/^[a-záàâãéèêíïóôõöúçñ ]+$/i.test('     ')); // true

In this case, perhaps it would be better to split the string on spaces and then check whether each of the parts is a valid name (i.e. whether it has only letters):

let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
              'Cabaço da silva', 'Cabaço da Silva', 'Fulano A123', '     '];
// removed the space from the regex (now it only considers letters)
let regex = /^[a-záàâãéèêíïóôõöúçñ]+$/i;
nomes.forEach(nome => {
    let valido = nome.split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'valid': 'invalid'}`);
});

The split uses one or more spaces as the separator (/ +/; notice that there is a space before the +). It would also be possible to use /\s+/, but the \s shortcut also matches TAB and line breaks (in addition to other characters; see the full list in the documentation, bearing in mind that this list varies from one language to another). Whether that makes a difference depends on the strings you are checking.

The split returns an array containing the parts of the name. Then I use the every method, which checks whether all parts match the regex (which in turn checks that a part has only letters). If any of them does not match, the result is false.

Another detail is that the quantifier + means "one or more occurrences", which means that strings like 'a b c' would also be considered valid (since, after the split, each part of the "name" has one letter). If you want each part of the name to have a minimum (and/or maximum) number of letters, you can replace the + with a {} quantifier. Examples:

  • [a-záàâãéèêíïóôõöúçñ]{2,}: at least 2 characters (no maximum limit)
  • [a-záàâãéèêíïóôõöúçñ]{2,20}: not less than 2, not more than 20 characters
  • [a-záàâãéèêíïóôõöúçñ]{10}: exactly 10 characters

Use whatever makes the most sense in your case.
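As a sketch of how the {} quantifier could be combined with the split/every approach, the function below builds the regex from a minimum and maximum length. The function name isValidName and its default limits (2 and 20) are assumptions chosen for illustration.

```javascript
// isValidName is a hypothetical helper: each space-separated part must have
// between minLen and maxLen letters (defaults of 2 and 20 are assumptions).
function isValidName(fullName, minLen = 2, maxLen = 20) {
  const part = new RegExp(`^[a-záàâãéèêíïóôõöúçñ]{${minLen},${maxLen}}$`, 'i');
  return fullName.split(/ +/).every((piece) => part.test(piece));
}

console.log(isValidName('Cabaço da Silva')); // true
console.log(isValidName('a b c'));           // false: each part has 1 letter
console.log(isValidName('a b c', 1, 20));    // true once the minimum is 1
console.log(isValidName('     '));           // false: only spaces
```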


Another option is to use the normalize method (which already has good browser support), together with another regex, to remove the accents, and then just check for letters from a to z:

let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
              'Cabaço da silva', 'Cabaço da Silva', 'Tomas Müller', 'Fulano A123', '     '];
// the accented letters are no longer needed
let regex = /^[a-z]+$/i;
nomes.forEach(nome => {
    let valido = nome
        // remove the accents
        .normalize("NFD").replace(/[\u0300-\u036f]/g, "")
        // from here on it is the same as the previous code
        .split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'valid': 'invalid'}`);
});

In short, normalization to the NFD form "breaks" an accented character in two. For example, ã is broken (or decomposed) into two characters: a lowercase a (without any accent) and the combining tilde (~).
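The decomposition can be observed directly. This small sketch prints the length and code points of ã before and after NFD normalization; the variable names are only illustrative.

```javascript
// 'ã' as a single precomposed character (U+00E3).
const composed = '\u00E3';
// NFD splits it into the base letter plus a combining mark.
const decomposed = composed.normalize('NFD');

// Format each code point as U+XXXX for display.
const codePoints = Array.from(decomposed, (ch) =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(composed.length);   // 1
console.log(decomposed.length); // 2
console.log(codePoints);        // [ 'U+0061', 'U+0303' ]
```

U+0061 is the plain letter a and U+0303 is the combining tilde, which falls inside the \u0300-\u036f range.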

Then I remove the characters in the range \u0300-\u036f, which corresponds to the Unicode block "Combining Diacritical Marks", which is where the accent characters (such as the tilde, among others) are located. Thus, only the letters without accents remain, and I can check them with the regex [a-z] (together with the i flag, which makes the regex case insensitive, so it considers both uppercase and lowercase letters).

Note that in the example above, Müller is a case the first regex did not catch (though that is easily solved by adding ü to the list: [a-záàâãéèêíïóôõöúçñü ]). Anyway, it’s up to you to choose whether you want to keep a fixed list or use something more generic (it all depends on the nationalities of the names you’ll be dealing with and the types of characters that might appear).


Note: there are still other ranges that contain "accent" characters (actually, "Diacritical Marks"), such as "Combining Diacritical Marks Supplement" and "Combining Diacritical Marks Extended", among others. These blocks have diacritical marks that are not used in Portuguese, so depending on the names you want to validate, you may not need to include them in the replace. But if you want to consider these characters as well, I would use replace(/[\u0300-\u036f\u1dc0-\u1dff\u1ab0-\u1abe]/g, "").

Another alternative (which does not yet work on all browsers) is to use Unicode Property escapes:

let nomes = [ 'Leandro Moreira', 'leandro moreira', 'Kézia Maria', 'kezia maria',
              'Cabaço da silva', 'Cabaço da Silva', 'Tomas Müller', 'Fulano A123', '     '];
// the accented letters are no longer needed
let regex = /^[a-z]+$/i;
nomes.forEach(nome => {
    let valido = nome
        // remove the accents
        .normalize("NFD").replace(/\p{M}/ug, "")
        // from here on it is the same as the previous code
        .split(/ +/).every(parte => regex.test(parte));
    console.log(`${nome} = ${valido ? 'valid': 'invalid'}`);
});

Now I use \p{M}, which matches all combining marks. Remember that in this case the regex must have the u flag for the Unicode properties to work. It is also worth noting that, at the time of writing, Firefox and IE did not support this feature.
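A related variation, not part of the code above, is to skip the accent removal and match letters directly with \p{L}, the Unicode "Letter" category. The sketch below assumes an environment that supports Unicode property escapes; the helper name isUnicodeName is made up for this example.

```javascript
// \p{L} matches any Unicode letter; the u flag is required.
const LETTERS_ONLY = /^\p{L}+$/u;

// isUnicodeName is a hypothetical helper name used only in this sketch.
function isUnicodeName(fullName) {
  return fullName
    // compose base letter + combining mark pairs, since marks are \p{M}, not \p{L}
    .normalize('NFC')
    .split(/ +/)
    .every((part) => LETTERS_ONLY.test(part));
}

console.log(isUnicodeName('Tomas Müller')); // true
console.log(isUnicodeName('Fulano A123'));  // false
console.log(isUnicodeName('     '));        // false
```

Keep in mind that \p{L} accepts letters from any script (Cyrillic, Greek and so on), which may or may not be what you want.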

4

function removeSpecialCharSimple(strToReplace) {
    // Accented characters and their plain equivalents, matched by position
    var strSChar = "áàãâäéèêëíìîïóòõôöúùûüçÁÀÃÂÄÉÈÊËÍÌÎÏÓÒÕÖÔÚÙÛÜÇ";
    var strNoSChars = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC";
    var newStr = "";
    for (var i = 0; i < strToReplace.length; i++) {
        var ch = strToReplace.charAt(i);
        var pos = strSChar.indexOf(ch);
        // Swap an accented character for its plain equivalent; keep anything else
        newStr += pos !== -1 ? strNoSChars.charAt(pos) : ch;
    }

    // Finally, drop everything that is not an ASCII letter, a digit or a space
    return newStr.replace(/[^a-zA-Z 0-9]/g, '');
}
  • 2

    Welcome to SOPT. It’s good to see you’re willing to respond, but take the time to explain your answer.

  • 1

    @drumerwhite I used your function and it was very useful. I added it as onkeyup="this.value = removeSpecialCharSimple(this.value);". Thank you!
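For comparison, a shorter sketch of the same cleanup idea can be written with the normalize approach described in another answer, instead of a fixed lookup table. The function name stripAccents is made up for this example, and its behavior is not identical to removeSpecialCharSimple: it also converts accented letters the table does not list, such as ñ.

```javascript
// stripAccents is a hypothetical helper name used only in this sketch.
function stripAccents(text) {
  return text
    // decompose accented letters into base letter + combining mark
    .normalize('NFD')
    // drop the combining marks (Combining Diacritical Marks block)
    .replace(/[\u0300-\u036f]/g, '')
    // drop everything that is not an ASCII letter, a digit or a space
    .replace(/[^a-zA-Z 0-9]/g, '');
}

console.log(stripAccents('Kézia Maria!'));    // Kezia Maria
console.log(stripAccents('Cabaço da Silva')); // Cabaco da Silva
```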

-2

Hey brother, here is a simple regex to use in your validations:

const regex = /[^a-zA-Z\wÀ-ú ]/g;
