According to the documentation, the shortcut \w does not match accented letters.
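A quick check, runnable in Node.js or a browser console, shows what this means in practice. This is a minimal sketch; the sample strings are arbitrary.

```javascript
// Without the u and i flags, \w is equivalent to [A-Za-z0-9_],
// so accented letters are not word characters
const plainLetter = /\w/.test("a"); // true
const accentedLetter = /\w/.test("é"); // false

// As a result, \w+ stops right before the "é" in "José"
const pieces = "José".match(/\w+/g); // ["Jos"]

console.log(plainLetter, accentedLetter, pieces);
```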
One simple way to solve this is to include the accented characters in the regex:
var palavras = nome.match(/\b[\wáéíóúâêîôûãõç]+\b/gi);
I also used the flag i so that both upper and lower case are matched; otherwise the regex would have to list every pair, like áÁéÉ etc.
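A minimal sketch applying the regex above to a sample name (the value of nome is only an example). It also shows a caveat worth knowing: in JavaScript, \b is itself defined in terms of \w, so it does not see a boundary next to an accented letter, and a word that starts or ends with one, such as "José", gets cut short. Dropping \b, which is my own adjustment and not part of the original answer, avoids this here, because the greedy + already stops at the first character outside the class.

```javascript
// Hypothetical input; thanks to the i flag, "Ã" in "JOÃO" matches the "ã" in the class
const nome = "JOÃO da Conceição Araújo";
const palavras = nome.match(/\b[\wáéíóúâêîôûãõç]+\b/gi);
console.log(palavras);

// Caveat: \b only sees ASCII word characters, so the final "é" is lost
const truncated = "José".match(/\b[\wáéíóúâêîôûãõç]+\b/gi);
// Same character class without \b: the whole word is matched
const whole = "José".match(/[\wáéíóúâêîôûãõç]+/gi);
console.log(truncated, whole);
```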
There is also the option of using /\b[\wà-ÿ]+\b/gi, since the range à-ÿ already includes several accented characters (see here). However, it also accepts some characters that are not accented letters, such as ÷ (DIVISION SIGN), among others (see the link above for the full list of characters).
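The sketch below, using a hypothetical sample string, compares the range à-ÿ with the explicit list of accented letters. Because ÷ (U+00F7) falls inside the range U+00E0 to U+00FF, "10÷2" comes back as a single token.

```javascript
const text = "ação 10÷2";

// The range à-ÿ (U+00E0 to U+00FF) also contains ÷ (U+00F7)
const withRange = text.match(/\b[\wà-ÿ]+\b/gi);

// The explicit list contains only the accented letters typed into it
const withList = text.match(/\b[\wáéíóúâêîôûãõç]+\b/gi);

console.log(withRange); // "10÷2" is kept as one match
console.log(withList); // "10" and "2" are separate matches
```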
It is worth remembering that the shortcut \w also matches the digits 0 to 9 and the character _. If you want to consider only letters, just change the regex to:
/\b[a-záéíóúâêîôûãõç]+\b/gi
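A minimal sketch of the difference, on a made-up sentence: with \w the digits are returned as "words", while with a-z only the letter sequences remain.

```javascript
const sentence = "Comprei 3 maçãs e 2 peras";

// \w also matches digits (and _), so "3" and "2" are returned
const withW = sentence.match(/\b[\wáéíóúâêîôûãõç]+\b/gi);

// Replacing \w with a-z keeps only letters (the i flag still covers uppercase)
const lettersOnly = sentence.match(/\b[a-záéíóúâêîôûãõç]+\b/gi);

console.log(withW);
console.log(lettersOnly);
```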
An alternative (not yet supported by all browsers) is to use Unicode Property Escapes:
var palavras = nome.match(/\b\p{L}+\b/gu);
In this case, \p{L} matches every character in the Unicode "Letter" categories (all the categories starting with "L" in this list). Note that the regex needs the flag u for this shortcut to work.
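This makes the regex more comprehensive, since \p{L} also covers letters from other alphabets, such as Japanese, Arabic, Cyrillic, etc. On the other hand, it does not match digits or the _. One caveat: even with the flag u, \b is still based on \w, which only covers ASCII letters, digits and _, so a word written entirely in one of those other alphabets has no \b around it and is skipped. To capture such words, drop the \b (for example, /\p{L}+/gu).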
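A minimal sketch comparing these variants on a hypothetical string that mixes Latin, Japanese and Cyrillic words. It also shows that, without the flag u, \p{L} is not treated as a letter class at all.

```javascript
const cities = "Tóquio 東京 Moscou Москва";

// Without the u flag, \p is just an escaped "p", so this does not test for letters
const withoutU = /\p{L}/.test("é"); // false

// \b still relies on the ASCII-only \w, so words with no ASCII letter at the edges are skipped
const withBoundary = cities.match(/\b\p{L}+\b/gu);

// Without \b, every run of letters is returned, whatever the alphabet
const withoutBoundary = cities.match(/\p{L}+/gu);

console.log(withoutU, withBoundary, withoutBoundary);
```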
Another option, to accept only letters of the Latin alphabet and ignore others such as Japanese, Arabic, etc., is:
nome.match(/\b\p{Script=Latin}+\b/gu)
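A minimal sketch on a hypothetical sentence: \p{Script=Latin} keeps accented Latin letters and skips the Cyrillic and Japanese words. Because \b is still ASCII-based, the version with \b loses the final "é" of "José", while the version without \b (my adjustment, not part of the original answer) keeps it.

```javascript
const sentence = "José viu Москва e 東京";

// Only Latin-script letters are matched; Cyrillic and CJK words are ignored
const withBoundary = sentence.match(/\b\p{Script=Latin}+\b/gu);

// Without \b, a word ending in an accented letter is kept whole
const withoutBoundary = sentence.match(/\p{Script=Latin}+/gu);

console.log(withBoundary, withoutBoundary);
```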