Check for special characters in a string using Regexp

Viewed 12,531 times

8

I need to validate a password field that must contain at least one special character. Is there any Regexp expression to do this check?

E.g.

var patt = /[a-z]/; // checks for lowercase letters
var patt = /[A-Z]/; // checks for uppercase letters
var patt = /[0-9]/; // checks for digits

2 answers

10

The pattern \W matches every character that is not alphanumeric, that is, every character except letters (A-Z, a-z), digits (0-9) and the underscore (_).

So we can build a pattern like this:

const pass1 = 'senhafraca';
const pass2 = '$3nh@f0rt3';

const regex = /\W|_/;

console.log(regex.test(pass1));
console.log(regex.test(pass2));

In the pattern /\W|_/, we have:

  • \W, which matches non-alphanumeric characters;
  • |, which works as an "or" in regular expressions;
  • _, which matches the underscore character itself, since \W does not match it.
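
A quick check of why the |_ part is needed (a minimal sketch; the sample strings are made up for illustration):

```javascript
// \W alone does not match the underscore, because _ belongs to \w.
const nonWord = /\W/;
const nonWordOrUnderscore = /\W|_/;

console.log(nonWord.test('abc_123'));             // false
console.log(nonWordOrUnderscore.test('abc_123')); // true
console.log(nonWordOrUnderscore.test('abc123'));  // false
```

The same alternation can also be written as a single character class, [\W_].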

I made an example in Regexr.

  • Thank you so much for your tip, Luiz.

9


Just to complement the other answer, I would like to offer an alternative solution.

You said you want to validate a password field and allow it to have special characters (plus letters and numbers, which I understood should also be accepted).

With \W you will indeed accept those special characters (anything that is not alphanumeric), but you may end up accepting more than you intended. Let's look in detail at what \W actually means.

Basically, \W is a shorthand that means "anything that is not \w" (notice the difference between uppercase W and lowercase w). And \w means "a letter from A to Z (upper or lower case), a digit from 0 to 9, or the character _".

Therefore, \W means "anything other than a letter from A to Z, a digit from 0 to 9, or _". And that is the problem, because "anything other" really does mean everything else!
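
To make that definition concrete, here is a small sketch comparing \W with the explicit negated class. It assumes a regex without the u or i flags, and the sample characters are arbitrary:

```javascript
// Without flags, \W behaves like the negated class [^A-Za-z0-9_].
const shorthand = /\W/;
const explicit = /[^A-Za-z0-9_]/;

const samples = ['a', 'Z', '7', '_', '$', '@', ' ', 'ñ', '©'];
const agree = samples.every((ch) => shorthand.test(ch) === explicit.test(ch));

console.log(agree);               // true
console.log(shorthand.test('ñ')); // true: ñ is not in A-Z, so it counts as "special"
```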

\W does match the special characters (including those generally accepted in passwords, such as $ and @), but it also matches many others that perhaps should not be accepted in a password field. Just to name a few examples: spaces, TABs, the NUL character (\0), Japanese characters, emojis* and symbols such as ©.

* Accepting these characters would not necessarily be a problem; I added a note about this at the end.

For all these cases, the regex /\W|_/ returns true, as we can see in the examples below:

let regex = /\W|_/;

// space (https://www.fileformat.info/info/unicode/char/0020/index.htm)
let space = " ";
console.log("character: [" + space + "]");
console.log(regex.test(space)); // true

// TAB (https://www.fileformat.info/info/unicode/char/0009/index.htm)
let tab = "\t";
console.log("character: [" + tab + "]");
console.log(regex.test(tab)); // true

// NUL character, \0 (https://www.fileformat.info/info/unicode/char/0000/index.htm)
let asciizero = "\u0000";
console.log("character: [" + asciizero + "]");
console.log(regex.test(asciizero)); // true

// Japanese character (http://www.fileformat.info/info/unicode/char/337B/index.htm)
let jp = "\u337b";
console.log("character: [" + jp + "]");
console.log(regex.test(jp)); // true

// emoji (https://www.fileformat.info/info/unicode/char/1f4a9/index.htm)
let emoji = "\u{1F4A9}";
// if the line above does not work (older browsers), replace it with "\uD83D\uDCA9"
console.log("character: [" + emoji + "]");
console.log(regex.test(emoji)); // true

// copyright (https://www.fileformat.info/info/unicode/char/00a9/index.htm)
let cr = "\u00a9";
console.log("character: [" + cr + "]");
console.log(regex.test(cr)); // true

// a few more examples
console.log(regex.test('^')); // true
console.log(regex.test('ñ')); // true
console.log(regex.test('௸')); // true

The space and TAB cases can happen when the user copies and pastes the password from some other program or editor and some of these "hidden" characters come along in the string. The same applies to a number of other characters that are not visible but can be in the string without us knowing. Unicode defines several different space characters, for example, and \W matches all of them (as well as line breaks, other control characters, and many more).
It's up to you to decide whether you want to accept a password like a$ 123.
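
As an illustration of those "hidden" characters, the sketch below tests a few invisible code points in the middle of an otherwise alphanumeric string. The selection is mine and far from exhaustive:

```javascript
const regex = /\W|_/;

// A few invisible characters that can sneak in through copy and paste.
const invisible = {
  'no-break space': '\u00A0',
  'zero width space': '\u200B',
  'ideographic space': '\u3000',
  'line feed': '\n',
};

const results = Object.entries(invisible).map(
  ([name, ch]) => [name, regex.test('abc' + ch + '123')]
);

for (const [name, matched] of results) {
  console.log(name + ': ' + matched); // true for every entry
}
```

In every case the password would be reported as containing a "special character", even though the user never typed one on purpose.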

This can be mitigated by validating the password again on the server before saving it to the database (but using the same regex with only \W will not do any good, because these characters will pass there too).

Emojis, Japanese or Arabic characters and symbols like ©: you decide whether to accept them or not. If they are allowed in your passwords, fine. But if they should not be accepted, \W is not the best alternative.

I'm not saying that the answer that suggested \W is wrong (I even upvoted it, because it does what was requested). It does validate what you need (special characters, meaning anything other than a letter or number). The problem is that it also starts accepting things that maybe it shouldn't.

That is the catch with \W: it does what you want (matches everything that is not a letter or number), but it does more than that and matches other characters that you may not want. Anyway, if it is okay for you to accept emojis, invisible characters, mathematical symbols, accented letters and the many other characters defined by Unicode that are neither letters from A to Z nor digits from 0 to 9, you can use \W. But if you want to limit the characters to only those that your passwords can have, keep reading: we will use a slightly more complex regex that is less prone to these false positives.


Alternative to \W

First you should define which special characters will be considered, because the term "special characters" is a little vague. There are many characters that can be called "special" (and if we consider "everything that is not alphanumeric", we have seen that the result is far too broad).

For example, does ^ (the circumflex accent on its own, without any letter) count? What about an accented letter such as ñ? The hyphen? The comma? Parentheses? Punctuation marks? Each website has its own rules about what is allowed in a password, so "special characters" varies greatly in this context. The first thing to do, then, is to define exactly what a special character is in your case.

Once the list of special characters is defined, one solution is to place all of them between square brackets. Square brackets define a character class and make the regex accept any one of the characters inside them. For example, [@#] means "the character @ or the character #".

Then I could have something like [@!#$%^&*()/\\], which matches any one of the characters between [ and ]. Note that for the character \ I had to write \\, because a single \ is used to escape characters that have special meaning in a regex. If I wrote only one \, I would be escaping the ], which would then be interpreted as a literal closing bracket and not as the delimiter of the character class.

One detail is that the whole expression [@!#$%^&*()/\\] matches only one character (which can be any one of those inside the brackets). If I use this regex, I guarantee that the string has at least one of these characters.
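
A short sketch of that character class on its own, with made-up test strings. Note that in a JavaScript string literal the backslash also has to be doubled:

```javascript
const special = /[@!#$%^&*()/\\]/;

console.log(special.test('abc'));   // false: none of the listed characters
console.log(special.test('a\\b'));  // true: the string is a\b, and \ is in the list

// The class matches a single character: the first one found.
const firstMatch = 'a@b#c'.match(special)[0];
console.log(firstMatch);            // "@"
```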

This list is just an example; you should add the characters you need to it. I will use the one I defined above just to show the general idea, but you should change it to include all the special characters your passwords may have. It can be more laborious, but at least you guarantee that only these characters will be accepted.
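
If the list is likely to change, one option is to build the regex from a string of allowed special characters. This is only a sketch: escapeForCharClass and buildPasswordRegex are hypothetical helper names of mine, not part of any library, and the resulting pattern follows the same shape as the regex used in this answer.

```javascript
// Hypothetical helper: escape the characters that are special inside [...]
// (backslash, closing/opening bracket, caret and hyphen).
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]\[^-]/g, '\\$&');
}

// Hypothetical helper: at least one special character, and only
// specials, letters and digits, with a length between min and max.
function buildPasswordRegex(specials, min, max) {
  const cls = escapeForCharClass(specials);
  return new RegExp(
    '^(?=.*[' + cls + '])[' + cls + 'a-zA-Z0-9]{' + min + ',' + max + '}$'
  );
}

const regex = buildPasswordRegex('@!#$%^&*()/\\', 8, 20);
console.log(regex.test('a@4DFFd$fd'));  // true
console.log(regex.test('abcDE123456')); // false

// A different list, including a hyphen that must be escaped in the class.
const withHyphen = buildPasswordRegex('-_.', 8, 20);
console.log(withHyphen.test('abc-123.x')); // true
```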

Remember that characters with special meaning must be escaped with \. For example, if I want the brackets to be accepted, I have to write [\[\]]: the first [ opens the character class, then \[ means the literal character [, \] means the literal character ], and the final ] closes the character class. Also, not every character needs to be escaped inside the brackets (the $, for example, does not). One way to test is to use a website such as regex101, choose "javascript" in the left menu, and check whether the regex is correct.
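
The bracket case can be checked quickly in code as well (a small sketch with arbitrary sample strings):

```javascript
// Escaped brackets inside a character class match the literal [ and ].
const brackets = /[\[\]]/;
console.log(brackets.test('a[1'));  // true
console.log(brackets.test('a]'));   // true
console.log(brackets.test('a(1)')); // false

// Inside a class, $ is a literal character and needs no escape.
const dollar = /[$]/;
console.log(dollar.test('US$ 10')); // true
```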

From what I understand, your passwords must have uppercase letters, lowercase letters and digits, and must have a special character. One way to check this is with lookaheads, whose syntax is (?=expression). In this case, we want an expression that checks whether the string has any of the special characters, so just use ^(?=.*[@!#$%^&*()/\\]). The anchor ^ means "start of the string", ensuring that the check starts at the beginning of the password. Next, .* means "zero or more occurrences of any character", and then comes the list of special characters that I defined.

That is, the part .*[@!#$%^&*()/\\] checks whether there is a special character anywhere in the string. If there is none, the lookahead fails and the regex finds no match. If there is one, the regex goes on to evaluate the rest of the pattern.

The trick of the lookahead ((?=...)) is that it checks the expression inside it and, if it finds a match, goes back to where it was and continues evaluating the rest of the regex. In this case it sits right after the ^, so it goes back to the beginning of the string.
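
The "goes back to where it was" behaviour can be observed with a deliberately simplified regex (my own example, not the final password pattern): the lookahead requires a $ somewhere, and the rest of the pattern then matches lowercase letters from the start.

```javascript
const re = /^(?=.*\$)[a-z]+/;

const m = re.exec('abc$def');
console.log(m[0]);    // "abc": matching resumed at the start, not after the $
console.log(m.index); // 0

// Without any $ the lookahead fails, so there is no match at all.
const noMatch = re.exec('abcdef');
console.log(noMatch); // null
```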

Then I list everything the password can contain. If it can have the special characters, just repeat them. If it can have letters or numbers, put them inside the brackets too: [@!#$%^&*()/\\a-zA-Z0-9]. Then you can add a quantifier to set the password length.

For example, [@!#$%^&*()/\\a-zA-Z0-9]{8,20} makes the regex match 8 to 20 characters (each of which can be any of those in the brackets: a special character, a letter, or a number). If you only want to set a minimum length, use {8,} (minimum 8, no maximum). Adjust the values as needed.
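
A sketch of the two quantifiers side by side, anchored with ^ and $ so that the length applies to the whole string (the sample passwords are made up):

```javascript
const bounded = /^[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;
const minOnly = /^[@!#$%^&*()/\\a-zA-Z0-9]{8,}$/;

const tooShort = 'a@4$f';          // 5 characters
const justRight = 'a@4DFFd$fd';    // 10 characters
const veryLong = 'a@'.repeat(15);  // 30 characters

console.log(bounded.test(tooShort));  // false
console.log(bounded.test(justRight)); // true
console.log(bounded.test(veryLong));  // false: more than 20
console.log(minOnly.test(veryLong));  // true: no upper limit
```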

Then comes the anchor $, which means "end of the string". This ensures that, from start to finish, the string only has what is specified in the regex.
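
To see what the anchors buy you, compare the same character class with and without them (a minimal sketch; the password containing a space is just an example):

```javascript
const unanchored = /[@!#$%^&*()/\\a-zA-Z0-9]{8,20}/;
const anchored = /^[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;

const password = 'abc123$ABC xy'; // contains a space, which is not in the list

// Without anchors, finding 8 valid characters in a row anywhere is enough.
console.log(unanchored.test(password)); // true
// With anchors, the whole string must consist of valid characters.
console.log(anchored.test(password));   // false
```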

Then our example would be:

let regex = /^(?=.*[@!#$%^&*()/\\])[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;

// has special characters
console.log(regex.test('a@4DFFd$fd')); // true

// has special characters, but fewer than 8 characters
console.log(regex.test('a@4$f')); // false

// has no special characters
console.log(regex.test('abcDE123456')); // false

In short:

  • the lookahead ensures that there is at least one special character (using the predefined list, which you can adjust to include the characters you want). If one is found, the regex goes back to the beginning of the string and evaluates the rest of the pattern;
  • the remainder ensures that the string only has special characters, letters or numbers;
  • the start and end anchors ensure that the whole string consists of the specified characters (if you do not use them, the string may contain characters that are not part of the regex and test will still return true).

This regex will also accept strings like $$$$$$$$$, so it is up to you to decide whether that is acceptable or whether to include more checks.


If you want to have more conditions, for example "you must have at least one letter and one digit", just add more lookaheads:

let regex = /^(?=.*[@!#$%^&*()/\\])(?=.*[0-9])(?=.*[a-zA-Z])[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;

// has at least one special character, one letter and one digit
console.log(regex.test('a@4DFFd$fd')); // true
console.log(regex.test('abc123$ABC')); // true

// has fewer than 8 characters
console.log(regex.test('a@4$A')); // false

// has special characters, but no digit
console.log(regex.test('abcDE$#fadfda')); // false

Each lookahead ((?=...)) ensures that the string has at least one character of the specified kind. The first has the list of special characters, the second a digit ([0-9]) and the third a letter ([a-zA-Z]). With this, I guarantee that the string has at least one character of each of these types. If you want to require at least one uppercase and one lowercase letter, just split [a-z] and [A-Z] into a separate lookahead each.
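
A sketch of that last variation, with one lookahead for lowercase and another for uppercase letters (the name strict and the sample passwords are mine):

```javascript
const strict = /^(?=.*[@!#$%^&*()/\\])(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;

console.log(strict.test('abc123$ABC')); // true
console.log(strict.test('abc123$abc')); // false: no uppercase letter
console.log(strict.test('ABC123$ABC')); // false: no lowercase letter
```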

And just to confirm that this regex does not accept emojis, Japanese characters, spaces, etc:

let regex = /^(?=.*[@!#$%^&*()/\\])(?=.*[0-9])(?=.*[a-zA-Z])[@!#$%^&*()/\\a-zA-Z0-9]{8,20}$/;

// concatenate the strings below with this one, so they have at least 8 characters
let s = "abc123$ABC";

// space (https://www.fileformat.info/info/unicode/char/0020/index.htm)
let space = " " + s;
console.log("password: [" + space + "]");
console.log(regex.test(space)); // false

// TAB (https://www.fileformat.info/info/unicode/char/0009/index.htm)
let tab = "\t" + s;
console.log("password: [" + tab + "]");
console.log(regex.test(tab)); // false

// Japanese character (http://www.fileformat.info/info/unicode/char/337B/index.htm)
let jp = "\u337b" + s;
console.log("password: [" + jp + "]");
console.log(regex.test(jp)); // false

// emoji (https://www.fileformat.info/info/unicode/char/1f4a9/index.htm)
let emoji = "\u{1F4A9}" + s;
// if the line above does not work (older browsers), replace it with "\uD83D\uDCA9"
console.log("password: [" + emoji + "]");
console.log(regex.test(emoji)); // false

// copyright (https://www.fileformat.info/info/unicode/char/00a9/index.htm)
let cr = "\u00a9" + s;
console.log("password: [" + cr + "]");
console.log(regex.test(cr)); // false


About emojis

Specifically regarding emojis, there is no problem in accepting them in your passwords, as long as your code handles them correctly.

There are many applications that already accept emojis in passwords, and even studies saying that this makes passwords safer and more "fun". Of course, your application should be prepared to work properly with Unicode, which JavaScript does not always do by default. Just look at the character counters in text fields, for example. Right here on Stack Overflow, the comment field counts an emoji as 2 characters.

Before typing the emoji:

Comment field showing that 585 characters remain

After typing the emoji:

After typing an emoji, 583 characters remain. The emoji was counted as 2 characters

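This happens basically because of how JavaScript strings work: the length counts UTF-16 code units, and a code point above U+FFFF takes two of them.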

But fortunately they are saved and displayed correctly, so in this particular case the size "problem" is (in my opinion) a minor one. It only illustrates the problems you would have if you used JavaScript to tell the user that the password is not the accepted size, for example.

If your application has the same problem (it counts code points above U+FFFF as 2 characters), check your language's documentation for how to resolve it.
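
In JavaScript specifically, a minimal sketch of the difference looks like this. String iteration (the spread operator or Array.from) walks the string by code point, while length counts UTF-16 code units:

```javascript
const emoji = '\u{1F4A9}'; // a code point above U+FFFF

console.log(emoji.length);      // 2: two UTF-16 code units
console.log([...emoji].length); // 1: one code point

const password = 'ab' + emoji;
console.log(password.length);             // 4
console.log(Array.from(password).length); // 3
```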

Anyway, this is a separate issue (and you can ask other questions about it if needed); I mention it just to make it clear that yes, you can use emojis in passwords. The biggest problem for passwords is non-printable characters and some other details that we have to take into account when allowing any Unicode character.

To accept emojis (and reject control characters, spaces, etc.), the regex gets a little more complicated, as you can see in this article and in this one as well. It is also worth noting that not every emoji corresponds to a single code point.
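
As an example of that last point, a flag emoji is written as a pair of regional indicator code points. The sketch below uses the pair for the letters B and R, chosen only for illustration:

```javascript
// Regional indicator symbols B (U+1F1E7) and R (U+1F1F7), shown as one flag.
const flag = '\u{1F1E7}\u{1F1F7}';

console.log(flag.length);      // 4: UTF-16 code units
console.log([...flag].length); // 2: code points, although it is displayed as one emoji
```

So even counting by code point does not always match what the user sees as "one character".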

  • 1

    Wow, excellent answer, @hkotsubo! It was worth a lot more than the hundreds of tutorials I found on the internet. I'm new to JavaScript, so I had this doubt about regex, which is a topic considered difficult even among programmers with a little more experience in this language.

  • 1

    @Edinaldoribeiro Thank you! Yes, regex is an endless subject, and much more complex than it seems (the more I study, the more I realize I knew nothing). Two sites that I find very good are this and this, because in addition to tutorials they have practical cases and explain in detail how things work, and they also point out differences between languages (because regex being difficult is not enough, you still have to know how each language implements it)

  • 1

    Each Lookahead checks a condition. The first one checks if it has any special character somewhere in the string. The second checks for any digits, and the third checks for any letters. It is a way to force the string to have at least one special character, a letter and a number. Without the second and third Lookahead, the string could only have special characters, for example.

  • Got it. After the three lookaheads there is still the expression [@!#$%^&*()/\\a-zA-Z0-9]. What would be the need for it since the string would already be validated by lookaheads?

  • 1

    The lookahead only checks whether there is any character of that type and, if it finds one, it "jumps" to the next section of the regex (it only serves to verify the existence of that type of character, a way of requiring at least one). The difference is that [@!#$%^&*()/\\a-zA-Z0-9] does not require anything (if the string has only letters or only numbers, for example, it accepts it). It also serves to carry the quantifier {8,20}. I didn't put the quantifier in the lookahead because the lookahead is there to check that there is at least one. Although it seems redundant, they serve different purposes

  • @Edinaldoribeiro I updated the answer with some more information. I had not included it before because I thought it would stray a little from the subject (I would have to start talking about the details of Unicode, etc.), but in the end I thought it important to add at least some remarks

  • 1

    Dude, I was having a hard time understanding Regexp, but after that explanation, I was able to solve an important issue here. Hug and keep it up ^^
