Regex in the input field's pattern attribute accepts any value

I have a form where visitors can enter their phone number, and I'm trying to validate it with the input's pattern attribute, using a regular expression I found in a post on Medium:

^(?:(?:\+|00)?(55)\s?)?(?:\(?([0-0]?[0-9]{1}[0-9]{1})\)?\s?)??(?:((?:9\d|[2-9])\d{3}\-?\d{4}))$

I tested the expression on regex101 (see the link below), and it works. I'm even using the same expression to validate the phone number in the backend.

https://regex101.com/r/10yNjW/1

The problem is that, for some reason I can't understand, it doesn't work on my site: any value entered in the phone field is accepted as valid.

<form action="" method="post">
    <label for="phone">Seu telefone</label>
    <input type="text" name="phone" id="phone" pattern="^(?:(?:\+|00)?(55)\s?)?(?:\(?([0-0]?[0-9]{1}[0-9]{1})\)?\s?)??(?:((?:9\d|[2-9])\d{3}\-?\d{4}))$">

    <button type="submit">Enviar</button>
</form>
  • This regular expression is meant for "searching" (finding a number that may or may not be there), not for strict validation. In the Medium author's example it is clear that it serves to find the number in a text, returning nothing if there isn't one. In your case the regular expression has to be simpler and stricter; ideally you would even apply a mask to standardize the phone number. Suggestions: https://answall.com/q/51109/3635

  • For the record, whoever wrote this Medium article doesn't seem to know exactly what they're doing. First they remove the parentheses, but then the regex contains \(? and \)? (optional parentheses, which is unnecessary since all parentheses were removed in the previous line). Not to mention the excerpt [0-0]?[0-9]{1}[0-9]{1} (which I explain how to simplify in the answer below), which looks to me like the typical result of copy-pasting without understanding (or of trying to write it without really knowing what one was doing). Of course I could be wrong, but that's the impression I got...

1 answer

Short answer

The problem is the hyphen in the final part (\d{3}\-?\d{4}). Instead of \-, just use - (that is, \d{3}-?\d{4}).


Long answer

But why did it work on regex101?

According to the documentation, the regex in the pattern attribute is compiled with the u (Unicode) flag (for more details on this flag, read here); more recent versions of the HTML spec use the v flag instead, which has the same restriction here. In Unicode mode, a hyphen cannot be preceded by \ unless it is inside a character class. So, as written, the regex is invalid, and when the regex is invalid, the pattern attribute is ignored (which is why the field ends up accepting anything).
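To make the mechanism concrete, here is a rough sketch that mimics what the browser does with the pattern attribute, under the assumptions described above: the value is wrapped in ^(?: ... )$, compiled with the u flag, and the attribute is treated as absent if compilation fails. The helper name patternAccepts is hypothetical, not a browser API.

```javascript
// Hypothetical helper: emulates the pattern attribute check described above.
// Assumes the browser wraps the pattern in ^(?: ... )$, compiles it with the
// u flag, and ignores the attribute entirely if compilation fails.
function patternAccepts(pattern, value) {
    let regex;
    try {
        regex = new RegExp('^(?:' + pattern + ')$', 'u');
    } catch (e) {
        // Invalid regex: no constraint is applied, so every value passes.
        return true;
    }
    return regex.test(value);
}

// Pattern from the question (with \-) and the same pattern using a plain -.
const original = String.raw`^(?:(?:\+|00)?(55)\s?)?(?:\(?([0-0]?[0-9]{1}[0-9]{1})\)?\s?)??(?:((?:9\d|[2-9])\d{3}\-?\d{4}))$`;
const fixed = original.replace('\\-', '-');

console.log(patternAccepts(original, 'abc'));          // true (pattern ignored)
console.log(patternAccepts(fixed, 'abc'));             // false
console.log(patternAccepts(fixed, '(11) 98765-4321')); // true
```

With the original pattern the RegExp constructor throws, so the sketch accepts anything, which matches the symptom described in the question.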

It only worked on regex101 because Unicode mode wasn't enabled there. See the difference:

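// without the u flag: works
let semUnicode = /\d\-\d/;
console.log('valid:', semUnicode);

// with the u flag: throws a SyntaxError
// (built with the RegExp constructor so the error happens at run time,
// instead of an invalid regex literal preventing the whole script from running)
try {
    let comUnicode = new RegExp('\\d\\-\\d', 'u');
    console.log('This message is never printed because the line above throws');
} catch (e) {
    console.log('error:', e.message);
}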

In fact, if you enable the u flag on regex101 (in the upper right corner, right after the field where you type the regex, click the "flag" icon), the regex gives an error there too (see here). In that case it only works if we remove the \ before the hyphen (see).

Note: as already said, the hyphen should only be escaped with \ where it has a special meaning, i.e., inside brackets, which delimit a character class. For example, [a-z] is the range from "a" to "z" (the hyphen denotes a range), while [a\-z] means "the letter a, or a hyphen, or the letter z".
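A small illustration of the note above, using regex literals with the u flag: inside brackets the escaped hyphen is allowed and changes the meaning, while outside brackets a plain - is already a literal hyphen. The variable names are just for illustration.

```javascript
// Inside brackets, "-" forms a range, so escaping it there is meaningful
// (and allowed even with the u flag).
const range = /^[a-z]$/u;     // any letter from a to z
const literal = /^[a\-z]$/u;  // only "a", "-" or "z"

// Outside brackets, a plain "-" already matches a literal hyphen.
const outside = /^a-z$/u;

console.log(range.test('m'));       // true
console.log(literal.test('m'));     // false
console.log(literal.test('-'));     // true
console.log(outside.test('a-z'));   // true
```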


There are other improvements to make in this regex too. Since you only want to validate, you don't need parentheses to create capture groups; you only need to keep the ones that group parts of the expression.
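To show why the capture groups can go, here is a sketch using a reduced fragment of the phone regex (a simplification chosen for illustration, not the full pattern): test(), which is all a validation needs, gives the same answer with or without the groups; only match() returns extra data.

```javascript
// Reduced fragment: optional "55" country code followed by a 2-digit DDD.
const withGroups = /^(?:(55)\s?)?(\d{2})$/;
const withoutGroups = /^(?:55\s?)?\d{2}$/;

// Validation results are identical for every sample.
for (const s of ['11', '55 11', '5511', '1']) {
    console.log(s, withGroups.test(s), withoutGroups.test(s));
}

// Only match() differs: the capturing version also returns '55' and '11'.
console.log('55 11'.match(withGroups));
console.log('55 11'.match(withoutGroups));
```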

And the excerpt [0-0]?[0-9]{1}[0-9]{1} can be simplified:

  • [0-0] means "a character between 0 and 0", that is, it can be replaced by just 0
  • [0-9]{1} means "one occurrence of a digit from 0 to 9", and since it appears twice in a row, the pair can be replaced by \d{2} (two occurrences of a digit from 0 to 9)
    • in general, {1} is never necessary, because (anything){1} is the same as (anything)
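If you want to convince yourself that the simplification doesn't change what is accepted, a brute-force check is easy: the sketch below compares the original excerpt and 0?\d{2} on every string of length 0 to 4 over a small alphabet (the alphabet choice is arbitrary, just enough to include digits and a non-digit).

```javascript
// Original excerpt and simplified version, anchored to compare whole strings.
const original = /^[0-0]?[0-9]{1}[0-9]{1}$/u;
const simplified = /^0?\d{2}$/u;

// Every string of length 0..maxLen over a small alphabet.
const alphabet = ['0', '1', '5', '9', 'x'];
function allStrings(maxLen) {
    let result = [''];
    let current = [''];
    for (let len = 1; len <= maxLen; len++) {
        const next = [];
        for (const prefix of current) {
            for (const ch of alphabet) next.push(prefix + ch);
        }
        result = result.concat(next);
        current = next;
    }
    return result;
}

// Strings where the two regexes disagree (expected: none).
const mismatches = allStrings(4).filter(s => original.test(s) !== simplified.test(s));
console.log('strings where they disagree:', mismatches.length); // 0
```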

Then it would look like this:

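const campo = document.querySelector('#phone');
// On every keystroke, clear any previous custom message and re-run validation;
// if the value doesn't satisfy the pattern, checkValidity() fires 'invalid'.
campo.addEventListener('input', () => {
    campo.setCustomValidity('');
    campo.checkValidity();
});
// On 'invalid', set a custom message ("The field must be a valid phone number").
campo.addEventListener('invalid', () => {
    campo.setCustomValidity('O campo deve ser um telefone válido');
});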
<form action="" method="post">
    <label for="phone">Seu telefone</label>
    <input type="text" name="phone" id="phone"
     pattern="^(?:(?:\+|00)?55\s?)?(?:\(?(0?\d{2})\)?\s?)??(?:9\d|[2-9])\d{3}-?\d{4}$">

    <button type="submit">Enviar</button>
</form>
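To check the final pattern outside the browser, here is a sketch that compiles it the way the answer describes (wrapped in ^(?: ... )$ with the u flag; an assumption about browser internals, see the note on the v flag above) and runs it against a few sample inputs chosen for illustration.

```javascript
// Final pattern from the snippet above (no backslash before the hyphen).
const pattern = String.raw`^(?:(?:\+|00)?55\s?)?(?:\(?(0?\d{2})\)?\s?)??(?:9\d|[2-9])\d{3}-?\d{4}$`;
// Compiles without a SyntaxError now, so the browser would enforce it.
const phone = new RegExp('^(?:' + pattern + ')$', 'u');

const samples = [
    '11987654321',        // DDD + mobile, digits only
    '(11) 98765-4321',    // DDD in parentheses, with hyphen
    '+55 11 98765-4321',  // with country code
    '3456-7890',          // landline without DDD
    'abc',                // not a phone
    '12345',              // too short
];
for (const s of samples) {
    console.log(s.padEnd(20), phone.test(s));
}
```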

Of course, there is still room for improvement, since the regex accepts 00 as a DDD (area code); maybe it should be [1-9]{2}, since currently no DDD contains the digit zero. If you want, take a look at this question, which has several other options.
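A sketch of the [1-9]{2} suggestion applied to the DDD part of the final pattern (a hypothetical variant, not tested against every real DDD): the 00 area code is rejected, while ordinary DDDs, with or without the leading 0, still pass.

```javascript
// Final pattern from the answer, and a stricter variant for the DDD digits.
const current = String.raw`^(?:(?:\+|00)?55\s?)?(?:\(?(0?\d{2})\)?\s?)??(?:9\d|[2-9])\d{3}-?\d{4}$`;
const stricter = String.raw`^(?:(?:\+|00)?55\s?)?(?:\(?(0?[1-9]{2})\)?\s?)??(?:9\d|[2-9])\d{3}-?\d{4}$`;

// Same wrapping/flag assumption as before.
const toRegex = p => new RegExp('^(?:' + p + ')$', 'u');
const currentRe = toRegex(current);
const stricterRe = toRegex(stricter);

for (const s of ['(00) 3456-7890', '(11) 98765-4321', '(011) 98765-4321']) {
    console.log(s, currentRe.test(s), stricterRe.test(s));
}
```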


But, as already said in the comments, it may be better if the field only accepts digits and you apply a mask to make it easier for users to fill in. I think that would be better than accepting several different formats (think about maintaining this code: this is not an easy regex to understand and maintain).
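A minimal sketch of the "digits only + mask" idea, assuming Brazilian numbers with a 2-digit DDD followed by 8 or 9 digits. The function formatPhone is hypothetical; for production use, an established mask library is probably a better choice.

```javascript
// Hypothetical helper: keeps only digits (at most 11: 2 for the DDD plus up
// to 9 for the number) and formats them as (DD) NNNNN-NNNN or (DD) NNNN-NNNN.
function formatPhone(raw) {
    const d = raw.replace(/\D/g, '').slice(0, 11);
    if (d.length <= 2) return d.length ? '(' + d : '';
    const ddd = d.slice(0, 2);
    const rest = d.slice(2);
    // 9-digit numbers (mobile) are split 5-4, shorter ones 4-4.
    const split = rest.length > 8 ? 5 : 4;
    if (rest.length <= split) return `(${ddd}) ${rest}`;
    return `(${ddd}) ${rest.slice(0, split)}-${rest.slice(split)}`;
}

// In the page, it could be wired to the field from the question, e.g.:
// document.querySelector('#phone').addEventListener('input', e => {
//     e.target.value = formatPhone(e.target.value);
// });

console.log(formatPhone('11987654321')); // (11) 98765-4321
console.log(formatPhone('1134567890'));  // (11) 3456-7890
```

Note that rewriting the value on every input event moves the cursor to the end of the field, and while typing a mobile number the display switches from the 4-4 to the 5-4 split on the ninth digit; a real mask usually handles both details.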
