Regular expressions to fix the format of a field as it is typed

-2

I need help with 2 regular expressions in JavaScript to validate two kinds of data typed into an HTML input.

1) The first rule is for the RNE, as follows:

    RNE -> fixed;
    A-Z, 0-9 -> length 1;
    0-9 -> length 6;
    A-Z, 0-9 -> length 1;
    Examples: RNEA409714Z or RNEZ4097140 or RNE04097149

2) The second rule is for the RG (issued here in Brazil to Portuguese citizens); it starts with the letter W and may end with a digit or a letter:

    W -> fixed;
    0-9 -> length 7;
    A-Z, 0-9 -> length 1;
    Examples: W0842241A or W08422410

The idea is for characters that are not part of the 2 rules to be ignored or removed from the input.

My current code is:

function formatarRG(input, teclapres) {
	var numero = input.value.trim().toUpperCase();

	if (numero.substring(0, 1) == 'R' || numero.substring(0, 2) == 'RN' || numero.substring(0, 3) == 'RNE')
		input.value = numero.replace(/^(RNE)([A-Z\d])(\d{6})([A-Z\d])$/, '$1$2$3$4');
	else if (numero.substring(0, 1) == 'W')
		input.value = numero.replace(/^(W)(\d{7})([A-Z\d])$/, '$1$2$3');
	else
		input.value = numero;
		
	return input.value;
}
<html>

	<label>Número RG:</label>
	<input type="text" value="" maxlength="14" size="20" onKeyUp="return formatarRG(this, event);" />

</html>

  • A tip: {1} is redundant and can be removed. \d{1} is the same as \d (in general, (anything){1} is the same as (anything)). Also, \w already includes digits, so if you have \w you don't need \d. However, \w also matches the character _, so you should actually use [A-Z\d] (a letter from A to Z or a digit from 0 to 9). Another detail: the replace doesn't make much sense, because you replace everything that matched with the very same text, so in the end the string is the same as it was before.

  • Thank you very much for the guidance, I'll make the adjustments. The idea is to get back only the characters that are really valid, already discarding whatever is not part of the rule; that's why I used 'replace'. Would you have something to suggest?

  • But discarding whatever doesn't fit may be too broad. If "Wxyz.2#@0842241d-)(A" is typed, do you want everything that doesn't fit to be deleted, leaving only "W0842241A"? It might be easier to state which format is valid and ask the user to type again if it isn't, instead of trying to correct what was typed, since there are too many possibilities to handle.

  • hkotsubo, not quite!!! You may have noticed that my HTML has onKeyUp="return formatarRG(this, event);", so in the example "W0842241A" I know that the first character typed has to be a W, the second to the eighth have to be digits, and the last (ninth) can be A-Z or 0-9. I didn't want to keep building substrings, reading the char code from the event, and discarding what isn't expected. I thought there would be something cleaner using regex.

  • You could be clearer in your question: the body says "I need help with 2 regular expressions in Javascript ..." and your code works, but down in @Felipealmeida's answer you say your question is something different. I will flag it as a non-reproducible problem, because to me it makes no sense to have a question that presents no error and changes its scope when a user tries to answer it.

  • Augusto, my code doesn't work as it should; that's why I asked for help! You haven't read the 2 rules I need to implement. I figured someone with extensive regex knowledge would be able to tell me whether there is a regular expression that removes the characters that don't fit rules 1 and 2 and keeps only those that do, or at least point me in the right direction to use regex for that purpose, because a solution with substrings, ifs and charCode I already have!

2 answers

1


From what I understand, you want to remove invalid characters as they are typed. For example, if only digits are allowed after the "W" and the user types an "x", the string becomes "Wx"; in that case the "x" must be removed right after being typed, and the input's value goes back to just "W".

Well, doing this with regex is quite complicated, because you have to evaluate every possibility for the replacement to be done properly:

  • when the first character is typed, you have to check if it is "W"
  • when the second character is typed (assuming the first one has already been checked to be "W"), you have to check if it is a digit
  • when the third character is typed (assuming the first 2 have already been checked), you have to check if it is another digit
  • and so on...

To keep the regex from getting too confusing, let's assume the criterion is "W followed by 3 digits, followed by a letter". A possible solution would be:

function formatarRG(input, teclapres) {
    let numero = input.value.trim().toUpperCase();
    let r = /^W(\d(\d(\d[A-Z]?)?)?)?$/;
    while (! numero.match(r)) { // while the value is not in the correct format, keep removing characters from the end
        numero = numero.slice(0, -1);

        // if everything has already been removed, leave the loop
        if (numero.length == 0) break;
    }

    input.value = numero;
}
<html>
  <label>Número RG:</label>
  <input type="text" value="" maxlength="14" size="20" onKeyUp="formatarRG(this, event);" />
</html>

The regex is ^W(\d(\d(\d[A-Z]?)?)?)?$. It uses the anchors ^ and $, which mark the beginning and end of the string, guaranteeing that the string can only contain what is specified in the expression. Then comes the W, followed by the tricky part, which checks all the possibilities.

Basically, starting from the inside out:

  • \d[A-Z]?: a digit optionally followed by a letter (the ? indicates that something is optional)
  • \d(\d[A-Z]?)?: a digit optionally followed by "a digit optionally followed by a letter"
  • \d(\d(\d[A-Z]?)?)?: a digit optionally followed by the expression above
  • finally, the above expression is also optional

Thus, the regex can check whether the string has only the letter "W", or "W" followed by 1, 2 or 3 digits, or "W" followed by 3 digits plus a letter.

Someone might suggest using ^W\d{0,3}[A-Z]?$ (the letter "W", followed by 0 to 3 digits, followed by an optional letter), but this regex does not work, because it also accepts cases such as "W1X" and "W12X" (with 1 or 2 digits before the final letter). Only the regex above ensures that there are 1, 2 or 3 digits and that the final letter can only occur after the third digit. See the difference here and here.

Either way, if something that does not fit is typed, characters are removed from the end until a valid string is reached.
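
As a quick check of the difference between the two expressions, here is a minimal sketch that can be run outside the browser. The helper name trimToValidPrefix is hypothetical (it is not part of the code above); it just isolates the same "remove from the end" loop as a pure function.

```javascript
// Nested optionals: the letter is only allowed after the third digit.
const nested = /^W(\d(\d(\d[A-Z]?)?)?)?$/;
// Flat alternative: also accepts a letter after 0, 1 or 2 digits.
const flat = /^W\d{0,3}[A-Z]?$/;

// Hypothetical helper: drops trailing characters until the value
// matches the given regex (or until nothing is left).
function trimToValidPrefix(value, regex) {
  let s = value.trim().toUpperCase();
  while (s.length > 0 && !regex.test(s)) {
    s = s.slice(0, -1);
  }
  return s;
}

console.log(nested.test('W1X'), flat.test('W1X')); // false true
console.log(trimToValidPrefix('w12x', nested));    // W12
```

Note that this only ever removes characters from the end, so it behaves well when the check runs on every keystroke, but it does not "repair" text pasted with garbage in the middle.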


Now try to imagine what the expressions would look like for your criteria. With just 3 digits and a letter, this was already (in my opinion) confusing and hard to maintain. For the RG it would be something like ^W(\d(\d(\d(\d(\d(\d(\d[A-Z\d]?)?)?)?)?)?)?)?$ (I haven't even checked whether it's right). For the RNE, it would be something like ^R(N(E([A-Z\d](\d(\d(\d(\d(\d(\d[A-Z\d]?)?)?)?)?)?)?)?)?)?$. Both are difficult to understand and can become maintenance nightmares.
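
One way to reduce that maintenance burden, sketched here under the assumption that each position of the format can be described by a single regex fragment, is to generate the nested expression instead of writing it by hand. The function buildPrefixRegex and the constants ALNUM and DIGIT are hypothetical names introduced for this illustration; the generated expression uses non-capturing groups but should accept the same strings as the hand-written ones.

```javascript
// Hypothetical helper: given one regex fragment per position, builds a regex
// that accepts every non-empty valid prefix of the full format.
// ['W', '\\d', '[A-Z]'] becomes /^W(?:\d(?:[A-Z])?)?$/
function buildPrefixRegex(tokens) {
  // With no initial value, reduceRight starts from the last token
  // and wraps each earlier token around what was built so far.
  const nested = tokens.reduceRight((inner, token) => `${token}(?:${inner})?`);
  return new RegExp(`^${nested}$`);
}

const ALNUM = '[A-Z\\d]';
const DIGIT = '\\d';

// RG: W, 7 digits, 1 letter or digit
const rgPrefix = buildPrefixRegex(['W', ...Array(7).fill(DIGIT), ALNUM]);
// RNE: RNE, 1 letter or digit, 6 digits, 1 letter or digit
const rnePrefix = buildPrefixRegex(['R', 'N', 'E', ALNUM, ...Array(6).fill(DIGIT), ALNUM]);

console.log(rgPrefix.test('W0842'));        // true  (valid so far)
console.log(rgPrefix.test('W0842X'));       // false (letter where a digit is expected)
console.log(rnePrefix.test('RNEA409714Z')); // true  (complete RNE)
```

The resulting regexes can be passed to the same loop that removes characters from the end of the input.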


Use regex, but in another way

I still think it is easier to indicate in your interface what the correct format is and, if the user types something wrong, show an informative message:

const campo = document.querySelector('#rg');

campo.addEventListener('input', () => {
  campo.setCustomValidity('');
  campo.checkValidity();
});

campo.addEventListener('invalid', () => {
    campo.setCustomValidity('The field format is blablablaetc (state the correct format in this message)');
});
/* keep a red border while the field is invalid */
input:invalid {
  border: red 1px solid;
}
<form>
  <label>Número RG:</label>
  <input id="rg" type="text" value="" required
   pattern="^(W\d{7}[A-Z\d]|RNE[A-Z\d]\d{6}[A-Z\d])$" size="20" />
  <input type="submit" value="ok">
</form>

The pattern attribute holds a regex describing the format the field must have: it can be an RG (starting with "W") or an RNE (the | character indicates alternation: one option or the other). If the entered value is wrong (i.e., it does not match the regex), the CSS rule input:invalid is applied (and you can style the field however you think best, for example to signal that the format is wrong).

This solution does not prevent the user from entering something invalid, nor does it automatically correct the value. But when the user tries to submit the form, the corresponding message, defined by setCustomValidity, is shown.

Anyway, once you know whether what was typed is right or not, you can build your own way of telling the user what is wrong and how to fix it.
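
If you prefer to run the same check in your own script instead of (or in addition to) the pattern attribute, the regex can be used directly in JavaScript. This is only a sketch: the function name isValidDocument is hypothetical, and it assumes the value is normalized with trim() and toUpperCase() as in the question's code.

```javascript
// Same expression used in the pattern attribute: a complete RG or a complete RNE.
const formato = /^(W\d{7}[A-Z\d]|RNE[A-Z\d]\d{6}[A-Z\d])$/;

// Hypothetical helper: true only when the whole value matches one of the formats.
function isValidDocument(value) {
  return formato.test(value.trim().toUpperCase());
}

// The examples from the question, plus two incomplete values.
const exemplos = [
  'W0842241A', 'W08422410',
  'RNEA409714Z', 'RNEZ4097140', 'RNE04097149',
  'W084224', 'RNE0409714',
];

for (const exemplo of exemplos) {
  console.log(exemplo, isValidDocument(exemplo));
}
```

The five examples from the question should print true, and the two incomplete values should print false.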

  • hkotsubo, first of all, congratulations on the explanation: the level of detail and the clarity are very impressive! A real class; it feels like you are right next to me explaining... Awesome! I want to thank you for the answer, which helped me a lot in deciding which way to go. A big hug.

1

I built the regexes on www.regex101.com and came up with the following:

Regex1: RNE[A-Z0-9][0-9]{1,6}[A-Z0-9]

Regex2: W[0-9]{7}[A-Z0-9]

I tested them with the examples you gave and they worked perfectly!

  • Thank you!! Yes, both your regexes and mine will pass the validation, but the idea is this: while typing in the input, discard the characters that are not part of the rule and keep the ones that are...

  • Felipe Almeida, his code works; it makes no sense to answer a problem that does not exist.

  • Unfortunately you are wrong. Just click the "Run" button up there and test it: type anything that does not follow rules 1 and 2 and you will see that my code, in the function formatarRG(input, teclapres), does not work as it should. Even hkotsubo, in his comment, gave the example "Wxyz.2#@0842241d-)(A", asking whether the result should be "W0842241A" obtained elegantly with regex, you see?
