How can I make a regex recognize "partial matches" in JavaScript to mask an input?

I'm building an <input> with a mask that formats the value as the user types. In this case, it is a CNPJ field.

A regex to format the CNPJ would be /(\d{2})\.?(\d{3})\.?(\d{3})\/?(\d{4})-?(\d+)/g, but it only formats the value once the regex is completely satisfied (that is, once every digit group has a non-empty value). Run the code below to see this.

const input = document.querySelector('#cnpj');

function onTextChanged(e) {
  const text = e.target.value;
  const pureText = text.replace(/[^0-9]/g, '');
  const textMasked = pureText.replace(
      /(\d{2})\.?(\d{3})\.?(\d{3})\/?(\d{4})-?(\d+)/g,
      '$1.$2.$3/$4-$5'
    );
  document.querySelector('#cnpj').value = textMasked;
}

input.addEventListener('input', onTextChanged);


// Automating data entry for demonstration purposes in this snippet
// Code adapted from https://stackoverflow.com/a/47617675/8839059
const time = 150;
let current = 0;
const cnpjText = '12345678000123';

function writeText() {
  // Append the next digit and fire an 'input' event, as if the user had typed it
  const newValue = input.value + cnpjText[current];
  const ev = new Event('input');

  input.value = newValue;
  input.dispatchEvent(ev);
  if (current < cnpjText.length - 1) {
    current++;
    setTimeout(writeText, time);
  }
}
setTimeout(writeText, time);
<input id="cnpj" maxLength="18" />

I can handle this by adding some ifs for each "part" of the regex:

const input = document.querySelector('#cnpj');

function onTextChanged(e) {
  const text = e.target.value;
  const pureText = text.replace(/[^0-9]/g, '');
  let textMasked = '';
  if (pureText.length <= 2) {
    textMasked = pureText.replace(
      /(\d{2})/g,
      '$1'
    );
  } else if (pureText.length <= 5) {
    textMasked = pureText.replace(
      /(\d{2})\.?(\d+)/g,
      '$1.$2'
    );
  } else if (pureText.length <= 8) {
    textMasked = pureText.replace(
      /(\d{2})\.?(\d{3})\.?(\d+)/g,
      '$1.$2.$3'
    );
  } else if (pureText.length <= 12) {
    textMasked = pureText.replace(
      /(\d{2})\.?(\d{3})\.?(\d{3})\/?(\d+)/g,
      '$1.$2.$3/$4'
    );
  } else {
    textMasked = pureText.replace(
      /(\d{2})\.?(\d{3})\.?(\d{3})\/?(\d{4})-?(\d+)/g,
      '$1.$2.$3/$4-$5'
    );
  }
  document.querySelector('#cnpj').value = textMasked;
}

input.addEventListener('input', onTextChanged);


// Automating data entry for demonstration purposes in this snippet
// Code adapted from https://stackoverflow.com/a/47617675/8839059
const time = 150;
let current = 0;
const cnpjText = '12345678000123';

function writeText() {
  // Append the next digit and fire an 'input' event, as if the user had typed it
  const newValue = input.value + cnpjText[current];
  const ev = new Event('input');

  input.value = newValue;
  input.dispatchEvent(ev);
  if (current < cnpjText.length - 1) {
    current++;
    setTimeout(writeText, time);
  }
}
setTimeout(writeText, time);
<input id="cnpj" maxLength="18" />

What I'd like to know is how to do this more generically, without needing so many ifs.

1 answer

In your function you do replace(/[^0-9]/g, ''), which already removes every character that is not a digit. This means the regex does not need \.?, \/? or -?: since all non-digit characters have already been removed, those parts are redundant and unnecessary.
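A quick way to confirm this, as a small sketch that runs outside the browser (the sample input '12.345.678/0001-23' is only an example): once the string has been reduced to digits, the regex without \.?, \/? and -? gives exactly the same result as the original one.

```javascript
// Original regex from the question, with optional separators
const withSeparators = /(\d{2})\.?(\d{3})\.?(\d{3})\/?(\d{4})-?(\d+)/g;
// Same regex without \.?, \/? and -?
const digitsOnly = /(\d{2})(\d{3})(\d{3})(\d{4})(\d+)/g;

// Strip everything that is not a digit, as onTextChanged does
const pureText = '12.345.678/0001-23'.replace(/[^0-9]/g, '');

const a = pureText.replace(withSeparators, '$1.$2.$3/$4-$5');
const b = pureText.replace(digitsOnly, '$1.$2.$3/$4-$5');
console.log(a);       // 12.345.678/0001-23
console.log(a === b); // true
```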

That said, the regex solution is not very "pretty" (at the end there is another one without regex, which I think is better for this case), because it involves checking whether each of the parts is present in the string. That is, the string can be:

  • only 2 digits
  • 2 digits, followed by 1 to 3 digits (in which case I put a dot after the second digit)
  • 2 digits, followed by 3 digits, followed by 1 to 3 digits (in which case I put a dot after the second digit and another dot after the fifth digit)
  • and so on...

That is, if I only had to check the first and second cases above, the regex would be something like /(\d{2})(\d{1,3})?/: the part \d{1,3} (1 to 3 digits) is optional (another option would be \d{0,3}, for example).
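To see what "optional" means in practice, here is a small sketch using String.prototype.match with the two-case regex (the sample inputs are arbitrary). When the optional group does not take part in the match, its entry in the result array is undefined.

```javascript
// Covers only the first two cases: 2 digits, optionally followed by 1 to 3 digits
const twoCases = /(\d{2})(\d{1,3})?/;

for (const digits of ['12', '123', '12345']) {
  const m = digits.match(twoCases);
  // m[1] is the first group; m[2] is undefined when the optional part is absent
  console.log(digits, '->', m[1], m[2]);
}
// 12 -> 12 undefined
// 123 -> 12 3
// 12345 -> 12 345
```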

To also check the third case, it would be /(\d{2})(?:(\d{1,3})(\d{1,3})?)?/: now I have (\d{1,3})(\d{1,3})? (1 to 3 digits, optionally followed by 1 to 3 more digits), and this whole part is also optional. Note that I used a non-capturing group (the (?:...) part), so that no additional group is created (groups are what you reference as $1, $2, etc. in replace). If I used plain parentheses instead of (?:, an extra, unwanted group would be created.
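The effect of (?: can be checked by counting the groups that match() returns. This sketch compares the three-case regex with a hypothetical variant that uses plain parentheses for the outer group (the input '123456' is just an example). With plain parentheses the outer part becomes group 2, so every later group shifts by one and $2 would no longer hold the second part of the CNPJ.

```javascript
// Third case written with a non-capturing group (?:...): 3 capture groups
const nonCapturing = /(\d{2})(?:(\d{1,3})(\d{1,3})?)?/;
// Same pattern with plain parentheses: the outer group also captures
const capturing = /(\d{2})((\d{1,3})(\d{1,3})?)?/;

const m1 = '123456'.match(nonCapturing);
const m2 = '123456'.match(capturing);

// match() returns the full match followed by one entry per capture group
console.log(m1.length - 1); // 3
console.log(m2.length - 1); // 4
console.log(m1.slice(1));   // [ '12', '345', '6' ]
console.log(m2.slice(1));   // [ '12', '3456', '345', '6' ]
```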

Applying this logic to all the cases, we get this monster:

^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$

I also use the anchors ^ and $ (start and end of the string) to ensure that the string contains only what the regex specifies (no more and no fewer characters).
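A small sketch of what the anchors change, comparing the regex above with a copy that has ^ and $ removed (the 15-digit input is an arbitrary example of a value that is too long). It also shows that when the anchored regex does not match, replace() returns the string unchanged, which is why a single typed digit is left as is.

```javascript
// The answer's regex, with and without the ^ and $ anchors
const anchored = /^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$/;
const unanchored = /(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?/;

const tooLong = '123456789012345'; // 15 digits, one more than a CNPJ has

console.log(anchored.test(tooLong));   // false
console.log(unanchored.test(tooLong)); // true: matches the first 14 digits, ignores the last one
console.log(anchored.test('1'));       // false: fewer than 2 digits

// When there is no match, replace() returns the original string
console.log('1'.replace(anchored, 'x')); // 1
```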

And to do the substitution, I use a callback function, which receives all the capture groups as parameters. That way I can check whether each group was matched by the regex (since they correspond to optional parts, they will not always have a value). It would look like this:

const input = document.querySelector('#cnpj');

let regex = /^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$/;

// g1 to g5 are the capture groups
function replacement(match, g1, g2, g3, g4, g5) {
    let s = '';
    // if the group is present, append it to the string
    if (g1) s += g1;
    if (g2) s += `.${g2}`;
    if (g3) s += `.${g3}`;
    if (g4) s += `/${g4}`;
    if (g5) s += `-${g5}`;
    return s;
}

function onTextChanged(e) {
  const text = e.target.value;
  const pureText = text.replace(/[^0-9]/g, '');
  let textMasked = pureText.replace(regex, replacement);
  document.querySelector('#cnpj').value = textMasked;
}

input.addEventListener('input', onTextChanged);

const time = 150;
let current = 0;
const cnpjText = '12345678000123';

function writeText() {
  const newValue = input.value + cnpjText[current];
  const ev = new Event('input');

  input.value = newValue;
  input.dispatchEvent(ev);
  if (current < cnpjText.length - 1) {
    current++;
    setTimeout(writeText, time);
  }
}
setTimeout(writeText, time);
<input id="cnpj" maxLength="18" />
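To make the role of the callback concrete, the sketch below records the groups that replace() passes in (the helper inspectGroups is hypothetical, used only for illustration). Groups belonging to optional parts that did not participate in the match arrive as undefined, which is why each one is tested before adding its separator. After the groups, replace() also passes the match offset and the whole string; this callback simply ignores them.

```javascript
const regex = /^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$/;

// Hypothetical helper: returns the five capture groups replace() passes to the callback
function inspectGroups(digits) {
  let groups = null; // stays null if the regex does not match at all
  digits.replace(regex, (match, g1, g2, g3, g4, g5) => {
    groups = [g1, g2, g3, g4, g5];
    return match; // the resulting string is discarded here
  });
  return groups;
}

console.log(inspectGroups('12345'));      // [ '12', '345', undefined, undefined, undefined ]
console.log(inspectGroups('1234567890')); // [ '12', '345', '678', '90', undefined ]
console.log(inspectGroups('1'));          // null
```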

Anyway, note that you can't escape the "bunch of ifs", because you need to check whether each group is present to know whether to insert the separators (hyphen, dot or slash).


But do you really need this complicated regex? You could, for example, remove the non-digit characters and format the value using good old slice:

const input = document.querySelector('#cnpj');

function onTextChanged(e) {
  const text = e.target.value;
  const pureText = text.replace(/[^0-9]/g, '');
  let textMasked = pureText.slice(0, 2);
  if (pureText.length > 2) {
    textMasked += '.' + pureText.slice(2, 5);
  }
  if (pureText.length > 5) {
    textMasked += '.' + pureText.slice(5, 8);
  }
  if (pureText.length > 8) {
    textMasked += '/' + pureText.slice(8, 12);
  }
  if (pureText.length > 12) {
    textMasked += '-' + pureText.slice(12);
  }
  document.querySelector('#cnpj').value = textMasked;
}

input.addEventListener('input', onTextChanged);

const time = 150;
let current = 0;
const cnpjText = '12345678000123';

function writeText() {
  const newValue = input.value + cnpjText[current];
  const ev = new Event('input');

  input.value = newValue;
  input.dispatchEvent(ev);
  if (current < cnpjText.length - 1) {
    current++;
    setTimeout(writeText, time);
  }
}
setTimeout(writeText, time);
<input id="cnpj" maxLength="18" />
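One way to gain confidence that the slice version behaves like the regex version is to run both on every prefix of a CNPJ outside the browser. This is a sketch: maskWithRegex and maskWithSlice are hypothetical DOM-free wrappers around the answer's code, and the digits are the same sample used in the snippets.

```javascript
// Regex version from the answer, wrapped in a DOM-free function (hypothetical name)
const regex = /^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$/;

function maskWithRegex(pureText) {
  return pureText.replace(regex, (match, g1, g2, g3, g4, g5) => {
    let s = g1; // g1 is always present when the regex matches
    if (g2) s += `.${g2}`;
    if (g3) s += `.${g3}`;
    if (g4) s += `/${g4}`;
    if (g5) s += `-${g5}`;
    return s;
  });
}

// slice version, same logic as onTextChanged above (hypothetical name)
function maskWithSlice(pureText) {
  let textMasked = pureText.slice(0, 2);
  if (pureText.length > 2) textMasked += '.' + pureText.slice(2, 5);
  if (pureText.length > 5) textMasked += '.' + pureText.slice(5, 8);
  if (pureText.length > 8) textMasked += '/' + pureText.slice(8, 12);
  if (pureText.length > 12) textMasked += '-' + pureText.slice(12);
  return textMasked;
}

// Compare both versions for every prefix of a 14-digit CNPJ (0 to 14 digits)
const digits = '12345678000123';
for (let i = 0; i <= digits.length; i++) {
  const prefix = digits.slice(0, i);
  const a = maskWithRegex(prefix);
  const b = maskWithSlice(prefix);
  console.log(prefix.padEnd(14), a === b ? 'same' : 'DIFFERENT', b);
}
```

Both agree for 0 to 14 digits. They only differ with 15 or more digits: the anchored regex no longer matches and returns the raw digits, while the slice version keeps everything after the hyphen. When typing digit by digit, maxLength="18" keeps the value from getting that long.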

Maybe you don't like this solution because it "uses multiple ifs". It could even be generalized by creating an array with the size of each part and the character used to separate it from the previous one, something like this:

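// Size of each part and the separator placed before it
// (pureText is the digits-only string computed in onTextChanged)
const sliceData = [
  { size: 2, sep: '' },
  { size: 3, sep: '.' },
  { size: 3, sep: '.' },
  { size: 4, sep: '/' },
  { size: 2, sep: '-' },
];
let textMasked = '';
let index = 0;
for (const s of sliceData) {
  textMasked += s.sep + pureText.slice(index, index + s.size);
  index += s.size;
  if (pureText.length <= index) break; // no digits left: stop adding separators
}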

But I honestly don't think you need this complication if the only goal is to get rid of some ifs.
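If the goal is a reusable component that accepts other masks (as mentioned in the comments), the array-driven loop can be wrapped in a function that receives the mask description as a parameter. This is only a sketch: applyMask, CNPJ_PARTS and CPF_PARTS are hypothetical names, and the CPF layout (###.###.###-##) is included purely as a second illustration. Unlike the slice version, this loop drops any digits beyond the last part.

```javascript
// Hypothetical helper: applies a mask described as a list of { size, sep } parts,
// using the same loop as in the answer
function applyMask(text, parts) {
  const pureText = text.replace(/[^0-9]/g, ''); // keep digits only
  let masked = '';
  let index = 0;
  for (const { size, sep } of parts) {
    masked += sep + pureText.slice(index, index + size);
    index += size;
    if (pureText.length <= index) break; // no digits left for the next part
  }
  return masked; // digits beyond the last part are dropped
}

const CNPJ_PARTS = [
  { size: 2, sep: '' },
  { size: 3, sep: '.' },
  { size: 3, sep: '.' },
  { size: 4, sep: '/' },
  { size: 2, sep: '-' },
];

// CPF layout (###.###.###-##), included only as a second illustration
const CPF_PARTS = [
  { size: 3, sep: '' },
  { size: 3, sep: '.' },
  { size: 3, sep: '.' },
  { size: 2, sep: '-' },
];

console.log(applyMask('12345678000123', CNPJ_PARTS)); // 12.345.678/0001-23
console.log(applyMask('12.345.67', CNPJ_PARTS));      // 12.345.67
console.log(applyMask('12345678901', CPF_PARTS));     // 123.456.789-01
```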

  • Great answer. There really was "garbage" in the regex: I was trying many different things and it went unnoticed. I had been looking at that answer on SOen (English Stack Overflow) about partial matching, because I wanted to "generalize" and create a component that would accept any type of mask, but I realized it would not be worth the cost-benefit: the regex would be very complex and maybe slow to run (I did not analyze the performance).

  • @Rafaeltavares In general, regex is slower: the expression needs to be compiled (that's right, compiled!) into an internal structure that the engine uses to execute it, and the matches and capture groups have to be created, which consumes memory, etc. Of course, for a few small strings the difference will be negligible, but it's worth testing with your use cases to see whether it becomes a performance problem (and it's also worth weighing the maintenance cost, since the regex in the answer is not that easy to understand and maintain, in my opinion).
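For testing with your own use cases, a very rough timing sketch could look like the one below. It is only an illustration: maskWithRegex, maskWithSlice and measure are hypothetical names, Date.now() has millisecond resolution, and micro-benchmarks like this are heavily affected by the engine's JIT, so no particular numbers are implied. Returning the total output length just makes sure the results are actually used.

```javascript
const regex = /^(\d{2})(?:(\d{1,3})(?:(\d{1,3})(?:(\d{1,4})(\d{1,2})?)?)?)?$/;

// Regex version (hypothetical DOM-free wrapper of the answer's code)
function maskWithRegex(pureText) {
  return pureText.replace(regex, (match, g1, g2, g3, g4, g5) =>
    g1 + (g2 ? `.${g2}` : '') + (g3 ? `.${g3}` : '') +
    (g4 ? `/${g4}` : '') + (g5 ? `-${g5}` : ''));
}

// slice version (hypothetical DOM-free wrapper of the answer's code)
function maskWithSlice(pureText) {
  let s = pureText.slice(0, 2);
  if (pureText.length > 2) s += '.' + pureText.slice(2, 5);
  if (pureText.length > 5) s += '.' + pureText.slice(5, 8);
  if (pureText.length > 8) s += '/' + pureText.slice(8, 12);
  if (pureText.length > 12) s += '-' + pureText.slice(12);
  return s;
}

// Runs fn over every prefix of a CNPJ `rounds` times and prints the elapsed time.
// Returns the total output length so the results are actually used.
function measure(label, fn, rounds) {
  const digits = '12345678000123';
  const start = Date.now();
  let total = 0;
  for (let r = 0; r < rounds; r++) {
    for (let i = 0; i <= digits.length; i++) {
      total += fn(digits.slice(0, i)).length;
    }
  }
  console.log(`${label}: ${Date.now() - start} ms`);
  return total;
}

measure('regex', maskWithRegex, 50000);
measure('slice', maskWithSlice, 50000);
```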
