What does the regular expression "/(?=(?:...)*$)/" do in detail?

19

I needed a solution to insert dots separating the digits of a number in groups of three, counting from right to left.

For example:

1000 => 1.000
100000 => 100.000
10000000 => 10.000.000

In an answer I found on the English Stack Overflow, the suggested solution was to use this regular expression:

function separarDeTresEmTres(numero)
{
  return String(numero).split(/(?=(?:...)*$)/ ).join('.');
}


console.log(separarDeTresEmTres(1000));     // 1.000
console.log(separarDeTresEmTres(1000000));  // 1.000.000
console.log(separarDeTresEmTres(10000000)); // 10.000.000

But I didn’t quite understand the magic performed by /(?=(?:...)*$)/.

What makes this regular expression separate the digits in groups of three, from right to left? What is the explanation?

NOTE: I don’t want answers explaining how to separate a number into groups of three, because the regular expression I’m using already does that. The question here is specifically about how each part of this regular expression works. I do not want a solution to the problem without an explanation of what is happening.

  • 4

    I usually use this tool to clear up doubts about regex: https://regex101.com/. Maybe it will help.

  • @Filipemoraes good tip, I’ll take a look now!

  • 1

    "In a reply I found in Stackoverflow English" - Could put the link?

  • @Victorstafusa edited :p

4 answers

18

Let’s go through it part by part:

/(?=(?:...)*$)/
  • ?= Matches the (empty) position that is followed by the expression after the =.
  • ?: Makes the parenthesized expression a non-capturing group.
  • ... Any character, 3 times.
  • *$ Repeated any number of times, up to the end of the string.

Explaining in practice:

What happens is this: the expression groups 3 characters with (?:...) and matches the position right before them with ?=. What ensures this is done from back to front, as many times as needed, is *$. Applying split, the number 1000000 is divided like this: 1|000|000. Then .join('.') finishes the magic.

NOTE: The non-capturing group ?: keeps the group from interfering with the result; what really matters is the position before the 3 characters.
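
To see why the group should be non-capturing, here is a minimal sketch (the variable names are hypothetical, not from the answer) comparing the original expression with a variant that uses a capturing group. In JavaScript, split splices whatever the capturing groups matched into the resulting array, so the capturing variant pollutes the output.

```javascript
// Original expression: non-capturing group inside the lookahead.
const withNonCapturing = "1234567".split(/(?=(?:...)*$)/);
console.log(withNonCapturing);           // [ '1', '234', '567' ]
console.log(withNonCapturing.join(".")); // 1.234.567

// Hypothetical variant with a capturing group: split also inserts
// the group's last captured text at each split point.
const withCapturing = "1234567".split(/(?=(...)*$)/);
console.log(withCapturing);              // [ '1', '567', '234', '567', '567' ]
```

Distinct digits are used here instead of zeros only so that the extra captured pieces are easy to spot.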

  • 1

    Great, that does answer the question. + 1

  • I hope it’s clear. It’s a little complicated to explain; if you have any questions I can answer, I’m here!

18


Well, let’s build that regular expression:

  • . - Recognizes any character.

  • ... - Recognizes any three characters.

  • (?:...) - Non-capturing group of any three characters. Non-capturing groups start with (?: and end with ).

  • (?:...)* - Repetition. That * indicates 0 or more repetitions. So this is several groups of three characters.

  • $ - End of the string. By ensuring that the end of the string is present, it is ensured that no character can be left over at the end.

  • (?:...)*$ - Groups of three characters followed by the end of the string. This ensures that the recognized groups must be at the end of the string, not at the beginning.

  • (?=(?:...)*$) - Positive lookahead - Requires that what follows matches the inner expression, without consuming it, and is tried at every position where that expression can match.

To understand this last point, suppose the expression were (?=a(?:...)*$) and the input string were 1234a567890. In this case, the text recognized by the expression inside (?= ) would be a567890, since that is an a followed by a number of characters that is a multiple of 3. The lookahead itself consumes nothing: it only asserts that, at the current position, what follows matches the inner expression. Note that the match succeeds even though the beginning of the string (1234) is not part of the recognized text; this is what the positive lookahead does. In the original expression, the match also occurs several times, because the regex inside (?= ) is satisfied at several different positions: every position followed by a run of characters that reaches the end of the string and whose length is a multiple of three (including 0).
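
The example above can be checked with a short sketch (the variable names are hypothetical). It shows that the lookahead finds a position but consumes no text, whereas the same inner expression without the lookahead consumes a567890.

```javascript
const input = "1234a567890";

// With the lookahead: a zero-width match right before the "a".
const match = /(?=a(?:...)*$)/.exec(input);
console.log(match.index);     // 4
console.log(match[0]);        // "" (empty string: nothing was consumed)

// Without the lookahead: the same inner expression consumes the text.
const consumed = /a(?:...)*$/.exec(input);
console.log(consumed.index);  // 4
console.log(consumed[0]);     // "a567890"
```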

  • The / before and after the regex is what JavaScript uses to denote and delimit a regex literal.

  • The String(numero) call is a way to convert a number into a string.

  • The split method cuts the string at each position where a match occurs. Because the lookahead matches at several positions, the string ends up sliced into groups of three characters counted from the end, producing an array of strings.

  • The join method is an array method that joins all the pieces into a single string, placing a separator between each pair of pieces; that result is what the function returns.
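
As a rough illustration of the points above, the sketch below lists every index where the lookahead succeeds and then shows what split and join do with them. The helper lookaheadPositions is hypothetical and exists only for this demonstration; it uses the sticky flag (y) so that the regex is tested at exactly one index at a time.

```javascript
// Hypothetical helper: lists every index where the lookahead succeeds.
function lookaheadPositions(text) {
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    const re = /(?=(?:...)*$)/y; // sticky: try to match only at lastIndex
    re.lastIndex = i;
    if (re.test(text)) positions.push(i);
  }
  return positions;
}

const text = String(10000000);           // "10000000"
console.log(lookaheadPositions(text));   // [ 2, 5, 8 ]

const pieces = text.split(/(?=(?:...)*$)/);
console.log(pieces);                     // [ '10', '000', '000' ]
console.log(pieces.join("."));           // 10.000.000
```

Index 8 is the end of the string. The lookahead succeeds there too (zero groups of three), but split makes no cut at the very end, so only indexes 2 and 5 produce pieces.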

9

This regex is very interesting because it combines a few factors.

Factors

  • We know that a regex is used to match certain content in a text/string.
  • We also know that split divides a string at each occurrence of a separator.

Examples

var test = 'Teste de captura';
var r = /c.p/;
console.log(test.match(r));   // matches 'cap' at index 9

var test = 'Teste de divisão';
var div = 'e';
console.log(test.split(div)); // [ 'T', 'st', ' d', ' divisão' ]

Note that with split the separator character is lost.

What’s going on?

This function combines these two features.
The big question is: "what is it using to match and to split?"
The answer is: the "nothingness".

Now you must be asking yourself: "What do you mean, nothing?"

What is the "nothingness"?

In this answer I touch a little on what the nothingness is.

In compilers, it would be the same as a direct transition to the next state.

How it does it

Through the (?= ) part of the regex. This creates a match that must occur but is not included in the result.

General explanation of the regex

  • $ - Because it anchors the expression to the end of the string, the groups of three are counted from the end, so the content is effectively matched from back to front.
  • (?: )* - Says only that this is a group that can repeat any number of times, but should not be captured.
  • (?= ) - A match that must occur but does not go into the result.
  • ... - A sequence of any 3 characters.
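
A small sketch of the role of $ (the variable names and the variant without $ are hypothetical, for comparison only): when the anchor is removed, zero repetitions of the group already satisfy the lookahead, so it succeeds at every position and split cuts between every pair of characters.

```javascript
const digits = "1000000";

// With $: the groups of three must reach the end of the string.
const anchored = digits.split(/(?=(?:...)*$)/);
console.log(anchored);   // [ '1', '000', '000' ]

// Hypothetical variant without $: the lookahead succeeds everywhere.
const unanchored = digits.split(/(?=(?:...)*)/);
console.log(unanchored); // [ '1', '0', '0', '0', '0', '0', '0' ]
```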

And where is the nothingness in all this?

In the fact that nothing is consumed: you have a sequence of 3 characters that runs from the end to the beginning, and the split happens at the transition to the next group of 3. The "nothingness" is the gap between the two groups.
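
The difference between splitting on a character and splitting on the "nothingness" can be sketched as follows, reusing the string from the earlier example (the variable names are hypothetical). Splitting on a lookahead loses no characters, so joining the pieces with an empty string gives back the original.

```javascript
const phrase = "Teste de divisão";

// Splitting on the character itself: every 'e' is lost.
const bySeparator = phrase.split("e");
console.log(bySeparator); // [ 'T', 'st', ' d', ' divisão' ]

// Splitting on the empty position before each 'e': nothing is lost.
const byLookahead = phrase.split(/(?=e)/);
console.log(byLookahead); // [ 'T', 'est', 'e d', 'e divisão' ]

console.log(byLookahead.join("") === phrase); // true
```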

-2

Now a variant of the same idea, using the same philosophy: this time a simple substitution in Perl (to emphasize that this question is independent of the programming language).

$ echo "1000 e mais 100000 10000000" |
      perl -pe 's/\d(?=(\d\d\d)+\b)/$&./g'
1.000 e mais 100.000 10.000.000

That is: a digit "d" followed by groups of 3 digits is replaced by "d.". As usual:

  • \b : word boundary
  • \d : digit
  • (?= regexp) : zero-width lookahead (right context)
  • s/regexp/string/g : global find-and-replace
  • $& : the string that matched

By the way, applied directly to the (simplest) case in the question, in JavaScript it would be:

function separarDeTresEmTres(numero)
{
  return String(numero).replace(/\d(?=(...)+$)/g,"$&.");
}
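
A hedged usage sketch of both variants: the first function repeats the one above, and separarNoTexto is a hypothetical JavaScript port of the Perl one-liner (the name is not from the answer). It relies on \d and \b instead of . and $, so it can handle numbers embedded in a longer text.

```javascript
// Function from the answer: assumes the whole string is the number.
function separarDeTresEmTres(numero) {
  return String(numero).replace(/\d(?=(...)+$)/g, "$&.");
}

// Hypothetical port of the Perl substitution: works on numbers inside text.
function separarNoTexto(texto) {
  return texto.replace(/\d(?=(\d\d\d)+\b)/g, "$&.");
}

console.log(separarDeTresEmTres(1000));     // 1.000
console.log(separarDeTresEmTres(10000000)); // 10.000.000
console.log(separarNoTexto("1000 e mais 100000 10000000"));
// 1.000 e mais 100.000 10.000.000
```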
  • 2

    -1 This does not answer what was asked.
