The idea of the proposal is to return the start and end indices of the match found, and also of the capture groups, when present.
Before this proposal, RegExp.prototype.exec, String.prototype.match and String.prototype.matchAll gave us, at most, the index at which the match starts. That is, in this code:
const s1 = 'zabbcdef';
const m1 = s1.match(/ab*(cd(?<Z>ef)?)/);
for (const e in m1) {
  console.log(e, m1[e]);
}
The result (the array m1) has an index property, indicating the index at which the match starts (in this case it is 1, the position of the a in the string). It also has a groups property, which contains the named groups (in this case there is one named group, (?<Z>ef), indicating that the content ef belongs to the group named "Z").
The array m1 also contains the text captured by the groups themselves (cdef and ef), but there is no information about where in the string each of them was found.
The idea of the proposal is to have the start and end indices of the whole match and also of each capture group. In the case of the regex above, we have 2 groups:
(?<Z>ef) is a named group (its name is "Z", and its content is ef)
(cd(?<Z>ef)?) is an unnamed group; its content is cd followed by the content of the "Z" group (and the entire "Z" group is optional, since it is immediately followed by ?)
In this case, the groups are "numbered" in the order in which they open: the group that contains cd etc. is the first, and the named group is the second.
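If it helps, the numbering can be checked with a plain exec call, which needs nothing from the proposal. This is only a small sketch of what was described above (the variable name m is mine):

```javascript
// Groups are numbered by the position of their opening parenthesis.
const m = /ab*(cd(?<Z>ef)?)/.exec('zabbcdef');

console.log(m[0]);       // 'abbcdef' (whole match)
console.log(m[1]);       // 'cdef'    (group 1, unnamed)
console.log(m[2]);       // 'ef'      (group 2, also reachable by name)
console.log(m.groups.Z); // 'ef'
console.log(m.index);    // 1 (only the start of the whole match is known)
```

Note that nothing here tells us where cdef or ef start or end inside the string.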
Finally, the indices returned in the polyfill example are:
[ 1, 8 ]: where the whole match is, because 1 is the position of the a that starts the match, and 8 is the position just after the end of the match (in this case, just after the f)
[ 4, 8 ]: the unnamed group starts at index 4, which is where the c is in the string, i.e. where the sub-match for this group starts
[ 6, 8 ]: 6 is the index of the e, which is where the sub-match for the named group "Z" starts
And when there is a named group, its indices are also placed in indices.groups, in the form of an object whose keys are the group names and whose values are the respective index pairs.
Since the named group is optional (indicated by ?), if the string were zabbcd123, the last pair of indices ([6, 8]) would not be returned (undefined is placed in its position).
According to the proposal, the indices property is only returned if the regex has the d flag. That is, the regex would be created as /ab*(cd(?<Z>ef)?)/d or new RegExp('ab*(cd(?<Z>ef)?)', 'd').
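To make the index pairs concrete, here is a minimal sketch that assumes an engine with native support for the d flag (Chrome 90 is the one tested in this answer; I believe recent Node.js versions also work). The variable names are only illustrative:

```javascript
// Assumes the engine supports the d flag; otherwise the constructor throws.
const re = new RegExp('ab*(cd(?<Z>ef)?)', 'd'); // same as /ab*(cd(?<Z>ef)?)/d

const full = re.exec('zabbcdef');
console.log(full.indices[0]);       // [ 1, 8 ] whole match
console.log(full.indices[1]);       // [ 4, 8 ] group 1
console.log(full.indices[2]);       // [ 6, 8 ] group 2 ("Z")
console.log(full.indices.groups.Z); // [ 6, 8 ]

// The optional group "Z" does not take part in this match.
const partial = re.exec('zabbcd123');
console.log(partial.indices[0]);       // [ 1, 6 ]
console.log(partial.indices[1]);       // [ 4, 6 ]
console.log(partial.indices[2]);       // undefined
console.log(partial.indices.groups.Z); // undefined

// Without the d flag, the indices property is absent.
console.log(/ab*(cd(?<Z>ef)?)/.exec('zabbcdef').indices); // undefined
```

Note that in the second string the whole match and group 1 also end earlier (at 6), since ef is not there.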
Recently (May/2021) MDN updated its documentation, and the d flag is already there: see here and here. It is also interesting to note that every RegExp instance gets the hasIndices property, indicating whether the d flag was used (true or false). But be sure to check the compatibility table before using it, because not all browsers support it yet.
Therefore, the code below may or may not work in your browser (I tested it in Chrome 90 and it worked):
// May/2021 - only works in some browsers (tested in Chrome 90)
var r = /ab*(cd(?<Z>ef)?)/d; // regex with the d flag
console.log('has the flag:', r.hasIndices); // true
var result = 'zabbcdef'.match(r);
console.log('indices:', result.indices);
console.log('group indices:', result.indices.groups);
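Since support varies, one possible way to avoid a hard failure is to test for the flag before using it. The sketch below is only a suggestion: supportsMatchIndices is a hypothetical helper name of mine, not part of the proposal or of any library.

```javascript
// Hypothetical helper: returns true when the engine accepts the d flag.
function supportsMatchIndices() {
  try {
    // The constructor throws a SyntaxError for unknown flags,
    // while a /.../d literal would already fail when the script is parsed.
    return new RegExp('', 'd').hasIndices === true;
  } catch (e) {
    return false;
  }
}

const source = 'ab*(cd(?<Z>ef)?)';
const r2 = supportsMatchIndices() ? new RegExp(source, 'd') : new RegExp(source);
const res = 'zabbcdef'.match(r2);
console.log(res.indices ? res.indices[0] : 'indices not supported here');
```

Building the regex with the constructor, instead of a literal, is what allows the same script to load in engines that do not know the flag.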
Just to try to clarify a little more, here is another example:
const execWithIndices = require("regexp-match-indices");
const text = "- abc 123 xy 4567 .";
const result = execWithIndices(/([a-z]+) (?<nums>\d+) ([a-z]+) (?<othernums>\d+)/, text);
console.log(result.indices);
The regex searches for sequences of letters ([a-z]+) and digits (\d+); the digits are in named groups, and the letters are in "normal" (unnamed) groups.
To be more precise, the regex searches for letters, space, digits, space, letters, space and digits. There are four capture groups: the first and third match the letters, and the second and fourth match the digits (and these two have the names "nums" and "othernums").
In this case, the value of the indices property is the array:
[
[ 2, 17 ],
[ 2, 5 ],
[ 6, 9 ],
[ 10, 12 ],
[ 13, 17 ],
groups: { nums: [ 6, 9 ], othernums: [ 13, 17 ] }
]
In this case, the array elements are:
[2, 17]: the indices corresponding to the whole match found (i.e., the entire excerpt "abc 123 xy 4567")
[2, 5]: the indices corresponding to the first capture group (the first occurrence of "one or more letters", the excerpt "abc")
[6, 9]: the indices corresponding to the second capture group (the first occurrence of "one or more digits", the "123")
[10, 12]: the indices corresponding to the third capture group (the second occurrence of "one or more letters", the "xy")
[13, 17]: the indices corresponding to the fourth capture group (the second occurrence of "one or more digits", the "4567")
the groups property, which is an object containing the indices of the named groups (each key is a group name, and each value is the respective array of indices)
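A quick way to convince yourself that each pair means "start, and one position past the end" is to slice the original string with it. In this sketch I am assuming an engine with native support for the d flag instead of the npm package used above; per the proposal, both should produce the same pairs.

```javascript
const text = "- abc 123 xy 4567 .";
const re = /([a-z]+) (?<nums>\d+) ([a-z]+) (?<othernums>\d+)/d;
const result = re.exec(text);

// Each pair is [start, end), so slice(start, end) gives back the captured text.
result.indices.forEach(([start, end], i) => {
  console.log(i, text.slice(start, end) === result[i]); // true for every i
});

// The same works for the named groups.
const [start, end] = result.indices.groups.othernums;
console.log(text.slice(start, end)); // '4567'
```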
Now, if the second occurrence of letters and numbers is optional:
const execWithIndices = require("regexp-match-indices");
const text = "- abc 123.";
const result = execWithIndices(/([a-z]+) (?<nums>\d+)(?: ([a-z]+) (?<othernums>\d+))?/, text);
console.log(result.indices);
The result will be:
[
[ 2, 9 ],
[ 2, 5 ],
[ 6, 9 ],
undefined,
undefined,
groups: { nums: [ 6, 9 ], othernums: undefined }
]
That is, undefined was returned in the positions corresponding to the groups that are present in the regex but, because they are in an optional part, did not take part in the match.
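In practice this means each entry should be checked before it is used, because destructuring undefined throws a TypeError. A minimal sketch, again assuming native support for the d flag rather than the npm package (the names spans and pair are mine):

```javascript
const text = "- abc 123.";
const re = /([a-z]+) (?<nums>\d+)(?: ([a-z]+) (?<othernums>\d+))?/d;
const { indices } = re.exec(text);

// Describe each entry, guarding against groups that did not take part in the match.
const spans = Array.from(indices, (pair, i) =>
  pair === undefined
    ? `group ${i}: did not participate`
    : `group ${i}: ${pair[0]}-${pair[1]}`
);
console.log(spans);
```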
And of course, if the regex has no capture groups, only the indices of the match itself are returned. That is, in the case below:
const execWithIndices = require("regexp-match-indices");
const text = "- abc 123 xy 4567 .";
const result = execWithIndices(/[a-z]+ \d+/, text);
console.log(result.indices);
The result will be:
[ [ 2, 9 ], groups: undefined ]
That is, the indices [2, 9] indicate where the match is (in this case letters, a space and digits), and since there are no groups, there are no more elements (and since there are no named groups, the groups property is undefined).
Remember that in the npm package the indices property is lazy by default, and is populated only on request (i.e. if you only use result in the examples above, result.indices is not populated; only when you access result.indices directly does it get the array with the indices). This behavior can be changed to match the specification, which has no lazy behavior. See the difference:
const execWithIndices = require("regexp-match-indices");
const text = "- abc 123 xy 4567 .";
let result = execWithIndices(/[a-z]+ \d+/, text);
console.log(result); // shows "indices: [Getter/Setter]"
// disable "lazy" mode, so that it behaves like the specification
require("regexp-match-indices/config").mode = "spec-compliant";
result = execWithIndices(/[a-z]+ \d+/, text);
console.log(result); // shows "indices: [ [2, 9], groups: undefined ]"
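For those curious about what "lazy" means here, the general idea can be sketched with a getter that computes the value only on first access. This is a hypothetical illustration of the technique, not the package's actual code; attachLazyIndices and computeIndices are names I made up.

```javascript
// Hypothetical illustration of lazy population; NOT the package's actual code.
function attachLazyIndices(result, computeIndices) {
  Object.defineProperty(result, 'indices', {
    configurable: true,
    enumerable: true,
    get() {
      const value = computeIndices();
      // Replace the getter with a plain data property after the first access.
      Object.defineProperty(result, 'indices', {
        value, writable: true, configurable: true, enumerable: true,
      });
      return value;
    },
  });
  return result;
}

let calls = 0;
const lazy = attachLazyIndices({}, () => { calls++; return [[2, 9]]; });
console.log(calls);        // 0 (nothing computed yet)
console.log(lazy.indices); // [ [ 2, 9 ] ]
console.log(lazy.indices); // [ [ 2, 9 ] ] (cached, not recomputed)
console.log(calls);        // 1
```

The benefit of this kind of design is that the cost of computing the indices is only paid by code that actually reads them.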
Yes, (?<Z>z) is a named capture group. Although the syntax may seem "difficult", the <Z> only indicates the name of the group, which in this case is Z. :) – Luiz Felipe