As others have said, the problem is the part (?<!,), which is a negative lookbehind. Here it checks that there is no comma before the character being matched (which is itself a comma). If there is one, the match fails at that position.
Right after it comes (?!,), a negative lookahead, which checks that there is no comma after. So /(?<!,),(?!,)/ matches only commas that have no other comma immediately before or after them, which is another way of saying that the regex skips runs of two or more commas (example).
Since you are using this regex in a split, the string is separated only at the positions where there is a lone comma (one with no comma before or after it). That is, two or more commas in a row are not treated as separators by split.
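To see which commas the pattern actually picks, here is a minimal sketch. It assumes an engine with lookbehind support (such as a current Chrome or Node.js), and the sample string is made up for illustration:

```javascript
// Matches a comma only when it has no comma right before or right after it.
const loneComma = /(?<!,),(?!,)/;

// Only the comma between "a" and "b" is a lone comma; the pair ",," is skipped.
const pieces = 'a,b,,c'.split(loneComma);
console.log(pieces); // ["a", "b,,c"]
```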
Note: at the time the question was asked, this syntax was not available in all browsers, Firefox (mentioned in the question) among them. Looking at this link today (July 2021), we can see that several other browsers, such as Firefox and Edge, now support it (it is still not implemented everywhere, though, so the alternative below remains an option).
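If you need to decide at runtime which approach to use, one possible check is sketched below. The function name supportsLookbehind is hypothetical; the idea is only that an engine without lookbehind throws a SyntaxError when it compiles the pattern:

```javascript
// Hypothetical helper: returns true when the engine accepts lookbehind syntax.
// Engines without support throw a SyntaxError when the pattern is compiled.
function supportsLookbehind() {
  try {
    new RegExp('(?<!a)b');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```

The pattern is passed as a string to the RegExp constructor on purpose: a regex literal with unsupported syntax would make the whole script fail to parse, before the try block could run.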
I ran your code in Chrome (code below):
String.prototype.scapeSplit = function (v) {
  let r_split = new RegExp('(?<!' + v + ')' + v + '(?!' + v + ')');
  let r_replace = new RegExp(v + '{2}');
  let s = this.split(r_split);
  // split produces the list ["ab", "cd,,ef", "gh,,,ij", "kl"]
  return s.map(function (x) {
    return x.replace(r_replace, v);
  });
};
let s = 'ab,cd,,ef,gh,,,ij,kl';
// ["ab", "cd,ef", "gh,,ij", "kl"]
console.log(s.scapeSplit(','));
Since Chrome already supports lookbehind, the code ran without problems. Your code first does the split. Taking the string 'ab,cd,,ef,gh,,,ij,kl' and splitting on the comma, the first regex breaks the string only at commas that are not part of a run of two or more.
So the result is the list ["ab", "cd,,ef", "gh,,,ij", "kl"]. Then a map is applied to this list, replacing two commas in a row (v + '{2}', which results in ,{2}, i.e. two consecutive commas) with a single one. That is, cd,,ef becomes cd,ef and gh,,,ij becomes gh,,ij.
The final result is the list ["ab", "cd,ef", "gh,,ij", "kl"].
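The replace step can be checked in isolation. This is only a sketch of how the ,{2} pattern behaves; the third input string is made up to show that, without the g flag, replace touches only the first pair it finds:

```javascript
// Same pattern that v + '{2}' produces when v is ",".
const twoCommas = /,{2}/;

const first = 'cd,,ef'.replace(twoCommas, ',');
const second = 'gh,,,ij'.replace(twoCommas, ',');
// Without the g flag, only the first pair in the string is collapsed.
const third = 'a,,b,,c'.replace(twoCommas, ',');

console.log(first, second, third); // cd,ef gh,,ij a,b,,c
```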
Alternative for browsers that do not support lookbehind
Since this feature is not supported in all browsers, the approach has to be a little different. Instead of split, I'll use the match method, and the regex gets the g flag, which makes match return an array with all the matches found.
But I will use a different regex, since the logic is reversed. With split I passed a regex describing what I do not want in the final result (a comma that has no other comma before or after it); with match I do the opposite and describe what I do want in the final result (deep down, split and match are just two sides of the same coin). What I want in the final result is:
- a sequence of characters that are not commas
- optionally followed by a run of two or more commas
- this whole sequence can be repeated several times (for example, in a snippet such as aa,,bb,,,cc,,,dd, all of it is a single element that split did not separate, so match needs a regex that treats the whole thing as one item).
For this I'll use ([^,]+(,{2,})?)+. Explaining from the inside out:
- [^,]+: the [^ delimiter starts a negated character class, that is, the regex accepts any character other than those listed between [^ and ]. Here the class contains only the comma. The + quantifier means "one or more occurrences". So this is a sequence of one or more characters that are not commas.
- (,{2,})?: the part ,{2,} means "two or more commas", and the ? makes this whole part optional. So the sequence may or may not be followed by a run of commas.
- The + around the whole expression (grouped in parentheses) says that it can be repeated several times. That is, the whole unit "one or more non-comma characters, optionally followed by a run of commas" can occur several times in a row.
This ensures that snippets like ab, ab,,cd and ab,,cd,,,ef are each treated as a single item. Example:
let matches = 'ab,cd,,ef,gh,,,ij,kl'.match(/([^,]+(,{2,})?)+/g);
console.log(matches); // ["ab", "cd,,ef", "gh,,,ij", "kl"]
The result is the array ["ab", "cd,,ef", "gh,,,ij", "kl"], exactly what your original code gets before the map. So now all that is left is the map, and your code is ready:
String.prototype.scapeSplit = function (v) {
  let r_match = new RegExp('([^' + v + ']+(' + v + '{2,})?)+', 'g');
  let r_replace = new RegExp(v + '{2}');
  let s = this.match(r_match);
  // match produces the list ["ab", "cd,,ef", "gh,,,ij", "kl"]
  return s.map(function (x) {
    return x.replace(r_replace, v);
  });
};
let s = 'ab,cd,,ef,gh,,,ij,kl';
// ["ab", "cd,ef", "gh,,ij", "kl"]
console.log(s.scapeSplit(','));
The result will be the array ["ab", "cd,ef", "gh,,ij", "kl"]
.
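Two edge cases are worth keeping in mind with the match version. The sketch below uses made-up inputs, and the `|| []` guard is just one possible way to handle the first case, not something from the original code:

```javascript
const rMatch = /([^,]+(,{2,})?)+/g;

// match returns null (not an empty array) when nothing matches,
// so calling map directly on the result would throw.
const nothing = ''.match(rMatch);
console.log(nothing); // null

// A possible guard: fall back to an empty array before calling map.
const safe = (''.match(rMatch) || []).map(x => x.replace(/,{2}/, ','));
console.log(safe); // []

// Unlike split, match does not produce an empty string for a leading separator.
const leading = ',ab,cd'.match(rMatch);
console.log(leading); // ["ab", "cd"]
```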
The above solution works well when the parameter passed to scapeSplit has only one character.
If the parameter has more than one character, some changes are needed.
If the browser supports negative lookbehind (as Chrome does), just change the regex used in the replace to:
let r_replace = new RegExp('(' + v + '){2}');
Suppose v is, for example, the string 12: without the parentheses the result is 12{2} (the digit 1 followed by two digits 2). What I actually want is (12){2} (two occurrences of 12). With this fix, you can pass the string '12' to the split version and it works, following the same logic as with the comma (split on 12 only if there is no other occurrence of 12 immediately before or after it).
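The difference between the two patterns can be checked directly. This is a small sketch; the variable names wrong and right are only for illustration:

```javascript
const v = '12';

// Without parentheses the quantifier applies only to the last character.
const wrong = new RegExp(v + '{2}');        // /12{2}/ matches "1" followed by "22"
// With parentheses it applies to the whole string.
const right = new RegExp('(' + v + '){2}'); // /(12){2}/ matches "1212"

console.log(wrong.test('1212'), wrong.test('122')); // false true
console.log(right.test('1212'), right.test('122')); // true false
```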
If the browser does not support negative lookbehind, we can't use [^...] as was done above, because a character class only excludes individual characters and not a multi-character string, so the solution is a little more complicated¹:
String.prototype.scapeSplit = function (v) {
  let r_match = new RegExp('(?:' + v + ')(?!(' + v + ')+)', 'g');
  let lookbehind = new RegExp(v + '$'); // simulates the lookbehind
  let indices = [], match;
  // first, collect the indices where the expression occurs
  while ((match = r_match.exec(this)) !== null) {
    if (match.index == r_match.lastIndex) r_match.lastIndex++;
    // get the substring from zero up to the index where the match occurs
    let leftContext = match.input.substring(0, match.index);
    if (!lookbehind.exec(leftContext)) { // simulate the negative lookbehind
      indices.push({ start: match.index, end: match.index + match[0].length });
    }
  }
  // now split at the positions found above
  let pos = 0;
  let result = [];
  indices.forEach(i => {
    result.push(this.substring(pos, i.start));
    pos = i.end;
  });
  // don't forget the last piece
  result.push(this.substring(pos));
  let r_replace = new RegExp('(' + v + '){2}');
  // the indices.forEach above produces result = ["ab", "cd1212ef", "gh121212ij", "kl"]
  return result.map(function (x) {
    return x.replace(r_replace, v);
  });
};
let s = 'ab12cd1212ef12gh121212ij12kl';
// ["ab", "cd12ef", "gh1212ij", "kl"]
console.log(s.scapeSplit('12'));
If the parameter is, for example, the string '12', the first regex (r_match) becomes (?:12)(?!(12)+). That is, the string 12, provided that it is not followed by one or more occurrences of 12.
Then I use a while loop to go through all the matches of this regex in the string. Each time I find one, I use another regex to simulate the lookbehind. I do this by taking the substring of the original string from the beginning up to the point where the match was found (match.index). If this chunk ends with the given string, a lookbehind would have found a repetition of it (but since I want a negative lookbehind, I use if (!lookbehind.exec(leftContext))).
For example, if the input string starts with ab12cd, the match is found at position 2 (where the 12 is). So I take a substring up to position 2 (resulting in ab) and check whether this string ends in 12 (that is, I'm simulating what the lookbehind would do).
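The left-context check can be tried on its own. In this sketch the second input (cd1212ef) is a made-up case where the candidate match is preceded by another 12 and therefore has to be discarded:

```javascript
const v = '12';
const lookbehind = new RegExp(v + '$'); // does the left context end with v?

// Match at index 2 in "ab12cd": the left context is "ab".
const left1 = 'ab12cd'.substring(0, 2);
console.log(left1, lookbehind.test(left1)); // ab false, so this match is kept

// Match at index 4 in "cd1212ef": the left context is "cd12".
const left2 = 'cd1212ef'.substring(0, 4);
console.log(left2, lookbehind.test(left2)); // cd12 true, so this match is discarded
```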
Then I store match.index (the position where the match starts) and match.index + match[0].length (the position where it ends = start position of the match plus the length of the matched string). At the end of this while, I have all the positions where the matches occurred. With this I know exactly where to split.
Then I run a forEach over these indices, using substring to extract each chunk and adding these substrings to an array. In the end I have just simulated what split would do if lookbehind were supported.
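The "split at known positions" step can be isolated as a small function. This is only a sketch: the name splitAtRanges and the sample ranges are hypothetical, and it assumes the ranges are sorted and do not overlap, as they are when collected by the while loop:

```javascript
// Hypothetical helper: cuts str at the given ranges, dropping the ranges themselves.
// Each range is { start, end } with end exclusive, sorted by start.
function splitAtRanges(str, ranges) {
  const result = [];
  let pos = 0;
  for (const { start, end } of ranges) {
    result.push(str.substring(pos, start));
    pos = end;
  }
  result.push(str.substring(pos)); // the piece after the last separator
  return result;
}

const chunks = splitAtRanges('ab12cd12ef', [{ start: 2, end: 4 }, { start: 6, end: 8 }]);
console.log(chunks); // ["ab", "cd", "ef"]
```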
Finally, I do the replace to eliminate the repetitions, as was done with the comma (remembering to add the parentheses).
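One caveat that applies to all the versions: v is concatenated directly into the pattern, so a separator containing regex metacharacters (such as . or |) would change the meaning of the regex. A possible way around it is to escape v first. The helper name escapeRegExp below is hypothetical, and the example assumes lookbehind support:

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const sep = escapeRegExp('.'); // a backslash followed by a dot
const rSplit = new RegExp('(?<!' + sep + ')' + sep + '(?!' + sep + ')');
const dotted = 'a.b..c'.split(rSplit);
console.log(dotted); // ["a", "b..c"]
```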
PS: the line if (match.index == r_match.lastIndex) r_match.lastIndex++; is there to avoid a bug with zero-width matches (explained in this link). It does not occur with the specific strings and regex we are using, but I'm leaving it here for the record.
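To illustrate the zero-width case, here is a sketch with a different regex (a*, chosen only because it can match the empty string; it is not part of the solution above). Without the lastIndex bump, this loop would never end:

```javascript
const re = /a*/g; // can match the empty string
const found = [];
let m;
while ((m = re.exec('baac')) !== null) {
  // A zero-width match leaves lastIndex where it was; without this bump
  // the loop would find the same empty match forever.
  if (m.index === re.lastIndex) re.lastIndex++;
  found.push(m[0]);
}
console.log(found); // ["", "aa", "", ""]
```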
(1) - This solution for simulating lookbehind was based on this book.
Hiago, my original answer only worked when v has only one character. I edited the answer and added a more general solution, which works with strings of any length. – hkotsubo