Negative lookbehind only works on Google Chrome; is there an alternative for other browsers?

8

The regex /(?<!,),(?!,)/ does not work consistently across browsers.

I found this regex in the system of the company where I work, and I didn’t quite understand its purpose. Apparently the problem is in the <: it does not work in Edge, nor in Mozilla. What’s the problem with the expression?

Test in regex101.com.

Code that was in the prototype of String:

String.prototype.scapeSplit = function (v) {
    var r_split = new RegExp('(?<!' + v + ')' + v + '(?!' + v + ')');
    var r_replace = new RegExp(v + '{2}');

    var s = this.split(r_split);
    return s.map(function (x) {
        return x.replace(r_replace, v);
    });
};
  • Hiago, my original answer only worked when v has a single character. I edited the answer and added a more general solution, which works with strings of any length

3 answers

13


As others have said, the problem is the part (?<!,), which is a negative lookbehind. In this case, it checks that there is no comma before the desired character (which in this case is also a comma). If there is one, the regex fails at that position.

Right after it comes (?!,), a negative lookahead, which checks that there is no comma after. So /(?<!,),(?!,)/ captures commas that do not have another comma before or after them, which is another way of saying that the regex ignores runs of two or more commas in a row (example).

Since you are using this regex in a split, the string will be separated only at the positions where there is an isolated comma (one with no comma before or after it). That is, two or more commas in a row are not treated as separators by the split.
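To make the behavior described above concrete, here is a minimal sketch that lists where the regex matches in the sample string used later in this answer. It assumes an engine that already supports lookbehind (a recent Chrome or Node.js); the variable names are just for illustration.

```javascript
const text = 'ab,cd,,ef,gh,,,ij,kl';
const isolatedComma = /(?<!,),(?!,)/g;

// Collect the index of every comma that has no comma before or after it.
const positions = [];
let m;
while ((m = isolatedComma.exec(text)) !== null) {
  positions.push(m.index);
}

console.log(positions); // [2, 9, 17]
// The commas in ",," and ",,," are never matched, so split keeps them inside the pieces.
console.log(text.split(/(?<!,),(?!,)/)); // ["ab", "cd,,ef", "gh,,,ij", "kl"]
```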

Note: at the time the question was asked, this syntax was not available in all browsers, Firefox among them (as mentioned in the question). Checking this link today (July 2021), we can see that several other browsers, such as Firefox and Edge, now support it (but it is still not implemented in every browser, so the alternative below remains an option).
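If you need to decide at runtime which approach to use, one possible sketch is a feature check like the one below. The function name supportsLookbehind is hypothetical (it is not a built-in). The pattern must be built from a string with the RegExp constructor: a regex literal with unsupported syntax would fail when the script is parsed, before any try/catch could run.

```javascript
// Hypothetical helper: returns true if this engine accepts lookbehind syntax.
function supportsLookbehind() {
  try {
    new RegExp('(?<!a)b'); // throws a SyntaxError on engines without lookbehind
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```

With a check like this, the code can fall back to a lookbehind-free implementation only when needed.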

I ran your code in Chrome (code below):

String.prototype.scapeSplit = function (v) {
  let r_split = new RegExp('(?<!' + v + ')' + v + '(?!' + v + ')');
  let r_replace = new RegExp(v + '{2}');

  let s = this.split(r_split);
  // split produces the list ["ab", "cd,,ef", "gh,,,ij", "kl"]
  return s.map(function (x) {
      return x.replace(r_replace, v);
  });
}

let s = 'ab,cd,,ef,gh,,,ij,kl';

// ["ab", "cd,ef", "gh,,ij", "kl"]
console.log(s.scapeSplit(','));

Since Chrome already supports lookbehinds, the code ran without problems. I saw that your code first does the split. Using the string 'ab,cd,,ef,gh,,,ij,kl' and splitting by comma, the first regex breaks the string only at commas that are not part of a run of two or more.

So the result is the list ["ab", "cd,,ef", "gh,,,ij", "kl"]. Then a map is applied to this list, replacing two commas in a row (v + '{2}', which results in ,{2}, i.e. two commas in a row) with just one. That is, cd,,ef is transformed into cd,ef and gh,,,ij into gh,,ij.

The final result is the list ["ab", "cd,ef", "gh,,ij", "kl"].
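One detail worth seeing in isolation: the replace regex has no g flag, so only the first pair of commas in each piece is collapsed. A small sketch (the string 'a,,b,,c' is an extra input added here for illustration; it is not in the original example):

```javascript
const collapseFirstPair = /,{2}/; // same pattern the code builds when v = ','

console.log('cd,,ef'.replace(collapseFirstPair, ','));  // "cd,ef"
console.log('gh,,,ij'.replace(collapseFirstPair, ',')); // "gh,,ij"
// Only the first pair is replaced; the second pair is left untouched.
console.log('a,,b,,c'.replace(collapseFirstPair, ',')); // "a,b,,c"
```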


Alternative for browsers that do not support lookbehind

Since this feature is not supported in all browsers, the approach has to be a little different. Instead of split, I’ll use the match method, and in the regex I will use the g flag, which makes match return an array with all the matches found.

But I will use a different regex, since the logic is reversed. While in split I pass a regex with the things I do not want in the final result (a comma that has no other comma before or after), in match I do the opposite: I pass the things I do want in the final result (deep down, split and match are just two sides of the same coin). So, what I want in the final result is:

  • a sequence of characters that are not commas
  • optionally followed by a sequence of two or more commas
  • this whole sequence can be repeated several times (for example, in a snippet like aa,,bb,,,cc,,,dd, all of this is a single element that the split did not separate, so the match needs a regex that treats it all as one thing).

In this case, I’ll use ([^,]+(,{2,})?)+. Explaining from the inside out:

  • [^,]+: the delimiter [^ starts a negated character class, that is, the regex accepts any character other than those listed between [^ and ]. In this case, the only one listed is the comma. The quantifier + means "one or more occurrences". That is, it is a sequence of one or more characters that are not commas.
  • (,{2,})?: the part ,{2,} means "two or more commas", and the ? makes this whole part optional. This means the sequence may or may not be followed by a run of commas.
  • The + around the whole expression (grouped in parentheses) says that it can be repeated several times. That is, the whole set "one or more characters that are not commas, optionally followed by two or more commas" can be repeated several times.

This ensures that snippets like ab, ab,,cd and ab,,cd,,,ef are each treated as a single item. Example:

let matches = 'ab,cd,,ef,gh,,,ij,kl'.match(/([^,]+(,{2,})?)+/g);
console.log(matches); // ["ab", "cd,,ef", "gh,,,ij", "kl"]
 

The result was the array ["ab", "cd,,ef", "gh,,,ij", "kl"], exactly the same as what your original code gets before the map. So now just apply the map and your code is ready:

String.prototype.scapeSplit = function (v) {
  let r_match = new RegExp('([^' + v  + ']+(' + v + '{2,})?)+', 'g');
  let r_replace = new RegExp(v + '{2}');

  let s = this.match(r_match);
  // match produces the list ["ab", "cd,,ef", "gh,,,ij", "kl"]
  return s.map(function (x) {
      return x.replace(r_replace, v);
  });
}

let s = 'ab,cd,,ef,gh,,,ij,kl';

// ["ab", "cd,ef", "gh,,ij", "kl"]
console.log(s.scapeSplit(','));

The result will be the array ["ab", "cd,ef", "gh,,ij", "kl"].
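The match-based version does not behave exactly like split in every edge case. The sketch below uses a hypothetical standalone function, escapeSplitByMatch (not part of the original code), to show two differences: match returns null when nothing matches, so a fallback is needed before calling map, and empty fields that split would keep are dropped.

```javascript
// Hypothetical standalone variant of the match-based approach above.
function escapeSplitByMatch(text, v) {
  const rMatch = new RegExp('([^' + v + ']+(' + v + '{2,})?)+', 'g');
  const rReplace = new RegExp(v + '{2}');
  // match returns null (not an empty array) when there is no match at all.
  const parts = text.match(rMatch) || [];
  return parts.map(function (x) {
    return x.replace(rReplace, v);
  });
}

console.log(escapeSplitByMatch('ab,cd,,ef,gh,,,ij,kl', ',')); // ["ab", "cd,ef", "gh,,ij", "kl"]
console.log(escapeSplitByMatch('', ','));    // [] (without the fallback, map would throw)
console.log(escapeSplitByMatch(',ab', ',')); // ["ab"] (the lookbehind split gives ["", "ab"])
```

Whether dropping empty fields matters depends on how the result is used, so it is worth checking against real inputs.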


Expressions with more than one character

The above solution works well when the parameter passed to scapeSplit has only one character.

If the parameter has more than one character, there are some modifications to be made.

If the browser supports negative lookbehind (as is the case with Chrome), just change the regex used in the replace to:

let r_replace = new RegExp('(' + v + '){2}');

Suppose v is, for example, the string 12: without the parentheses, the result is 12{2} (the digit 1 followed by two digits 2). But what I really want is (12){2} (two occurrences of 12). With this fix, you can pass the string '12' to the split and it will work without problems, following the same logic as the comma (only split at 12 if there is no other occurrence of 12 right before or after).
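A quick sketch of the difference the parentheses make, using v = '12' (the test strings are chosen here only for illustration):

```javascript
const v = '12';
const withoutGroup = new RegExp(v + '{2}');     // 12{2}: "1" followed by two "2"
const withGroup = new RegExp('(' + v + '){2}'); // (12){2}: "12" twice

console.log(withoutGroup.test('1212')); // false
console.log(withoutGroup.test('122'));  // true
console.log(withGroup.test('1212'));    // true
console.log('cd1212ef'.replace(withGroup, v)); // "cd12ef"
```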


If the browser does not support negative lookbehind, we can’t use [^...] as was done above (a character class matches single characters, not a multi-character string), so the solution is a little more complicated¹:

String.prototype.scapeSplit = function (v) {
  let r_match = new RegExp('(?:' + v + ')(?!(' + v + ')+)', 'g');
  let lookbehind = new RegExp(v + '$'); // simulates the lookbehind
  let indices = [], match;
  // first, get the indices where the expression occurs
  while (match = r_match.exec(this)) {
      if (match.index == r_match.lastIndex) r_match.lastIndex++;
      // get the substring from zero up to the index where the match occurs
      let leftContext = match.input.substring(0, match.index);
      if (! lookbehind.exec(leftContext)) { // simulate the negative lookbehind
          indices.push({ start: match.index, end: match.index + match[0].length });
      }
  }
  // now split at the positions found above
  let pos = 0;
  let result = [];
  indices.forEach(i => {
      result.push(this.substring(pos, i.start));
      pos = i.end;
  });
  // don't forget the last piece
  result.push(this.substring(pos));

  let r_replace = new RegExp('(' + v + '){2}');
  // the indices.forEach above produces the list result = ["ab", "cd1212ef", "gh121212ij", "kl"]
  return result.map(function (x) {
      return x.replace(r_replace, v);
  });
}

let s = 'ab12cd1212ef12gh121212ij12kl';

// ["ab", "cd12ef", "gh1212ij", "kl"]
console.log(s.scapeSplit('12'));

If the parameter is, for example, the string '12', the first regex (r_match) becomes (?:12)(?!(12)+). That is, the string 12, provided that it is not followed by one or more occurrences of 12.

Then I use a while loop to go through all the matches of this regex in the string. Each time I find one, I use another regex to simulate the lookbehind. I do this by taking the substring of the original string from the beginning up to the point where the match was found (match.index). If this chunk ends with the given string, it means the lookbehind would have found a repetition of the string (but since I want a negative lookbehind, I use if (!lookbehind.exec(leftContext))).

For example, if the input string starts with ab12cd, the match is found at position 2 (where the 12 is). So I take the substring up to position 2 (resulting in ab) and check whether this string ends in 12 (that is, I’m simulating what the lookbehind would do).
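The left-context check can be tried on its own. This is a minimal sketch with a shortened input, 'ab12cd1212ef', chosen here for illustration; the two indices are the positions where (?:12)(?!(12)+) matches in that string.

```javascript
const v = '12';
const lookbehind = new RegExp(v + '$'); // does the left context end with v?
const input = 'ab12cd1212ef';

// Match at index 2: the left context is "ab", which does not end in "12".
console.log(lookbehind.test(input.substring(0, 2))); // false, so this match is kept

// Match at index 8: the left context is "ab12cd12", which ends in "12".
console.log(lookbehind.test(input.substring(0, 8))); // true, so this match is discarded
```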

If it does not, I save match.index (the position where the match occurred) and match.index + match[0].length (the position where it ends, i.e. the initial position of the match plus the length of the string found). At the end of this while, I have all the positions where valid matches occurred. With this I know exactly where to do the split.

Then I run a forEach over these indices, using substring to extract each chunk and add it to an array. In the end, I have just simulated what the split would do if lookbehind were supported.

Finally, I do the replace to eliminate repetitions, as was done with the comma (remember to add the parentheses).
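A related caveat: in all versions, v is concatenated directly into the pattern, so a delimiter containing regex metacharacters (such as . or |) changes the meaning of the regex. One common way around this is to escape v first. The helper escapeRegExp below is a hypothetical addition, not part of the original code.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const v = '.';
const unescaped = new RegExp('(' + v + '){2}');             // (.){2}: any two characters
const escaped = new RegExp('(' + escapeRegExp(v) + '){2}'); // (\.){2}: two literal dots

console.log('ab'.replace(unescaped, v)); // "." (the letters were wrongly treated as delimiters)
console.log('ab'.replace(escaped, v));   // "ab"
console.log('a..b'.replace(escaped, v)); // "a.b"
```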

PS: the line if (match.index == r_match.lastIndex) r_match.lastIndex++; is there to avoid a bug in cases of zero-width matches (explained in this link). It does not occur with the specific strings and regex we are using, but it is worth noting anyway.
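To see why that guard exists, here is a sketch with a regex that can match the empty string (/a*/g, chosen here only for illustration). On an empty match, exec leaves lastIndex where it was, so without the increment the loop would keep finding the same empty match forever.

```javascript
const zeroWidth = /a*/g; // can match the empty string
const input = 'baab';
const found = [];
let match;

while ((match = zeroWidth.exec(input)) !== null) {
  // An empty match does not advance lastIndex; bump it manually to move on.
  if (match.index === zeroWidth.lastIndex) zeroWidth.lastIndex++;
  found.push([match.index, match[0]]);
}

console.log(found); // [[0, ""], [1, "aa"], [3, ""], [4, ""]]
```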


(1) - This solution simulating lookbehind was based on this book.

  • 1

    Perfect @hkotsubo helped me a lot!! Thank you.

7

/(?<!,),(?!,)/
  1. (?!) - Negative Lookahead
  2. (?<!) - Negative lookbehind

What fails in your code is this part: (?<!). Lookbehinds are only available in browsers that support the ECMAScript 2018 standard, and that means only the latest versions of Chrome can handle them. More: http://kangax.github.io/compat-table/es2016plus/

Your regex reads as follows:

Find a , that is neither preceded nor followed by another ,; that is, find commas that have no comma before or after them. Notice in your regex101 test that the commas in ,,, are not returned, because they have commas before or after them!

An alternative is to replace the lookbehind with code that is compatible with other browsers.

Sources I used as reference:

  1. Regex Lookahead, lookbehind and Atomic groups
  2. Javascript regex Negative lookbehind not Working in firefox
  3. Remove a part of the URL
  • Thank you Marconi, how would I do it in ECMA2015?

  • Just a minute @Hiagosouza I’m figuring a way!

  • Thank you very much @Marconi.

  • @Hiagosouza maybe hkotsubo’s answer can help you!

  • 1

    Your reply was of great help, but the answer below is more complete and it is unfair that I mark yours as correct, I hope you do not get upset.

  • Exactly, hkotsubo’s answer is exactly what I need

  • 1

    @Hiagosouza I fully agree with you. I thought your problem was simple and ended up mistaken and short on time; I’ll even remove my suggested code. I’ll also edit your question for more visibility.


3

Apparently the problem is in the <: it does not work in Edge, nor in Mozilla. What is the problem with the expression?

The problem with the expression is that it uses the Negative lookbehind (?<!).

This type of assertion is not supported by all browsers, only by those that implement the ECMAScript 2018 standard.

Your options are:
  • Run the regex a first time, then reverse the text and run it again using the lookahead, which is already available in ECMAScript 2015. I do not recommend doing this with large volumes of data, as performance will suffer.
  • Wait for browsers to catch up, and do not offer this feature when the detected browser lacks support.
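One possible reading of the reverse-the-text suggestion, as a rough sketch for the single-comma case: run a lookahead-only regex on the original text, run it again on the reversed text (where "not followed by" means "not preceded by" in the original), and keep only the positions found in both passes. The function name isolatedCommaIndices is hypothetical, and the simple character-by-character reversal assumes the text has no surrogate pairs.

```javascript
// Hypothetical helper: finds commas with no comma before or after, using lookahead only.
function isolatedCommaIndices(text) {
  const notFollowed = /,(?!,)/g;
  let m;

  // Pass 1: commas not followed by another comma in the original text.
  const forward = new Set();
  while ((m = notFollowed.exec(text)) !== null) forward.add(m.index);

  // Pass 2: the same regex on the reversed text finds commas that are
  // not preceded by another comma in the original text.
  const reversed = text.split('').reverse().join('');
  const result = [];
  while ((m = notFollowed.exec(reversed)) !== null) {
    const originalIndex = text.length - 1 - m.index;
    if (forward.has(originalIndex)) result.push(originalIndex);
  }
  return result.sort((a, b) => a - b);
}

console.log(isolatedCommaIndices('ab,cd,,ef,gh,,,ij,kl')); // [2, 9, 17]
```

The two passes and the reversal are what make this approach slower on large inputs, as the answer warns.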
