Negative lookahead in PHP

What’s wrong with this regular expression?

preg_match_all('/{{.*?(?!\|e)}}/',$content,$matches);

In the text below, it should not match {{Expression|e}}, since I am excluding |e with the group (?!\|e):

Edit the {{Expression|e}} & Text to see Matches. Roll over Matches or the Expression for Details. PCRE & Javascript {{Flavors}} of Regex are supported. {{Validate}} your Expression with Tests mode.

What would the right expression be so that the only matches are {{Flavors}} and {{Validate}}?

1 answer

First, about .*?: the dot matches any character (except a line break, by default), and *? is a lazy quantifier, which matches zero or more characters, but always the smallest number that satisfies the expression.
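
To make the lazy behaviour concrete, here is a minimal sketch comparing the greedy .* with the lazy .*? on a made-up string (the string and the variable names $text, $greedy and $lazy are just assumptions for illustration, they are not from the question):

```php
<?php
// Hypothetical sample string with two brace pairs.
$text = 'a {{x}} b {{y}} c';

// Greedy: .* grabs as much as possible, so the match runs to the LAST "}}".
preg_match('/{{.*}}/', $text, $greedy);

// Lazy: .*? grabs as little as possible, so the match stops at the FIRST "}}".
preg_match('/{{.*?}}/', $text, $lazy);

echo $greedy[0], "\n"; // {{x}} b {{y}}
echo $lazy[0], "\n";   // {{x}}
```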

The detailed mechanics are explained elsewhere, but basically it works like this:

Taking the case of Expression|e, the regex first tries a match with zero characters before the "}}". Since there is no "}" at that point, it backtracks and tries with just the "E".

As there is no "}" after the "E", it tries "Ex", and so on. When it reaches the "n", the lookahead sees that |e comes next and fails, but since the dot matches any character, the engine keeps trying.

So the next attempt is "Expression|": it sees that no "}" follows. Finally it tries "Expression|e": now what follows is not |e but "}}", so the lookahead succeeds and the match is returned.

This is because the expression means "zero or more characters (.*?), as long as they are not followed by |e". The problem is that the dot matches any character, including the | itself and also the e. It may seem contradictory and counterintuitive, but this is how the engine works.
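
To check the walkthrough above, here is a minimal sketch that runs the question's pattern against the question's sample text. Note that in any successful match the lookahead sits immediately before the "}}", where the next character is necessarily }, so it can never be what rejects a match:

```php
<?php
// Sample text from the question.
$content = 'Edit the {{Expression|e}} & Text to see Matches. '
    . 'Roll over Matches or the Expression for Details. '
    . 'PCRE & Javascript {{Flavors}} of Regex are supported. '
    . '{{Validate}} your Expression with Tests mode.';

// Original pattern: the lookahead is only evaluated right before "}}".
preg_match_all('/{{.*?(?!\|e)}}/', $content, $matches);

// All three placeholders are matched, including the unwanted one:
// {{Expression|e}}, {{Flavors}}, {{Validate}}
print_r($matches[0]);
```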

If the idea is not to allow |e inside the braces, then the lookahead must be inside the repetition:

preg_match_all('/{{([^}](?!\|e))+}}/',$content,$matches);

Here, [^}] is "any character other than }", and the lookahead comes right after it. That is, it matches a single character, as long as that character is not followed by |e.

And all of this is repeated: I put everything in parentheses and used the quantifier +, which means one or more occurrences. The * means "zero or more occurrences", i.e. it would also match cases like {{}}. By using + I require at least one character between the braces.

So the regex is now "any character that is not }, provided it is not followed by |e, all this repeated one or more times".
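
As a quick check, this sketch runs the pattern above against the sample text from the question, using the same variable names as the question:

```php
<?php
// Sample text from the question.
$content = 'Edit the {{Expression|e}} & Text to see Matches. '
    . 'Roll over Matches or the Expression for Details. '
    . 'PCRE & Javascript {{Flavors}} of Regex are supported. '
    . '{{Validate}} your Expression with Tests mode.';

// Lookahead inside the repetition: it is checked after every character.
preg_match_all('/{{([^}](?!\|e))+}}/', $content, $matches);

// {{Expression|e}} is rejected because the "n" is followed by "|e".
// Remaining full matches: {{Flavors}}, {{Validate}}
print_r($matches[0]);
```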

One detail is that the parentheses form a capture group, which ends up "taking up space" in the array of matches. If you don’t want this extra data (and only want the complete match), just switch to a non-capturing group, using (?: instead:

preg_match_all('/{{(?:[^}](?!\|e))+}}/',$content,$matches);
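
To illustrate the difference in the resulting array, here is a small sketch comparing both versions on the question's sample text ($withGroup and $withoutGroup are names I made up for this example). With a repeated capture group, the group holds only the last iteration, i.e. the last character before the "}}":

```php
<?php
// Sample text from the question.
$content = 'Edit the {{Expression|e}} & Text to see Matches. '
    . 'Roll over Matches or the Expression for Details. '
    . 'PCRE & Javascript {{Flavors}} of Regex are supported. '
    . '{{Validate}} your Expression with Tests mode.';

// Capture group: the result has index 0 (full matches) and index 1 (the group).
preg_match_all('/{{([^}](?!\|e))+}}/', $content, $withGroup);

// Non-capturing group: the result has only index 0.
preg_match_all('/{{(?:[^}](?!\|e))+}}/', $content, $withoutGroup);

echo count($withGroup), "\n";    // 2
print_r($withGroup[1]);          // "s" and "e": last character of each match body
echo count($withoutGroup), "\n"; // 1
```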
  • Great, thank you very much. I actually just added a } after the e, so that it does not reject things like ee or escape, i.e. forcing the e to be followed by }: ('/{{(^})+}}/',$content,$Matches);
