How does positive lookahead (?=X) work combined with positive lookbehind (?<=X)?

16

After answering this question, and although I can follow what the regular expression in my answer does, I was curious to know how the fragment ((?<=;)|(?=;)) works.

I read what each construct does in this answer and in various other sources, but I confess I still did not fully understand how the expression above works.

If possible, I’d like an explanation, because I’m trying to learn regular expressions, but the explanations out there are complicated and use terms I’m often not familiar with.

1 answer

17


You have probably heard the saying:

Just because you’ve reached your goal doesn’t mean you’re right

Here is what happens with your regex. I will explain it using a different one to make things clearer.

/((?<=t)|(?=a)).+/

Explanation

  • ((?<=t)|(?=a)) - A group in which one of the two alternatives must match; the first one, (?<=t), is tried first.
  • .+ - One or more of any character, as many as possible.

So we can break it down into two regexes:

  • /(?<=t).+/ - Everything that comes after a t
  • /(?=a).+/ - Everything from an a onward (the a itself is included)

Testing

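// Match right after a "t", or starting at an "a"; the first alternative is tried first.
$regex = '~((?<=t)|(?=a)).+~';

$testes = array(
    'ana',
    'tania',
    'anastasia',
    'etilico',
    'aguilherme',
    'guilherme'
);

foreach ($testes as $k => $value){
    // $matches[0] holds the full match; $matches is an empty array when nothing matches.
    preg_match($regex, $value, $matches);
    print_r($matches);
}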

Output

[0] => ana
[0] => ania
[0] => anastasia
[0] => ilico
[0] => aguilherme
(no match: $matches is empty)

Back to your regex ((?<=;)|(?=;))

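It is largely redundant, because both alternatives check for a ;: wherever the string has a position matched by (?=;) (just before a ;), it also has one matched by (?<=;) (just after that same ;).
There is one catch, though: whether (?<=;) leads to a match depends on what follows it in the pattern. If it is followed by .+ and the semicolon is at the end of the string (;$), then (?<=;).+ cannot match, because nothing is left after the ; for .+ to consume. In that case only the second alternative, (?=;), can produce a match.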
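
To make the trailing-semicolon case concrete, here is a minimal sketch in Java (chosen because the question is about Java's split). The class name TrailingSemicolon, the helper firstMatch and the sample string "abc;" are illustrative assumptions, not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingSemicolon {
    // Hypothetical helper: returns the first match of regex in input,
    // or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abc;"; // assumed sample: the ; is the last character

        // Nothing is left after the ; for .+ to consume, so there is no match.
        System.out.println(firstMatch("(?<=;).+", input));          // null
        // The lookahead succeeds just before the ; and .+ then consumes it.
        System.out.println(firstMatch("(?=;).+", input));           // ;
        // The combined form can therefore only match through (?=;) here.
        System.out.println(firstMatch("((?<=;)|(?=;)).+", input));  // ;
    }
}
```

The same patterns should behave the same way with PHP's preg_match, since nothing here depends on Java-specific regex features.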

Questions

This last explanation may be a little confusing; if anything is unclear, just ask.

Addendum - regarding your question

The problem I had was that when I used the lookahead, the semicolon was not isolated into its own element by the Java split when it was preceded by another string, and the opposite happened with the lookbehind; hence my confusion in understanding each one.

What happens is the following:

Both regexes "explode" (split) the string at a junction between characters (in this example, group 2), but (?<=;) is more specific than (?=;). Recall from above:

  • (?<=;) - what comes after a ; (it matches the position right after the ;)
  • (?=;) - text starting at a ; (it matches the position right before the ;)

Thus (?<=;) splits at the junction right after the ;, producing the pieces you saw: pontoevirgula;, delinha;.

(?=;), on the other hand, matches only at the junction right before the ;. A lookaround consumes no characters, so the ; is not removed by the split; the cut happens only before it, and the ; stays at the start of the next piece, producing the other pieces you saw: ;espaco, ;QUEBRA
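
The difference between the two splits can be sketched in Java as follows. The class name SplitLookaround, the helper pieces and the input "a;b;c" are assumptions for illustration only (the asker's real input is not shown here); the third call uses the combined pattern from the question.

```java
import java.util.Arrays;

public class SplitLookaround {
    // Hypothetical helper: splits input on a zero-width regex and
    // renders the resulting pieces as text.
    static String pieces(String regex, String input) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String input = "a;b;c"; // assumed sample, not the asker's data

        // Cut right after each ; so the ; stays at the end of a piece.
        System.out.println(pieces("(?<=;)", input));          // [a;, b;, c]
        // Cut right before each ; so the ; stays at the start of a piece.
        System.out.println(pieces("(?=;)", input));           // [a, ;b, ;c]
        // Cut on both sides, which leaves each ; as its own piece.
        System.out.println(pieces("((?<=;)|(?=;))", input));  // [a, ;, b, ;, c]
    }
}
```

Because every match is zero-width, split removes nothing from the string; it only decides where the cuts go.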

  • The problem I had was that when I used the lookahead, the semicolon was not isolated into its own element by the Java split when it was preceded by another string, and the opposite happened with the lookbehind, see: https://ideone.com/yqWyjj. Hence my confusion in understanding each one.

  • @Articuno I’m a bit busy right now; I’ll explain this effect shortly, but in short, it has to do with the consumption of the ;

  • @Articuno Edited; let me know if anything is still unclear.

  • Yes, I understood. Without abusing your goodwill: why, then, do the two combined isolate the ;?

  • 4

    Since he is not here, I will answer for him. Actually, the expression itself does not isolate the ;. What separates it from the other terms is String#split() itself. In case that is unclear, see this image. Your regex finds two empty (zero-width) matches at different positions, one before and one after the ;, and the Java split simply cuts the string at each of them. See this example (I hope I wasn’t too confusing).

  • @Matthew exactly that, thank you :D

  • @Matthew this should be posted as another answer, complementing William's. I understood it from your explanation together with his answer.
