Negation operator returns value that should be discarded

When using the negation operator, I want to match only the part of the text that is not preceded by the negated group.

Using the expression ( ?<br\/?> ?)(Unit.), I get the following result:

[image not included]

When I add the negation operator ?!, the negation appears to be ignored and every occurrence is returned:

[image not included]

The expected result for the expression is only the first tag, where nothing comes before the word Unit.

1 answer

Your logic is almost right. I say almost because one detail of interpretation is missing.

In regex, keep in mind that a match can start and end anywhere in the text, unless you explicitly define how it should behave.

Analyzing what happens

Legend

  • ^ Start of the text being interpreted
  • $ End of the text being interpreted

Parse 1

<td>Preço<br/>Unit.</td>
^
$

Note that in this attempt the interpreted text is only <, so the regex does not match.

Parse 2

<td>Preço<br/>Unit.</td>
^      $

Note that in this attempt the interpreted text is <td>Preç, so the regex does not match.

Parse 3

<td>Preço<br/>Unit.</td>
         ^        $

Note that in this attempt the interpreted text is <br/>Unit. With the first regex,
( ?<br\/?> ?)(Unit.), it matches perfectly. With the second,
(?! ?<br\/?> ?)(Unit.), the negative lookahead rejects the match at this position.

Parse 4

<td>Preço<br/>Unit.</td>
              ^   $

Note that in this attempt the interpreted text is Unit. With the first regex,
( ?<br\/?> ?)(Unit.), no match is found because ?<br\/?> ? is missing
at the start. With the second, (?! ?<br\/?> ?)(Unit.), it matches perfectly:
the negative lookahead only requires that the text at the current position does not
begin with ?<br\/?> ?, and here it begins with Unit., so the check passes.
The match is therefore returned as a valid result.
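Parses 3 and 4 can be reproduced with a short sketch. Python's re module is an assumption here, since the question does not name a regex engine. Pattern.match(text, pos) tries the pattern at one start position only, which plays the role of the ^ marker in the diagrams above.

```python
import re

text = "<td>Preço<br/>Unit.</td>"

with_group = re.compile(r"( ?<br\/?> ?)(Unit.)")
with_lookahead = re.compile(r"(?! ?<br\/?> ?)(Unit.)")

# Try both patterns at the start positions of Parse 3 (index 9)
# and Parse 4 (index 14).
results = {}
for pos in (9, 14):
    results[pos] = (
        bool(with_group.match(text, pos)),
        bool(with_lookahead.match(text, pos)),
    )
print(results)

# search() walks through the start positions and returns the first success,
# which for the lookahead version is the position where Unit. begins.
print(with_lookahead.search(text).span())
```

At index 9 only the first pattern succeeds, and at index 14 only the lookahead version succeeds, which is why the second regex still returns Unit. for every tag.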

Possible solution

Use the m flag so that each line (delimited by \n) is treated as a separate text to be interpreted. You can then change the regex to:

/^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$/gm

See it on regex101

Explanation

  • ^...$ - the line being analyzed must match from beginning to end.
  • (?!.* ?<br\/?> ?Unit\..*) - a negative lookahead: if .* ?<br\/?> ?Unit\..* matches from the start of the line, the line is not captured.
  • (.*Unit\..*) - the content to be captured.
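As a rough check of the pattern above, here is a minimal Python sketch. The three-line sample string is hypothetical (the original input was only shown as an image), re.MULTILINE stands in for the m flag, and findall stands in for the g flag.

```python
import re

# Hypothetical sample input: one table cell per line.
html = (
    "<td>Unit.</td>\n"
    "<td>Preço<br/>Unit.</td>\n"
    "<td>Preço <br> Unit.</td>"
)

line_pattern = re.compile(
    r"^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$",
    re.MULTILINE,  # ^ and $ match at the start and end of every line
)

# findall returns the content of the single capture group for each match.
matches = line_pattern.findall(html)
print(matches)
```

Only the line where no br tag precedes Unit. is captured; the other two lines are rejected by the lookahead at the start of the line.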

Addendum

  • The best way to think about a negative lookahead (in my view) is to picture the exact text that it should capture.
  • You used Unit. in your pattern. If you want to match a literal ., you must escape it; otherwise the pattern also accepts UnitG, Unit#, Unit. and so on.
  • I got it, William. I really was picturing how the lookahead works incorrectly.

  • @Marcelodeandrade, I was worried that the explanation had not come out well, but I'm glad you understood :D
