How to select text that does not contain a certain term in the middle?


I am trying to select part of some HTML code with a regex, but I can’t get the regular expression right. Could someone help?

I need to select each group of <li> items separately, i.e., groups that do not have a <br> tag in the middle.

For example, I’m trying with the expression below:

/<li.*(?!<br).*\/li>/gi

And I need each occurrence of the following text to be selected separately:

<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>

In my test I created two occurrences of that list, but the expression selects everything from the start of the first occurrence to the end of the last one.
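A quick way to reproduce the problem is to run the question’s expression in JavaScript (the /.../gi literal suggests JavaScript, but that is an assumption). The test input below is also a guess, since the exact markup with the two occurrences is not shown: two copies of the list separated by a <br> tag.

```javascript
// Assumed test input (hypothetical): two copies of the list separated by <br>.
const html =
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>" +
  "<br>" +
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>";

// The expression from the question, unchanged.
const original = /<li.*(?!<br).*\/li>/gi;

// With the g flag, match() returns an array with every match.
const matches = html.match(original);
console.log(matches.length); // 1
console.log(matches[0] === html); // true: the single match spans the whole input
```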

How do I select the two lists separately?

1 answer

The problem with the quantifiers * and + is that they are "greedy": they take as many characters as possible while still letting the whole expression match.
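A minimal illustration of this greedy behavior, on a hypothetical two-item list followed by some trailing text: .* first runs to the end of the string and then backtracks only until \/li> can match, so the match ends at the last /li> instead of the first one.

```javascript
// Hypothetical input: two list items followed by some trailing text.
const sample = "<li>a</li><li>b</li> trailing text";

// Greedy .* runs to the end of the string, then gives characters back
// one at a time until \/li> matches, which first happens at the LAST /li>.
const greedyMatch = sample.match(/<li.*\/li>/)[0];
console.log(greedyMatch); // <li>a</li><li>b</li>
```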

To cancel this "greedy" behavior, just put a ? after the *. The quantifier then takes as few characters as necessary (which is why *? is called a lazy quantifier). The regex would then look like this:

/<li.*?(?!<br).*?\/li>/

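The lazy version can be checked in JavaScript on the same assumed input (two copies of the list separated by <br>, which is a guess at the question’s test text). In JavaScript the g flag is needed for match() to return every match; without it only the first one is returned, so it is added here.

```javascript
// Assumed test input (hypothetical): two copies of the list separated by <br>.
const html =
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>" +
  "<br>" +
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>";

// Lazy quantifiers, plus the g flag so that match() returns all matches.
const lazy = /<li.*?(?!<br).*?\/li>/gi;

const items = html.match(lazy);
console.log(items.length); // 6
console.log(items[0]); // <li>Teste 1</li>
```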


The regex above matches the 6 li tags separately (one match per tag). To match a sequence of several li elements that contains no br as a single unit, just look for one or more consecutive occurrences of the whole previous regex, using the quantifier +:

(<li.*?(?!<br).*?\/li>)+

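On the same assumed input, the + version yields two matches, one per list. One JavaScript detail worth knowing: a capturing group repeated with + keeps only its last repetition, so the full list is in the whole match (index 0), not in group 1. The last lines compare against a variant without the (?!<br) lookahead, which on this input gives the same result.

```javascript
// Assumed test input (hypothetical): two copies of the list separated by <br>.
const html =
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>" +
  "<br>" +
  "<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>";

// One or more consecutive <li>...</li> items count as a single match.
const lists = /(<li.*?(?!<br).*?\/li>)+/gi;

const groups = html.match(lists);
console.log(groups.length); // 2
console.log(groups[0]); // <li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>

// exec() without g returns the first match and its capture groups;
// group 1 only holds the last repetition of (...)+.
const first = /(<li.*?(?!<br).*?\/li>)+/i.exec(html);
console.log(first[0] === groups[0]); // true
console.log(first[1]); // <li>Teste 3</li>

// On this input, dropping the (?!<br) lookahead gives the same two matches.
const withoutLookahead = html.match(/(<li.*?\/li>)+/gi);
console.log(withoutLookahead.join("|") === groups.join("|")); // true
```

This suggests that what separates the two lists here is the <br> interrupting the chain of consecutive <li> items, not the lookahead itself. These sketches also assume the markup sits on one line with no whitespace between items; . does not match line breaks, so HTML spread over several lines would need adjusting.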

  • But there’s another detail: I wanted it to capture 2 groups, not each item of the list (which gives 6). Is there a way?

  • I’m not at the computer right now, but I’ll look at it as soon as I can. Anyway, I think an HTML parser might be better than regex.

  • @Rogerwolff I edited the answer, adding this other case (2 groups instead of 6)

  • But I still think it’s better to use an HTML/XML parser. Regex is nice, but it is not the best tool for parsing HTML.
