Regular expression to remove images containing a height range in inline style

0

How can I improve this regular expression so that it removes images whose height is in the range 29 to 45? I tried using [0-9]{29, 45}.

My expression:

<img.+?(style=\".+?height:(29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45)%;.+?\")[^>]*>

When I tried it the following way, it removed the image's style and not the image itself:

 <img.+?(style=\".+?height:([0-9]{29, 45})%;.+?\")[^>]*>

Look at my example:

https://regexr.com/3h5b5

  • Please put a raw sample inline in the question, so we can see it.

1 answer

2


Since your interval is 29 to 45, which is fairly regular, the logic would be as follows:

  • Separate the tens digit from the units digit
  • Work out the range of each
  • Tens: [2-4], units: [0-9] => \d

However, the endpoints 29 and 45 are exceptions, so:

  • Tens digit 2 allows only the unit 9.
  • Tens digit 4 allows the units [0-5].

Then the rule would be:

29|3\d|4[0-5]
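
As a quick sanity check of that rule, here is a minimal sketch in Python (the language and the name HEIGHT_RANGE are my own choices, not part of the question). It tests every number from 0 to 99 against the alternation and collects the ones that match in full:

```python
import re

# The alternation from the answer. fullmatch() requires the whole string
# to be consumed, so "290" or "3" cannot slip through.
HEIGHT_RANGE = re.compile(r"29|3\d|4[0-5]")

# Collect every number from 0 to 99 that the rule accepts.
matched = [n for n in range(0, 100) if HEIGHT_RANGE.fullmatch(str(n))]

# Compare against the intended interval 29..45 (inclusive).
print(matched == list(range(29, 46)))
```

Inside the full expression there is no fullmatch(); there the surrounding literals height: and %; play the anchoring role.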

And the complete regex:

<img.+?(style=\".+?height:(29|3\d|4[0-5])%;.+?\")[^>]*>
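
To illustrate how the expression could be used to remove the images, here is a small sketch with Python's re.sub. The sample HTML, the file names and the variable names are invented for the example; the pattern is the one above, written without the \" escapes because a raw string in single quotes does not need them:

```python
import re

# Pattern from the answer: an <img> whose inline style contains a
# height between 29% and 45%.
IMG_IN_RANGE = re.compile(
    r'<img.+?(style=".+?height:(29|3\d|4[0-5])%;.+?")[^>]*>'
)

# Hypothetical sample: the first image is 30% high, the second 80%.
html = (
    '<p>intro</p>'
    '<img src="a.png" style="width:50%;height:30%;float:left;">'
    '<img src="b.png" style="width:50%;height:80%;float:left;">'
    '<p>end</p>'
)

# Replace every match with nothing, i.e. drop the whole tag.
result = IMG_IN_RANGE.sub("", html)
print(result)
```

Note that the pattern expects at least one character before height: and after %; inside the style attribute, because of the two .+? parts.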

As for {29, 45}:

This is actually a repetition quantifier, not a range of allowed numbers: [0-9]{29,45} asks for between 29 and 45 consecutive digits. For more details, see Quantifiers.
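
The difference can be seen in a short Python sketch (variable names are mine). It also shows something specific to Python's re module: with the space from the question, the braces are no longer parsed as a quantifier at all. Other engines may behave differently, so treat that part as an observation about Python only:

```python
import re

# {29,45} is a repetition count: "between 29 and 45 digits in a row".
run_of_digits = re.compile(r"[0-9]{29,45}")

print(run_of_digits.fullmatch("30") is None)          # only two digits
print(run_of_digits.fullmatch("7" * 29) is not None)  # 29 digits in a row

# With the space, Python's re treats "{29, 45}" as literal text that
# must follow a single digit.
with_space = re.compile(r"[0-9]{29, 45}")
print(with_space.fullmatch("5{29, 45}") is not None)
```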

  • Thanks, perfect.

  • Is 3\d the same as 3[0-9]?

  • @Articuno Exactly.

  • @Magichat That isn't in the question, but it can be added.

  • @Magichat But judging by the original regex, the asker may want to capture only the values given in %.

  • @Magichat Exactly :D

  • 1

    The .+? could match from a valid image through to the next invalid image. Example: https://regex101.com/r/BiB8Xn/1. You can use [^"]*? or the DOM.

  • @Mariano I ran the tests, and in fact the ideal would be to replace every .+? with [^>]+?, because then the match is correctly limited to the span from <img up to >.

  • https://regex101.com/r/BiB8Xn/2

  • 1

    @Guilhermelautert That is another option, or [^>]*? outside the style and [^"]*? inside the style. Remember that a > can appear between quotes in a style. I would also recommend repeating with * instead of +. In any case, the DOM is the only safe option here. There is always another exception that breaks a regex when parsing HTML.

  • 1

    @Mariano Exactly, if it gets too complex the ideal is to use a parser.
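
The overmatching discussed in the comments can be reproduced with a small Python sketch. The sample HTML and the names LOOSE and TIGHT are made up for illustration; TIGHT follows the suggestion of [^>]*? outside the style and [^"]*? inside it, and it is still only a regex, not a substitute for a DOM parser:

```python
import re

# Original pattern: .+? is free to run across tag boundaries.
LOOSE = re.compile(
    r'<img.+?(style=".+?height:(29|3\d|4[0-5])%;.+?")[^>]*>'
)

# Tightened pattern: negated classes keep each part inside one tag
# (outside the style) or inside one attribute value (inside the style).
TIGHT = re.compile(
    r'<img[^>]*?(style="[^"]*?height:(29|3\d|4[0-5])%;[^"]*?")[^>]*>'
)

# Hypothetical sample: an 80% image that should stay, followed by a
# 30% image that should be removed.
html = (
    '<img src="keep.png" style="width:50%;height:80%;float:left;">'
    '<img src="drop.png" style="width:50%;height:30%;float:left;">'
)

loose_result = LOOSE.sub("", html)  # starts at the first <img, ends at the second
tight_result = TIGHT.sub("", html)  # only touches the second tag

print(repr(loose_result))
print(repr(tight_result))
```

As noted in the comments, a > inside a quoted attribute value would still confuse the [^>] parts, which is one reason a parser is the safer route.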
