Replace HTML tag with text with regex

I need to replace an HTML tag (one with a specific attribute) with text, as in the example below:

String stringTest = "<variable data-id=\"1\" name=\"test2\" style=\"background-color:red;\">Variable test</variable> isto é texto <variable style=\"background-color:red;\" name=\"test\" data-id=\"2\">Variable Test2</variable> outro texto";

String value = "myText";

String pattern = "<variable\\s.*?name=['\"]test['\"]?.*?[^>]*>(.*?)<\\/variable>";
String newString = stringTest.replaceAll(pattern , value);

The idea is to replace the variable tag that has the attribute name="test" with the text "myText".

However, both tags are being replaced. I don't want the tag with name="test2" to be replaced.

The result I get with the above code is:

"myText isto é texto myText outro texto"

1 answer


One solution is to change the regex to:

String stringTest = "<variable data-id=\"1\" name=\"test2\" style=\"background-color:red;\">Variable test</variable> isto é texto <variable style=\"background-color:red;\" name=\"test\" data-id=\"2\">Variable Test2</variable> outro texto";
String value = "myText";
String pattern = "<variable\\s[^>]*?name=['\"]test['\"][^>]*>(.*?)<\\/variable>";
String newString = stringTest.replaceAll(pattern, value);
System.out.println(newString);

I removed the ? right after the second ['\"] because it made the closing quote optional; with it, the .*? that followed in the original pattern could consume the 2, so name="test2" matched as well.

I also put [^>] right after variable\\s, which guarantees that this part of the regex does not leave the tag. With the dot, the match could run past the end of the first tag and "invade" the second: .*? only guarantees that the regex takes the minimum needed to satisfy the whole pattern, and in this case the minimum was to go all the way to the second tag. With [^>], the regex goes at most up to the next >, so there is no risk of it invading other tags.
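To see the difference concretely, here is a minimal sketch that applies only the opening part of each pattern to the string from the question. The class name TagBoundaryDemo and the helper firstMatch are made up for this illustration; they are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBoundaryDemo {
    // Same text as in the question; \u00e9 is the letter "é", escaped to keep the source ASCII-only.
    static final String INPUT =
        "<variable data-id=\"1\" name=\"test2\" style=\"background-color:red;\">Variable test</variable>"
        + " isto \u00e9 texto "
        + "<variable style=\"background-color:red;\" name=\"test\" data-id=\"2\">Variable Test2</variable>"
        + " outro texto";

    // Dot-based prefix: ".*?" is allowed to cross '>' and run into the next tag.
    static final String DOT_PREFIX = "<variable\\s.*?name=['\"]test['\"]";

    // Negated-class prefix: "[^>]*?" has to stop before the first '>'.
    static final String BOUNDED_PREFIX = "<variable\\s[^>]*?name=['\"]test['\"]";

    /** Returns the first substring matched by regex, or null if nothing matches. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("dot:     " + firstMatch(DOT_PREFIX, INPUT));
        System.out.println("bounded: " + firstMatch(BOUNDED_PREFIX, INPUT));
    }
}
```

The dot-based prefix starts matching at the first tag and only stops at name="test" inside the second one, so its match includes the first closing tag. The bounded prefix fails on the first tag and matches only the start of the second.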

With the regex above, only the second tag is replaced, and the result is:

<variable data-id="1" name="test2" style="background-color:red;">Variable test</variable> isto é texto myText outro texto

Alternatively, you can use:

String pattern = "<variable\\s[^>]*?name=['\"]test['\"][^>]*>[^<]*<\\/variable>";

Between the opening and closing tags I use [^<]* (zero or more characters that are not <). This works as long as the variable tag contains no other tags, since the regex then consumes everything up to the next <.


Note also that neither of the two options above works if one variable tag is nested inside another. In that case, you are better off using an XML/HTML parser.
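As a rough sketch of the parser route, the snippet below uses the XML parser bundled with the JDK. It assumes the fragment becomes well-formed XML once wrapped in a root element (real-world HTML often is not, in which case an HTML parser is a better fit). The class VariableReplacer, the method replaceVariable and the wrapper element root are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class VariableReplacer {

    /**
     * Replaces every variable element whose name attribute equals targetName
     * with a plain text node and returns the re-serialized fragment.
     * Assumption: the fragment is well-formed XML once wrapped in a root element.
     */
    static String replaceVariable(String fragment, String targetName, String replacement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, a common precaution for untrusted input
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));

            NodeList vars = doc.getElementsByTagName("variable");
            // The NodeList is live: walk it backwards so removals never shift unvisited items
            for (int i = vars.getLength() - 1; i >= 0; i--) {
                Element el = (Element) vars.item(i);
                if (targetName.equals(el.getAttribute("name"))) {
                    el.getParentNode().replaceChild(doc.createTextNode(replacement), el);
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));

            String xml = out.toString();
            // Strip the artificial wrapper; an empty root serializes as <root/>
            if (!xml.startsWith("<root>")) {
                return "";
            }
            return xml.substring("<root>".length(), xml.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process fragment as XML", e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is the letter "é", escaped to keep the source ASCII-only
        String stringTest =
            "<variable data-id=\"1\" name=\"test2\" style=\"background-color:red;\">Variable test</variable>"
            + " isto \u00e9 texto "
            + "<variable style=\"background-color:red;\" name=\"test\" data-id=\"2\">Variable Test2</variable>"
            + " outro texto";
        System.out.println(replaceVariable(stringTest, "test", "myText"));
    }
}
```

Because the comparison is made on the parsed attribute value, name="test2" can never match by accident, and a variable tag nested inside another is handled by the tree structure rather than by the pattern. Note that the serializer may normalize details such as attribute order or character escaping, so the output is not guaranteed to be byte-for-byte identical to the input outside the replaced tag.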

    Great, thank you very much!!!
