Remove tags generated by a text editor at the end of a string

I am using a text editor that, like others I have used, always generates some useless tags that I would like to remove. I can remove the last one, but sometimes it generates more than one.

My code:

def remove_useless_tags(message):
    message = message.replace("<p><br/></p>", "") \
                .replace("<p></p>", "") \
                .replace("<p><b><br/></b></p>", "")
    # .replace("<p><br></p>", "")
    if message[-11:] == "<p><br></p>":
        message = message[:-11]
    return message

When a string comes in like this: <p>Olá</p><p><br></p>, the function removes the <p><br></p> at the end. But sometimes the text comes in this format:

<p>Olá</p><p><br></p><p><br></p>
<p>Olá</p><p><br></p><p><br></p><p><br></p>

I’d like to remove every <p><br></p> at the end of the string. Keep in mind that some <p><br></p> occur in the middle of the text and cannot be removed: they are line breaks ("enters") that the user typed on purpose. The problem is the trailing ones, which are unnecessary and break the layout.

I believe this can be solved with regex, but I need some help with it. Thank you!

1 answer

To match a sequence only at the end of the text, use the $ anchor. The regex just needs a group around the sequence and a quantifier on that group, so a single substitution handles any number of repetitions and you don’t have to keep repeating the replace call.

So I recommend the regex (<p><br><\/p>)*?$ with the function re.sub(pattern, replacement, string), since str.replace does not work with regex.

Application in your code:

import re
[...]
def remove_useless_tags(message):
    # Raw string: in a normal string literal, '\/' is an invalid escape sequence.
    result = re.sub(r'(<p><br><\/p>)*?$', "", message)

    return result
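
A quick check of the function above against the strings from the question, as a sketch. The first three inputs are the ones quoted in the question; the last one, with a <p><br></p> in the middle and the text "Tchau", is a hypothetical input added here to show that inner breaks are left alone.

```python
import re


def remove_useless_tags(message):
    # Same pattern as in the answer, written as a raw string.
    return re.sub(r'(<p><br><\/p>)*?$', "", message)


# Inputs taken from the question: one, two and three trailing breaks.
samples = [
    "<p>Olá</p><p><br></p>",
    "<p>Olá</p><p><br></p><p><br></p>",
    "<p>Olá</p><p><br></p><p><br></p><p><br></p>",
]
cleaned = [remove_useless_tags(s) for s in samples]

# Hypothetical input: the break between the two paragraphs must survive.
middle = remove_useless_tags("<p>Olá</p><p><br></p><p>Tchau</p><p><br></p>")

print(cleaned)
print(middle)
```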

Explanation of the regex

(<p><br><\/p>)*?$
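  • (<p><br><\/p>) > The sequence you want to match, grouped so the quantifier applies to the whole sequence.

  • *? > Lazy quantifier: matches zero or more repetitions of the group, as few as possible. Because $ must match right after it, it still extends over every trailing repetition.

  • $ > Anchors the match to the end of the string (in Python, $ also matches just before a final newline).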
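
Because $ has to match immediately after the group, the lazy *? is forced to keep extending until it reaches the end of the string, so a greedy quantifier gives the same result here. The sketch below compares the two. The names LAZY and GREEDY and the sample text are illustrative assumptions, and the greedy variant with a non-capturing group is an alternative, not part of the original answer. The / is left unescaped because it has no special meaning in Python's re module.

```python
import re

# Pattern from the answer (lazy, capturing group).
LAZY = re.compile(r'(<p><br></p>)*?$')
# Alternative for comparison (greedy, non-capturing, at least one repetition).
GREEDY = re.compile(r'(?:<p><br></p>)+$')

# Hypothetical input: one break in the middle, two at the end.
text = "<p>Olá</p><p><br></p><p>Fim</p><p><br></p><p><br></p>"

lazy_result = LAZY.sub("", text)
greedy_result = GREEDY.sub("", text)

print(lazy_result)
print(greedy_result)
```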


  • If I also wanted to remove these tags from the beginning of the string, would I put the $ at the beginning? Like this: $(<p><br><\/p>)*? ?

    @Guilhermeia That is the right idea, except that the character that marks the beginning of the string in regex is ^, so it would be ^(<p><br><\/p>)*. Note the greedy * here: with nothing after the group, a lazy *? would match zero repetitions and remove nothing. To combine this with the regex from the answer and strip the repetitions at both the beginning and the end in one pattern, it would look like this: ^(<p><br><\/p>)*|(<p><br><\/p>)*?$
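
    The combined pattern from the comment above can be tried out as follows. This is only a sketch: the function name strip_edge_breaks and the sample strings are hypothetical, chosen to show that breaks at both edges are removed while the one in the middle stays.

```python
import re

# Combined pattern from the comment: leading repetitions OR trailing repetitions.
EDGES = re.compile(r'^(<p><br><\/p>)*|(<p><br><\/p>)*?$')


def strip_edge_breaks(message):
    # Hypothetical helper: removes <p><br></p> runs at the start and end only.
    return EDGES.sub("", message)


# Hypothetical input: one leading break, one inner break, two trailing breaks.
text = "<p><br></p><p>Olá</p><p><br></p><p>Tchau</p><p><br></p><p><br></p>"
stripped = strip_edge_breaks(text)

print(stripped)
```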
