Remove tags generated at the end of a string from a Text Editor

Question

Asked 6 years, 8 months ago

I am using a rich text editor and, like others I have used, it always generates some useless tags that I would like to remove. I can remove the last one, but sometimes the editor generates more than one.

My code:

def remove_useless_tags(message):
    message = message.replace("<p><br/></p>", "") \
                .replace("<p></p>", "") \
                .replace("<p><b><br/></b></p>", "")
    # .replace("<p><br></p>", "")
    if message[-11:] == "<p><br></p>":
        message = message[:-11]
    return message

When a string arrives like this: <p>Olá</p><p><br></p>, the function can remove the <p><br></p> at the end. But sometimes the text arrives in this format:

<p>Olá</p><p><br></p><p><br></p>
<p>Olá</p><p><br></p><p><br></p><p><br></p>

I’d like to remove every <p><br></p> at the end of the string. Keep in mind that some <p><br></p> occur in the middle of the text and must not be removed. Those are line breaks the user typed deliberately while writing. The problem is the trailing line breaks, which are unnecessary and break the layout.

I believe this can be solved with a regex, but I need some help with it. Thank you!

1 answer

Browse other questions tagged python regex django replace

by Paz • **3,062** points · Answer 1 · 2018-11-10T00:09:47+00:00

If you need to match a sequence only at the end of the text, you can use the $ anchor. The regex just needs a group around the sequence you want to capture and a quantifier on that group, so you don’t have to keep repeating the replace call.

So I recommend the regex (<p><br><\/p>)*?$ with the function re.sub(pattern, repl, string), since str.replace does not work with regular expressions.

Application in your code:

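import re
[...]
def remove_useless_tags(message):
    # Raw string so the backslash reaches the regex engine unchanged.
    # Removes every "<p><br></p>" in the run that ends the string.
    result = re.sub(r'(<p><br><\/p>)*?$', "", message)

    return result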
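As a quick check, the function can be run against the strings from the question. This is only a minimal sketch: the pattern is the one recommended above, written as a raw string, the first two inputs come from the question, and the third input (a break in the middle of the text) is an assumed example added for illustration.

```python
import re

def remove_useless_tags(message):
    # Strip every "<p><br></p>" in the run that ends the string.
    return re.sub(r'(<p><br><\/p>)*?$', "", message)

# Inputs taken from the question.
two_trailing = "<p>Olá</p><p><br></p><p><br></p>"
three_trailing = "<p>Olá</p><p><br></p><p><br></p><p><br></p>"
# Assumed input: a break in the middle that must survive.
middle_break = "<p>Olá</p><p><br></p><p>Mundo</p><p><br></p>"

print(remove_useless_tags(two_trailing))    # <p>Olá</p>
print(remove_useless_tags(three_trailing))  # <p>Olá</p>
print(remove_useless_tags(middle_break))    # <p>Olá</p><p><br></p><p>Mundo</p>
```

The break between the two paragraphs is kept because no run of <p><br></p> starting there reaches the end of the string.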

Explanation of the regex

(<p><br><\/p>)*?$

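(<p><br><\/p>) > The sequence you want to capture.
*? > Lazy quantifier: matches zero or more repetitions of the sequence, adding repetitions only until the rest of the pattern can match.
$ > Anchors the match at the end of the string, so only a trailing run of the sequence is captured.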
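The same idea may extend to the other empty-paragraph variants that the question's code replaces. The following is a sketch under assumptions, not part of the original answer: it assumes the editor can also emit <p><br/></p> and <p></p> at the end of the text and that whitespace around them should be dropped too. The function name strip_trailing_empty_paragraphs is hypothetical. It swaps the lazy capturing group for a greedy non-capturing one and uses \Z, which matches only at the very end of the string, whereas $ in Python also matches just before a final newline.

```python
import re

# One or more empty paragraphs (optionally holding a <br> or <br/>),
# with optional whitespace between them, anchored at the very end.
TRAILING_EMPTY_PARAGRAPHS = re.compile(r'(?:\s*<p>(?:<br\s*/?>)?</p>)+\s*\Z')

def strip_trailing_empty_paragraphs(message):
    # Hypothetical helper: removes only the run that ends the string,
    # leaving empty paragraphs in the middle of the text untouched.
    return TRAILING_EMPTY_PARAGRAPHS.sub("", message)

print(strip_trailing_empty_paragraphs("<p>Olá</p><p><br></p><p><br/></p> <p></p>\n"))
print(strip_trailing_empty_paragraphs("<p>A</p><p><br></p><p>B</p>"))
```

The first call prints <p>Olá</p>; the second prints its input unchanged, because the empty paragraph there is followed by real content.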

Here is also a Regex test