How to remove tags from text in Python?


In PHP we have a function called strip_tags that removes HTML tags from a given string.

Example:

$text = "meu nome é <strong>Wallace</strong>";

strip_tags($text); // 'meu nome é Wallace'

How can I remove tags from text in Python?

2 answers


There are several ways, but I don’t think any of them does this job better than BeautifulSoup:

>>> from bs4 import BeautifulSoup as bs
>>> bs('<p>hey<span> brrh </span>lolol', 'html.parser').text
'hey brrh lolol'

Note: to install it (for example on Python 3.5), use pip:

pip install --upgrade beautifulsoup4

Further reading: the BeautifulSoup documentation
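If installing a third-party package is not an option, the standard library's html.parser module can do a similar job, because it also parses the markup instead of pattern-matching it. This is a minimal sketch, not part of the original answer; the names _TextExtractor and strip_tags are hypothetical. It keeps only text nodes, decodes entities such as &amp;, and (like BeautifulSoup's .text) keeps the contents of <script> and <style> as text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags; tags themselves are never stored
        self.parts.append(data)


def strip_tags(html_text):
    """Hypothetical PHP-style helper: return html_text with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered after the last tag
    return ''.join(parser.parts)


print(strip_tags('meu nome é <strong>Wallace</strong>'))  # meu nome é Wallace
print(strip_tags('<p>hey<span> brrh </span>lolol'))       # hey brrh lolol
```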

  • Great, it worked on Python 2.7 and Python 3.*. +1

  • Yep, it’s one of the modules I use most for this kind of work @Wallacemaxters


    I updated your answer with instructions for anyone who doesn’t have the module installed and left a +1 for you =)


    @Guilhermenascimento Thank you for adding to the answer, it really is important

An example using a regular expression (the re module) would be:

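import re

text = 'meu nome é <strong>Wallace</strong>'
# Replace every "<...>" sequence (an opening or closing tag) with an empty string
text = re.sub(r'<[^>]+?>', '', text)
print(text)  # meu nome é Wallace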

The function re.sub() takes a regular expression as its first argument, searches the string given as the third argument for substrings that match it, and replaces each match with the second argument.
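To get something closer to PHP's strip_tags, the same re.sub() call can be wrapped in a reusable function. This is a sketch under my own assumptions, not the answerer's code: the name strip_tags is hypothetical, the pattern is compiled once, and html.unescape is applied after stripping so that entities such as &amp; become plain characters (escaped text like &lt;b&gt; is decoded only after the tag removal, so it survives as text).

```python
import html
import re

# Same idea as the pattern above: '<', one or more non-'>' characters, then '>'
TAG_RE = re.compile(r'<[^>]+>')


def strip_tags(text):
    """Hypothetical PHP-style helper: drop anything that looks like a tag."""
    # Equivalent to re.sub(pattern, '', text): every match is replaced by ''
    without_tags = TAG_RE.sub('', text)
    # Decode entities last, so escaped markup such as &lt;b&gt; stays as text
    return html.unescape(without_tags)


print(strip_tags('1 &lt; 2 &amp;&amp; <b>ok</b>'))  # 1 < 2 && ok
```

A regular expression does not understand HTML, though: for '<a title="x>y">link</a>' this function returns 'y">link', because the first match ends at the '>' inside the attribute value. For arbitrary HTML, a real parser such as BeautifulSoup is the safer choice.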
