Remove comment tag and its content in BeautifulSoup 4

Viewed 197 times

0

How do I remove a comment tag along with its content using bs4?

<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
Idade: 12 anos e 9 meses
Tempo de Vida: 15 anos
-->

</div>

4 answers

2

Based on the answers to the question Beautifulsoup 4: Remove comment tag and its content, you can use the extract method to remove an element from the tree. To tell whether an element is a comment, check whether it is an instance of bs4.Comment.

from bs4 import BeautifulSoup, Comment

html = """<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
Idade: 12 anos e 9 meses
Tempo de Vida: 15 anos
-->

</div>"""

soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div', class_='foo')
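# Calling a tag is shorthand for find_all(); string= is the current name of the older text= argument.
for element in div(string=lambda it: isinstance(it, Comment)):
    element.extract()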

print(soup.prettify())

The output will be:

<div class="foo">
 A Arara é um animal voador.
</div>

2


I found a simpler solution based on the answer to the question How to find all comments with Beautiful Soup.

First, import BeautifulSoup and Comment from bs4:

from bs4 import BeautifulSoup, Comment

Second, use the code below to extract the comments (soup is the already parsed document):

for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()
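
For completeness, here is a minimal self-contained sketch of the same idea. The helper name strip_comments and the shortened sample markup are only illustrative and do not come from the answer above; the sketch assumes bs4 4.4 or later, where find_all accepts the string argument.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every HTML comment removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so detaching nodes while looping over it is safe.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    return str(soup)


sample = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
-->
</div>"""

cleaned = strip_comments(sample)
print(cleaned)
```

Returning str(soup) instead of soup.prettify() keeps the original whitespace of the remaining markup.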

1

If you only want the text content of the div with class foo:

div = soup.find('div', class_='foo')
print(div.text)

Result:

A Arara é um animal voador.
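
Note that reading the text does not change the tree. The sketch below, which assumes a recent bs4 release and uses a shortened copy of the question's markup, suggests that get_text skips the comment while the comment itself stays in the parsed document.

```python
from bs4 import BeautifulSoup

html = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
-->
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="foo")

# get_text() gathers only ordinary text nodes; strip=True trims the whitespace.
visible_text = div.get_text(strip=True)
print(visible_text)

# The comment node has not been removed from the tree.
still_has_comment = "<!--" in str(soup)
print(still_has_comment)
```

So this approach fits when only the visible text is needed; to get markup without the comment, the comment node still has to be extracted.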

1

This solution uses a list comprehension.

from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(html, 'html.parser')
# Detach each comment itself; calling decompose() on x.parent would delete the whole enclosing tag.
[x.extract() for x in soup.find_all(string=lambda x: isinstance(x, Comment))]
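
It matters which node gets removed. As a rough comparison, assuming a shortened copy of the question's markup and a hypothetical is_comment helper, the sketch below removes the comment node in one tree and the comment's parent tag in another.

```python
from bs4 import BeautifulSoup, Comment

html = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
-->
</div>"""


def is_comment(node):
    """Filter for find_all: true only for comment nodes (hypothetical helper)."""
    return isinstance(node, Comment)


# Variant A: detach only the comment nodes; the surrounding div survives.
soup_a = BeautifulSoup(html, "html.parser")
for comment in soup_a.find_all(string=is_comment):
    comment.extract()

# Variant B: destroy the tag that contains each comment; the div goes with it.
soup_b = BeautifulSoup(html, "html.parser")
for comment in soup_b.find_all(string=is_comment):
    comment.parent.decompose()

kept_a = "A Arara" in str(soup_a)
kept_b = "A Arara" in str(soup_b)
print(kept_a, kept_b)
```

Variant A keeps the sentence about the macaw, while variant B drops the whole div, which is usually not what the question asks for.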
