0
How do I remove a comment tag along with its content using bs4?
<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
Idade: 12 anos e 9 meses
Tempo de Vida: 15 anos
-->
</div>
2
Based on the answers to the question "Beautifulsoup 4: Remove comment tag and its content", you can use the extract method to remove an item from the tree. To tell whether an item is a comment, check whether it is an instance of bs4.Comment.
from bs4 import BeautifulSoup, Comment
html = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
Idade: 12 anos e nove meses
Tempo de Vida: 15 anos
-->
</div>"""
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='foo')
# Calling the tag is shorthand for find_all(); keep only the Comment nodes
for element in div(text=lambda it: isinstance(it, Comment)):
    element.extract()
print(soup.prettify())
The output will be:
<div class="foo">
A Arara é um animal voador.
</div>
2
I found a simpler solution based on the answer to the question "How to find all comments with Beautiful Soup".
First, import BeautifulSoup and the Comment class:
from bs4 import BeautifulSoup, Comment
Second, use the code below to extract the comments (soup is the already parsed document):
for comments in soup.findAll(text=lambda text: isinstance(text, Comment)):
    comments.extract()
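The snippet above assumes that soup already exists. Here is a minimal self-contained sketch of the same idea, wrapped in a helper called strip_comments (a hypothetical name chosen just for this illustration) and using the string= keyword, which newer versions of Beautiful Soup prefer over text=. The sample HTML is a shortened version of the one in the question.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every HTML comment removed.

    strip_comments is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so removing nodes while looping over it is safe
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()  # detach only the comment node from the tree
    return str(soup)


sample = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
-->
</div>"""

cleaned = strip_comments(sample)
print(cleaned)
```

The div and its visible text should remain, and only the comment should disappear.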
1
If you only want the content of the div with class foo:
div = soup.find('div', class_='foo')
print(div.text)
Output:
A Arara é um animal voador.
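To see why this works without removing anything: as far as I know, get_text() (which .text uses under the hood) only joins ordinary text nodes and skips Comment nodes. A small sketch to check this, assuming the same HTML as in the question and the html.parser backend:

```python
from bs4 import BeautifulSoup

html = """<div class="foo">
A Arara é um animal voador.
<!--
<p>Animais
Nome: Arara
Idade: 12 anos e 9 meses
Tempo de Vida: 15 anos
-->
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="foo")

# strip=True trims the whitespace around each text node and drops empty ones
text = div.get_text(strip=True)
print(text)
```

Note that the comment is still in the tree. If you later serialize soup, it will show up again, so use extract() when the markup itself has to be clean.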
1
This solution uses a list comprehension (html is the string holding the markup from the question).
from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(html, 'html.parser')
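# extract() removes only the comment; x.parent.decompose() would delete the enclosing tag (here, the whole div) as well
[x.extract() for x in soup.find_all(text=lambda x: isinstance(x, Comment))]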