Get tags nested within other tags in BeautifulSoup

Viewed 1,229 times

2

I have the following situation:

<a href="https://g1.globo.com">Globo</a>
<h3 class="b">
  <a href="https://www.google.com">Google</a>
</h3>

Using BeautifulSoup, how do I get only the href and the text of the 'a' tag inside the 'h3'?

2 answers

3

The easiest way is to look for the a tag inside the h3 element:

from bs4 import BeautifulSoup

code = '''<a href="https://g1.globo.com">Globo</a>
<h3 class="b">
    <a href="https://www.google.com">
        Google
    </a>
</h3>'''

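# Name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(code, 'html.parser')

# Attribute access walks to the first <h3>, then the first <a> inside it
tag_a = soup.h3.a

print(tag_a.text.strip())  # strip the whitespace around "Google"
print(tag_a['href'])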

You can also collect every a tag inside the h3 with soup.h3.find_all('a') (findAll is the older alias for the same method); it returns a list of all matching tags.
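To illustrate the list-returning variant, here is a minimal sketch. The second link inside the h3 is hypothetical markup added only so the list has more than one entry; everything else follows the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the example.com link is added for illustration only.
code = '''<a href="https://g1.globo.com">Globo</a>
<h3 class="b">
    <a href="https://www.google.com">Google</a>
    <a href="https://www.example.com">Example</a>
</h3>'''

soup = BeautifulSoup(code, 'html.parser')

# find_all returns a list-like result with every <a> nested in the first <h3>;
# the Globo link sits outside the <h3>, so it is not included.
links = [(a.get_text(strip=True), a['href']) for a in soup.h3.find_all('a')]

for text, href in links:
    print(text, href)
```

Each entry in links is a (text, href) pair, so the result can be looped over or indexed like any other list.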

2


Just find the h3 tag and then find the a element inside it:

from bs4 import BeautifulSoup

data = """<a href="https://g1.globo.com">Globo</a>
<h3 class="b">
  <a href="https://www.google.com">Google</a>
</h3>"""

soup = BeautifulSoup(data, 'html.parser')

# Find the <h3 class="b"> first, then the first <a> inside it
h3 = soup.find('h3', class_='b')
a = h3.find('a')
print(a['href'])
print(a.text)
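Note that find returns None when nothing matches, so chaining the two calls fails on pages that lack the h3 or the link. A minimal sketch of a guarded version, assuming Python 3; the helper name link_in_h3 is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def link_in_h3(html, css_class):
    """Return (text, href) for the first <a> in the first matching <h3>, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    h3 = soup.find('h3', class_=css_class)
    if h3 is None:
        return None  # no <h3> with that class
    a = h3.find('a')
    if a is None:
        return None  # the <h3> contains no link
    # .get() returns None instead of raising KeyError if href is absent
    return a.get_text(strip=True), a.get('href')


data = """<a href="https://g1.globo.com">Globo</a>
<h3 class="b">
  <a href="https://www.google.com">Google</a>
</h3>"""

result = link_in_h3(data, 'b')
missing = link_in_h3(data, 'no-such-class')
print(result)
print(missing)
```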
