How to get the Olympics headlines from the CNN website with Python using BeautifulSoup?

1 answer


The key is to inspect the HTML returned by the GET request and identify the elements you want. In this case we want every <span> with the class cd__headline-text, which is what I assume you mean by headlines. You can do it like this:

from bs4 import BeautifulSoup as bs4
import requests as r

req = r.get('http://edition.cnn.com/sport/olympics')
soup = bs4(req.text, 'html.parser')  # req.text = the returned HTML
# Search the HTML for the elements described above; the result is a list
# of every element that matches the search
manchetes_html = soup.find_all('span', {'class': 'cd__headline-text'})
manchetes = ''  # will hold the headlines ("manchetes"), one per line
for manchete in manchetes_html:
    manchetes += '{}\n'.format(manchete.text)
print(manchetes)
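
To make the matching step easier to follow without hitting the network, here is a rough sketch of the same idea using only the standard library's html.parser in place of BeautifulSoup. It is not what the answer above uses. The names HeadlineParser, extract_headlines and SAMPLE_HTML are made up for illustration, and the sample markup is invented, not copied from CNN.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <span> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.headlines = []
        self._depth = 0    # > 0 while we are inside a matching <span>
        self._buffer = []  # text pieces of the headline being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a nested <span> inside a headline
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.headlines.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_headlines(html, target_class="cd__headline-text"):
    """Return the text of all matching <span> elements, in document order."""
    parser = HeadlineParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.headlines


# Invented sample markup (an assumption, not the real CNN page).
SAMPLE_HTML = """
<h3><a href="/a"><span class="cd__headline-text">First <b>sample</b> headline</span></a></h3>
<h3><a href="/b"><span class="cd__headline-text extra">Second sample headline</span></a></h3>
<span class="other">Not a headline</span>
"""

print("\n".join(extract_headlines(SAMPLE_HTML)))
```

With the real page you would pass req.text instead of SAMPLE_HTML. If the site changes its markup, the class name will probably need updating, whichever parser you use.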

  • thanks for the reply!

  • You're welcome, @Eds. The most important thing is to understand what is going on; after that, the approach is always the same. Glad I could help.
