How to get the Olympics headlines from the CNN website with Python using BeautifulSoup?

I'd like an example of how to get the Olympics headlines from http://edition.cnn.com/sport/olympics using BeautifulSoup.

by Miguel • **29,306** points · Answer 1 · 2016-08-07T14:32:51+00:00

The key is to inspect the HTML returned by the GET request and identify the elements you need. In this case we want every <span> with the class cd__headline-text, which I assume is what you mean by headlines. You can do it like this:

from bs4 import BeautifulSoup as bs4
import requests as r

req = r.get('http://edition.cnn.com/sport/olympics', timeout=10)
req.raise_for_status()  # stop early if the request failed
soup = bs4(req.text, 'html.parser')  # req.text is the returned HTML
# search the HTML for the <span> elements described above; this gives a list of every matching element
manchetes_html = soup.find_all('span', {'class': 'cd__headline-text'})
manchetes = ''  # the string that will hold our headlines
for manchete in manchetes_html:
    manchetes += '{}\n'.format(manchete.text)  # one headline per line
print(manchetes)
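
A minimal sketch of the same idea with the fetching and the parsing split apart, so the extraction can be checked against a saved HTML snippet without hitting the network. It assumes the headlines are still marked up as span.cd__headline-text (the class identified above; CNN's current markup may differ, so inspect the page and adjust the selector). The helper names extract_headlines and fetch_headlines are hypothetical. It uses soup.select with a CSS selector instead of find_all (both match the same elements here) and get_text(strip=True), which trims surrounding whitespace that .text would keep.

```python
from bs4 import BeautifulSoup


def extract_headlines(html: str, selector: str = "span.cd__headline-text") -> list[str]:
    """Return the whitespace-stripped text of every element matching a CSS selector.

    The default selector assumes the headline markup described in the answer;
    change it if the page structure is different.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def fetch_headlines(url: str = "http://edition.cnn.com/sport/olympics") -> list[str]:
    """Download the page and extract its headlines (requires network access)."""
    import requests  # imported here so extract_headlines works without requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return extract_headlines(response.text)


if __name__ == "__main__":
    # Made-up markup that mimics the structure described above, for offline testing.
    sample = """
    <div>
      <h3><span class="cd__headline-text"> Headline one </span></h3>
      <h3><span class="cd__headline-text">Headline two</span></h3>
      <span class="other">Not a headline</span>
    </div>
    """
    for headline in extract_headlines(sample):
        print(headline)
```

Running it prints "Headline one" and "Headline two" on separate lines; call fetch_headlines() to run the same extraction against the live page.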
