0
I'd like an example of how to extract the Olympics headlines from http://edition.cnn.com/sport/olympics
using BeautifulSoup.
3
The key is to look at the HTML returned by the GET request and identify the elements you want. In this case we want every <span>
that has the class cd__headline-text
(I assume that is what you mean by headlines). You can do it like this:
from bs4 import BeautifulSoup
import requests

req = requests.get('http://edition.cnn.com/sport/olympics')
soup = BeautifulSoup(req.text, 'html.parser')  # req.text is the returned HTML
# Search the HTML for the elements described above; the result is a list of every match
manchetes_html = soup.find_all('span', {'class': 'cd__headline-text'})
manchetes = ''  # string that will hold the headlines
for manchete in manchetes_html:
    manchetes += '{}\n'.format(manchete.text)
print(manchetes)
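If you want to check the selection logic without hitting the network, the same idea can be tried on a small inline snippet. This is only a sketch: the sample markup and the helper name extract_headlines are made up for illustration, and the live CNN page may use different class names by the time you read this.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to mimic the structure of the CNN page.
SAMPLE_HTML = """
<div>
  <h3><a href="/a"><span class="cd__headline-text">First headline</span></a></h3>
  <h3><a href="/b"><span class="cd__headline-text">  Second headline </span></a></h3>
  <span class="other">Not a headline</span>
</div>
"""


def extract_headlines(html, css_class="cd__headline-text"):
    """Return the stripped text of every <span> that carries css_class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the underscore) is how bs4 filters on the class attribute
    spans = soup.find_all("span", class_=css_class)
    return [span.get_text(strip=True) for span in spans]


if __name__ == "__main__":
    print("\n".join(extract_headlines(SAMPLE_HTML)))
```

Returning a list instead of building one long string makes it easier to reuse the headlines afterwards (counting them, filtering them, writing them to a file), and you can still join them with '\n' when you just want to print.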
thanks for the reply!
– Ed S
You're welcome, @Eds. The most important thing is to try to understand what's going on; after that, the process is always the same. Glad I could help.
– Miguel