scrape news portal comments

Question

Asked 6 years, 4 months ago

Viewed 73 times

5

I am a programming enthusiast, and I am trying to scrape the comments from a news portal to make a tag cloud.

I'm trying to do this with Beautiful Soup, but I get None back. The code I'm using is below. Any hints? I am a curious amateur, not an expert, so maybe there is something about the site's structure that I don't know and should know. I managed to scrape the article text but not the comments. Thank you.

 import urllib.request
 from bs4 import BeautifulSoup
 # URL of the article whose comments I want (must be a single unbroken string)
 coments = 'https://esporte.uol.com.br/futebol/campeonatos/copa-do-brasil/ultimas-noticias/2019/02/06/santos-toma-susto-mas-faz-7-a-1-no-altos-e-avanca-na-copa-do-brasil.htm'
 page = urllib.request.urlopen(coments)
 soup = BeautifulSoup(page, 'html5lib')
 # Look for the element that should hold the comments
 v = soup.find("div", {"id": "comentarios"})
 print(v)
 # prints None

1 answer

by Rafael Barros • **840** points · Answer 1 · 2019-02-25T00:47:05+00:00

This code cannot capture any comments, because the comments are in a tag <p class="comment-text ng-binding ng-scope" [...]>. So your soup.find call should be:

v = soup.find_all("p", class_="comment-text ng-binding ng-scope")

Here I use find_all() to get all the comments rather than only the first one, which is what find() would return. I also use the keyword argument class_ instead of class. That's because class is a reserved word in Python, so with Beautiful Soup we have to type class_, with an underscore at the end.
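To make the difference concrete, here is a minimal sketch run against a small hand-written HTML snippet. SAMPLE_HTML and its comment texts are made up for illustration and are not UOL's real markup; I also use the built-in "html.parser" so nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML standing in for a comment section (not UOL's real page).
SAMPLE_HTML = """
<div id="comentarios">
  <p class="comment-text ng-binding ng-scope">Great match!</p>
  <p class="comment-text ng-binding ng-scope">Seven goals, what a night.</p>
  <p class="other">Not a comment</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag (or None if nothing matches).
first = soup.find("p", class_="comment-text")

# find_all() returns a list with every matching tag.
all_comments = soup.find_all("p", class_="comment-text")
texts = [p.get_text(strip=True) for p in all_comments]

# Searching for something that is not in the HTML gives None, as in the question.
missing = soup.find("div", {"id": "does-not-exist"})

print(first.get_text(strip=True))
print(texts)
print(missing)
```

Passing a single class name such as "comment-text" to class_ matches any tag that has that class among its classes, so it is a little more forgiving than spelling out the full class string.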

However, UOL's comments are protected against web scraping, so even the corrected call may come back empty. Since you're just getting started, I recommend practicing scraping on something simpler first, to understand how tags work and how to find the content you need in them.