Scraping comments from a news portal

Viewed 73 times

5

I am a programming enthusiast, and I am trying to scrape comments from a news portal to make a tag cloud.

I'm trying to do this with Beautiful Soup, but I get None back. The code I'm using is below. Any hints? I'm an amateur: curious, but not an expert, so maybe there is something about the site's structure that I don't know and should. I managed to scrape the article text, but not the comments. Thank you.

 import urllib.request
 from bs4 import BeautifulSoup
 # URL of the article whose comments I want to scrape
 coments = 'https://esporte.uol.com.br/futebol/campeonatos/copa-do-brasil/ultimas-noticias/2019/02/06/santos-toma-susto-mas-faz-7-a-1-no-altos-e-avanca-na-copa-do-brasil.htm'
 page = urllib.request.urlopen(coments)
 soup = BeautifulSoup(page, 'html5lib')
 # Look for the element that I expected to hold the comments
 v = soup.find("div", {"id": "comentarios"})
 print(v)
 # Output: None

1 answer

3

With this code you cannot capture any comments, because the comments are in a <p class="comment-text ng-binding ng-scope" [...]> tag. So your soup.find line should be:

v = soup.find_all("p", class_="comment-text ng-binding ng-scope")

Here I use find_all() to find all the comments, not only the first one, which is what would happen with find(). I also use the attribute class_ rather than class. That's because class is a reserved word in Python, so with Beautiful Soup we have to type class_, with an underscore at the end.
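To try the difference between find() and find_all() without depending on UOL's page, here is a minimal sketch that runs against a small, made-up HTML fragment. The markup in SAMPLE_HTML and the variable names are assumptions for illustration only, not the portal's real structure. It uses Python's built-in html.parser, so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, invented for this example (not UOL's real markup).
SAMPLE_HTML = """
<div id="comentarios">
  <p class="comment-text ng-binding ng-scope">First comment</p>
  <p class="comment-text ng-binding ng-scope">Second comment</p>
  <p class="other">Not a comment</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first matching tag, or None if nothing matches.
first = soup.find("p", class_="comment-text")

# find_all() returns a list with every matching tag.
all_comments = soup.find_all("p", class_="comment-text")

# Searching for something that is not in the HTML gives None,
# which is the result described in the question.
missing = soup.find("div", {"id": "does-not-exist"})

first_text = first.get_text(strip=True)
comment_texts = [p.get_text(strip=True) for p in all_comments]

print(first_text)
print(comment_texts)
print(missing)
```

Note that class_="comment-text" matches any tag that has that class among its classes, while passing the full string "comment-text ng-binding ng-scope" only matches when the class attribute is exactly that string, in that order.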

However, UOL's comments are protected against web scraping. So, since you're just getting started, I recommend practicing scraping on something simpler first, to understand how tags work and how to find the content you need in them.

  • 1

    By the way, I recommend using the requests library or MechanicalSoup instead of urllib.request for scraping. MechanicalSoup is much easier to use, but you won't be able to scrape content from JavaScript-based pages with it. See the MechanicalSoup documentation.

  • I get it. Too bad. Thank you very much. I read that the best way to learn programming concepts is to pick something you would like to build, and I really like sentiment analysis. I think news portals are the best place to get a more genuine sample of people's reactions.
