I'm a programming enthusiast trying to scrape comments from a news portal to build a tag cloud.
I'm trying to do this with Beautiful Soup, but I get None back. Here is the code I'm using. Any hints? I'm an amateur, curious rather than formally trained, so there may be something about the site's structure that I don't know and should. I managed to scrape the article text, but not the comments. Thank you.
import urllib.request
from bs4 import BeautifulSoup
coments = 'https://esporte.uol.com.br/futebol/campeonatos/copa-do-brasil/ultimas-noticias/2019/02/06/santos-toma-susto-mas-faz-7-a-1-no-altos-e-avanca-na-copa-do-brasil.htm'
page = urllib.request.urlopen(coments)
soup = BeautifulSoup(page, 'html5lib')
v = soup.find("div", {"id": "comentarios"})
print(v)
[None]
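A possible explanation, assuming the comments are added by JavaScript after the page loads: `soup.find` only sees the HTML the server sent, so a `div` that the browser creates later is simply not there, and `find` returns None. The sketch below illustrates this with two made-up HTML strings (both hypothetical, not copied from the real page), parsed with Python's built-in `html.parser` so that `html5lib` is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: what the server might send before any JavaScript runs.
STATIC_HTML = """
<html><body>
  <article id="texto"><p>Match report text.</p></article>
  <script src="comments-loader.js"></script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has run the script
# and inserted the comments container.
RENDERED_HTML = STATIC_HTML.replace(
    "<script",
    '<div id="comentarios"><p>Great game!</p></div><script',
)


def find_comments(html):
    """Return the comments <div>, or None if it is not in this markup."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"id": "comentarios"})


static_result = find_comments(STATIC_HTML)
rendered_result = find_comments(RENDERED_HTML)

print(static_result)                         # None: the div is not in the raw HTML
print(rendered_result.get_text(strip=True))  # Great game!
```

If that is what is happening here, the article text can be scraped because it is in the raw HTML, while the comments cannot.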
By the way, I recommend using the requests library or mechanicalsoup in place of urllib.request for scraping. mechanicalsoup is much easier to use, but you won't be able to scrape content from JavaScript-based pages. See the MechanicalSoup documentation. – Rafael Barros
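For reference, a minimal sketch of the same fetch done with requests instead of urllib.request. The helper names `fetch_soup` and `comment_texts` are made up for illustration, and the assumption that each comment sits in a `<p>` inside the container is a guess, not something taken from the real page. It does not get around the JavaScript limitation: if the container is missing from the raw HTML, the result is just an empty list.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page with requests and parse it with Beautiful Soup."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")


def comment_texts(soup, container_id="comentarios"):
    """Return the text of each <p> in the comments container, or [] if it is absent.

    Assumption: comments are <p> elements inside <div id=container_id>.
    """
    container = soup.find("div", {"id": container_id})
    if container is None:
        return []
    return [p.get_text(strip=True) for p in container.find_all("p")]
```

With the URL from the question, this would be called as `comment_texts(fetch_soup(coments))`.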
I get it. Too bad. Thank you very much. I read that the best way to learn programming concepts is to pick something you would like to build. I really like sentiment analysis, and I think news portals are the best place to get a more genuine sample of people's reactions.
– Thiago F