Web scraping images in comments


Viewed 45 times


I’m working on a web scraper that needs to collect the comments in a forum that allows image uploads. I was able to get the text and author of each comment using findAll in Beautiful Soup, but I couldn’t find a way to save the image links associated with the comments (not all comments have an image, so not all of them have a link).

The code below shows how I got the comments and how I tried to get the image links:

title_comentou = container.findAll("div",{"class":"posting fullpost"})
comentario = title_comentou[0].text

title_imgem_link = container.findAll("div",{"img":"src"})
linkado = title_imgem_link[0].text

I’m getting this error:

Traceback (most recent call last):
  File "2 - localBotBS4.py", line 54, in <module>
    linkado = title_imgem_link[0].text
IndexError: list index out of range
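The traceback itself shows the mechanism: findAll (also spelled find_all) returns a list-like ResultSet, and when nothing on the page matches the filter, that list is empty, so indexing it with [0] raises IndexError. A minimal sketch of guarding the lookup, using a hypothetical helper first_text (not part of Beautiful Soup) and a stand-in class in place of a real bs4 tag:

```python
def first_text(results):
    """Return the .text of the first match, or None when there are no matches.

    Hypothetical helper: `results` is whatever findAll/find_all returned.
    """
    if not results:
        return None
    return results[0].text


class FakeTag:
    # Hypothetical stand-in for a bs4 Tag; only the .text attribute is used.
    def __init__(self, text):
        self.text = text


# An empty result list is exactly what triggers the IndexError in the traceback.
empty_results = []
try:
    empty_results[0]
except IndexError as exc:
    error_message = str(exc)

linkado = first_text(empty_results)                      # None instead of a crash
comentario = first_text([FakeTag("first comment")])      # text of the first match
```

Checking for an empty result before indexing turns "this comment has no image" into a None value instead of an exception.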
  • It seems that title_imgem_link does not contain the value you expect it to. Have you already checked what its value is?

  • I don’t know BeautifulSoup at all, but I think the problem is in your assignment: you are assigning the value returned by the method to a simple variable instead of to a list, and that is why this error occurs: IndexError: list index out of range. The right thing would be: title_comentou.append(container.findAll("div",{"class":"posting fullpost"}))

  • Have you looked at the find_all documentation? When you do container.findAll("div",{"img":"src"}) you are looking for a div whose img attribute has the value src, that is, you are looking for <div img="src">... I suppose that’s not what you want...

  • Without seeing the structure of the page, there isn’t much to say.
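As the third comment notes, the second argument of findAll is a dictionary of attribute filters, so {"img": "src"} matches <div img="src"> rather than an image inside the div. A minimal sketch of the usual approach, assuming (as the question’s code suggests) that each comment is a div with class "posting fullpost" and that its pictures are ordinary <img src="..."> tags inside it: search for img tags within each comment and read their src attribute. The sample HTML is invented for illustration, since the real page structure was not posted, and the code requires the beautifulsoup4 package.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real forum's structure was not posted.
html = """
<div class="thread">
  <div class="posting fullpost">First comment, no picture.</div>
  <div class="posting fullpost">Second comment
    <img src="https://example.com/a.png">
  </div>
</div>
"""

container = BeautifulSoup(html, "html.parser")

comments = []
for post in container.find_all("div", {"class": "posting fullpost"}):
    # Look for <img> tags *inside* this comment; src=True keeps only
    # tags that actually have a src attribute.
    links = [img["src"] for img in post.find_all("img", src=True)]
    comments.append({
        "text": post.get_text(" ", strip=True),
        "images": links,  # empty list when the comment has no picture
    })

for c in comments:
    print(c["text"], c["images"])
```

Collecting a list per comment (possibly empty) avoids indexing into a result that may not exist, and also handles comments with more than one picture.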

No answers

