Extracting the text of a column from an HTML table in Python

I always use variations of this code on other tables and it usually works, but it doesn't work on this one. What am I doing wrong?

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.reddit.com/r/movies/comments/hhfalb/what_is_the_best_film_you_watched_last_week/"
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')
    tabela = soup.findAll('table', class_='MRH-njmSb5ZTkfb1o4dqv')
    colunas = tabela.findAll('td')
    print(tabela)



<table class="MRH-njmSb5ZTkfb1o4dqv"><thead>  {....}

{....}  <td class="_1LHijgw3WoeCUe8AUewfUB"><a href="https://www.reddit.com/r/movies/comments/hd7rzi/what_is_the_best_film_you_watched_last_week/fvluiyn/" class="_3t5uN8xUmg0TOwRCOGQEcU" rel="noopener nofollow ugc" target="_blank">&quot;Miss Juneteenth&quot;</a></td>   {.....}

In my first attempts I was able to print the table's HTML, but now not even that works. The result is usually [] (an empty list) or None in the log.

I have tried soup.find and soup.findAll, with no luck. How would you do it?

1 answer


I tested your code, and when I checked the status of the response it was 502 (Bad Gateway), which is a server error, so the page was never returned.

    print(req.status_code)
    502
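To make a check like this easier to read in a log, the status code can be turned into a short description before any parsing is attempted. This is only a minimal sketch using the standard library; `explain_status` is a hypothetical helper name, not something from the original code.

```python
from http import HTTPStatus


def explain_status(code: int) -> str:
    """Return a readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about.
        phrase = "Unknown"
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif code >= 500:
        kind = "server error"
    else:
        kind = "informational or redirect"
    return f"{code} {phrase} ({kind})"


print(explain_status(502))  # 502 Bad Gateway (server error)
```

With `requests`, calling `req.raise_for_status()` before parsing raises an `HTTPError` for 4xx and 5xx responses, so a failure like this one shows up immediately instead of as an empty result later.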

Then I tested it with Selenium and a WebDriver, and it worked:

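    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Load the webdriver (Selenium 4 style; older versions accepted
    # the driver path as the first positional argument)
    driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"))
    url = "https://www.reddit.com/r/movies/comments/hhfalb/what_is_the_best_film_you_watched_last_week/"
    driver.get(url)
    req = driver.page_source  # HTML after the browser has rendered the page
    driver.quit()             # close the browser once the HTML is captured

    soup = BeautifulSoup(req, 'html.parser')
    # find() returns a single element (or None), so find_all() can be called on it
    tabela = soup.find('table', attrs={"class": "MRH-njmSb5ZTkfb1o4dqv"})
    colunas = tabela.find_all('td')
    print(tabela)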
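The snippet above prints the whole table. To get the text of each cell, which is what the question title asks for, something like the following could work. It is a sketch that runs on an abbreviated static sample modeled on the markup in the question, not on the live page; `SAMPLE_HTML` and `cell_texts` are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Abbreviated sample modeled on the markup shown in the question;
# the real page has more rows, columns and attributes.
SAMPLE_HTML = """
<table class="MRH-njmSb5ZTkfb1o4dqv">
  <tbody>
    <tr>
      <td class="_1LHijgw3WoeCUe8AUewfUB"><a href="https://www.reddit.com/r/movies/comments/hd7rzi/what_is_the_best_film_you_watched_last_week/fvluiyn/">&quot;Miss Juneteenth&quot;</a></td>
    </tr>
  </tbody>
</table>
"""


def cell_texts(html: str, table_class: str) -> list:
    """Return the stripped text of every <td> in the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    tabela = soup.find("table", class_=table_class)
    if tabela is None:
        # find() returns None when nothing matches, e.g. on an error page.
        return []
    return [td.get_text(strip=True) for td in tabela.find_all("td")]


print(cell_texts(SAMPLE_HTML, "MRH-njmSb5ZTkfb1o4dqv"))
```

Returning an empty list when the table is missing avoids the `AttributeError` that calling `find_all` on `None` would raise.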
  • I wanted to avoid Selenium because it is rather heavy (it takes a long time to install on the server, for example). I believe the problem arises precisely because the page is generated by JavaScript or something similar, which is why it worked with Selenium. I guess there's no other way with plain BeautifulSoup, right? Anyway, I found a workaround (a bit of a hack): I got the table through Reddit's RSS feed. It worked great. Thanks for the solution anyway!
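The comment does not show how the feed was read, so the following is only a guess at what such a workaround might look like, using nothing but the standard library. It assumes the feed is Atom XML whose `<content>` elements carry the post body as escaped HTML. `SAMPLE_FEED` is made up for illustration, and `CellCollector` and `cells_from_feed` are hypothetical names; a real feed would have to be downloaded first.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed for illustration only; the structure of the real feed is assumed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>What is the best film you watched last week?</title>
    <content type="html">&lt;table&gt;&lt;tr&gt;&lt;td&gt;&lt;a href="#"&gt;"Miss Juneteenth"&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</content>
  </entry>
</feed>
"""


class CellCollector(HTMLParser):
    """Collect the text found inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def cells_from_feed(feed_xml: str) -> list:
    """Return the <td> texts found in the HTML content of every entry."""
    root = ET.fromstring(feed_xml)
    collector = CellCollector()
    for content in root.iter(f"{ATOM}content"):
        # ElementTree has already unescaped the HTML held in the element text.
        collector.feed(content.text or "")
    collector.close()
    return [cell.strip() for cell in collector.cells]


print(cells_from_feed(SAMPLE_FEED))
```

Because the feed is served as static XML, no browser is needed, which is presumably why this route avoided the weight of Selenium.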
