Picking up texts within a column within a Python table

Question


Asked 5 years ago


I usually use variations of this code on other tables and it works, but on this one it doesn't, no matter what I try. What am I doing wrong?

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.reddit.com/r/movies/comments/hhfalb/what_is_the_best_film_you_watched_last_week/"
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')
    tabela = soup.findAll('table', class_='MRH-njmSb5ZTkfb1o4dqv')
    colunas = tabela.findAll('td')
    print(tabela)



The table in the page source looks like this (abridged):

    <table class="MRH-njmSb5ZTkfb1o4dqv"><thead>  {....}
    {....}  <td class="_1LHijgw3WoeCUe8AUewfUB"><a href="https://www.reddit.com/r/movies/comments/hd7rzi/what_is_the_best_film_you_watched_last_week/fvluiyn/" class="_3t5uN8xUmg0TOwRCOGQEcU" rel="noopener nofollow ugc" target="_blank">&quot;Miss Juneteenth&quot;</a></td>   {.....}

In my first attempts I was able to print the table's HTML, but now not even that works. The result is usually [] (empty) or None.

I have tried soup.find and soup.findAll with no luck. How would you do it?
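For reference, here is a small offline sketch of how `find()` and `find_all()` differ in BeautifulSoup. It parses made-up HTML modeled on the snippet above (the class name comes from the question; the second title, "Another title", is a placeholder). It shows that `find_all()` returns a list-like `ResultSet`, so calling `find_all()` on that result raises an `AttributeError`, and that `find()` returns `None` when the table is missing.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page; "Another title" is a placeholder value.
html = """
<table class="MRH-njmSb5ZTkfb1o4dqv">
  <tr><td>Miss Juneteenth</td><td>Another title</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if nothing matches.
tabela = soup.find("table", class_="MRH-njmSb5ZTkfb1o4dqv")
# find_all() returns a ResultSet (a list of Tags), possibly empty.
tabelas = soup.find_all("table", class_="MRH-njmSb5ZTkfb1o4dqv")

# Works: tabela is a single Tag, so it can be searched further.
colunas = tabela.find_all("td")
print([td.get_text() for td in colunas])  # ['Miss Juneteenth', 'Another title']

# Fails: a ResultSet is a list of Tags, not a Tag itself.
try:
    tabelas.find_all("td")
except AttributeError as err:
    print("AttributeError:", err)

# If the page does not contain the table (e.g. an error page), find() gives None.
missing = soup.find("table", class_="no-such-class")
print(missing)  # None
```

To search inside the result of `find_all()`, you would loop over it (or index it, e.g. `tabelas[0]`) instead of calling methods on the whole list.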


by Juan Caio • **155** points · Answer 1 · 2020-07-06T20:12:35+00:00

I tested your code, and when I checked the status code of the response it was 502 (Bad Gateway), which means the request failed:

    print(req.status_code)
    # 502
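A minimal sketch of guarding against this, assuming the `requests` library from the question: check the status before parsing, so an error page fails loudly instead of making `find()` return `None` later. The helper name `soup_from_response` is hypothetical.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(resp: requests.Response) -> BeautifulSoup:
    """Parse the response body, but only if the server returned a success code."""
    # raise_for_status() raises requests.HTTPError for any 4xx or 5xx status,
    # including a 502, so an error page is never parsed silently.
    resp.raise_for_status()
    return BeautifulSoup(resp.content, "html.parser")
```

In the question's code it would replace the `BeautifulSoup(...)` line, e.g. `soup = soup_from_response(requests.get(url, timeout=10))`, where the 10-second timeout is an arbitrary choice.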

Then I tried Selenium with a WebDriver instead, and it worked:

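    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Load the WebDriver (Selenium 4 takes the chromedriver path through Service)
    driver = webdriver.Chrome(service=Service("/usr/bin/chromedriver"))
    url = "https://www.reddit.com/r/movies/comments/hhfalb/what_is_the_best_film_you_watched_last_week/"
    driver.get(url)
    # page_source is the HTML of the page as currently loaded in the browser
    req = driver.page_source
    driver.quit()
    soup = BeautifulSoup(req, 'html.parser')
    # find() returns a single Tag (or None), so find_all() can be called on it
    tabela = soup.find('table', attrs={"class": "MRH-njmSb5ZTkfb1o4dqv"})
    colunas = tabela.find_all('td')
    print(tabela)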
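To get just the texts in the cells (the goal in the question's title), a small sketch like the following could be applied to `driver.page_source`. The function name `column_texts` is hypothetical; it returns an empty list instead of raising when the table is absent, which can happen if the page did not load correctly or if the auto-generated class name has changed.

```python
from bs4 import BeautifulSoup


def column_texts(html: str, table_class: str) -> list[str]:
    """Return the stripped text of every <td> in the first table with table_class."""
    soup = BeautifulSoup(html, "html.parser")
    tabela = soup.find("table", class_=table_class)
    if tabela is None:
        # Table not found: return an empty list instead of failing on None.
        return []
    # get_text(strip=True) also collects text from nested tags such as <a>.
    return [td.get_text(strip=True) for td in tabela.find_all("td")]
```

With the answer's code this would be called as `column_texts(driver.page_source, "MRH-njmSb5ZTkfb1o4dqv")`; for the cell shown in the question it yields the string `"Miss Juneteenth"` (with the quotation marks, since they are part of the cell text).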
