Switching pages in an HTML table with BeautifulSoup

I'm collecting data from a website using requests and BeautifulSoup. I can collect all the data from page 1 of the results table, but I can't get to the other pages.


Python code

   from io import StringIO

   import pandas as pd
   import requests
   from bs4 import BeautifulSoup

   df_list = []
   for i in range(1, 3):
       print('rodada', i)  # "rodada" = round
       url = 'https://www.cartolafcbrasil.com.br/scouts/cartola-fc-2018/rodada-' + str(i)
       page = requests.get(url)
       soup = BeautifulSoup(page.text, 'html.parser')
       table = soup.find_all('table')[0]
       # read_html returns a list of DataFrames; keep the first one
       df = pd.read_html(StringIO(str(table)), encoding="UTF-8")[0]
       df_list.append(df)
   print(df_list)

HTML

<tr class="tbpaging">
   <td colspan="25">
      <table border="0">
<tr>
   <td>
      <span>
      1
      </span>
   </td>
   <td>
      <a href="javascript:__doPostBack('ctl00$cphMainContent$gvList','Page$2')">
      2
      </a>
   </td>
   <td>
      <a href="javascript:__doPostBack('ctl00$cphMainContent$gvList','Page$3')">
      3
      </a>
   </td>

Answer

Problem

The pagination of the results list depends on running a JavaScript function (the pager links call __doPostBack instead of pointing to a URL). BeautifulSoup only parses the HTML that requests downloaded; neither of them executes JavaScript, so you cannot reach the data that the JavaScript loads.
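To see why there is nothing for requests to follow, the sketch below parses the pager snippet from the question (trimmed, with its tags closed so it parses cleanly) and extracts what each link actually contains: a call to __doPostBack with a target and an argument, not a URL. The names PAGER_HTML, POSTBACK_RE and pager_links are illustrative, not part of the site. The __doPostBack and ctl00$... naming suggests an ASP.NET WebForms page, but that is an inference from the markup.

```python
import re

from bs4 import BeautifulSoup

# Pager markup from the question, trimmed and closed so it parses cleanly.
PAGER_HTML = """
<table border="0"><tr>
  <td><span>1</span></td>
  <td><a href="javascript:__doPostBack('ctl00$cphMainContent$gvList','Page$2')">2</a></td>
  <td><a href="javascript:__doPostBack('ctl00$cphMainContent$gvList','Page$3')">3</a></td>
</tr></table>
"""

# Captures the two string arguments of __doPostBack(target, argument).
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def pager_links(html: str) -> list[tuple[str, str, str]]:
    """Return (page label, postback target, postback argument) for each pager link."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        match = POSTBACK_RE.search(a["href"])
        if match:
            links.append((a.get_text(strip=True), match.group(1), match.group(2)))
    return links


for label, target, argument in pager_links(PAGER_HTML):
    # The current page (1) is a <span>, not a link, so it is not listed.
    print(label, target, argument)
```

Each "link" only tells the browser which JavaScript call to make; the next page exists only after that call runs.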

Solution

Use a tool that drives a real browser. In Python you can use Selenium: it lets you simulate the click on the pager link, the browser executes the JavaScript, and you can then read the data you need from the rendered page.
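A minimal Selenium 4 sketch of that idea, assuming Chrome is installed and that the first table element on the page is replaced by each postback. postback_xpath and collect_result_pages are hypothetical helper names. The XPath matches the link by its 'Page$N' argument rather than by its text, since a bare '2' can appear elsewhere on the page. Selenium is imported inside the function so the XPath helper can be used on its own.

```python
def postback_xpath(page_number: int) -> str:
    """XPath for the pager link whose href calls __doPostBack(..., 'Page$<n>')."""
    # The surrounding quotes keep 'Page$2' from also matching 'Page$20'.
    return f"//a[contains(@href, \"'Page${page_number}'\")]"


def collect_result_pages(url: str, max_pages: int = 50) -> list[str]:
    """Open url in Chrome and return the rendered HTML of each results page."""
    # Imported here so postback_xpath() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes Chrome is available; recent Selenium 4 releases can fetch the driver.
    driver = webdriver.Chrome()
    pages_html = []
    try:
        driver.get(url)
        pages_html.append(driver.page_source)  # page 1 is already rendered
        for page in range(2, max_pages + 1):
            links = driver.find_elements(By.XPATH, postback_xpath(page))
            if not links:
                break  # no link for this page number: assume it was the last page
            old_table = driver.find_element(By.TAG_NAME, "table")
            links[0].click()  # the browser runs __doPostBack for us
            # Wait until the postback replaces the old table before reading the HTML.
            WebDriverWait(driver, 15).until(EC.staleness_of(old_table))
            pages_html.append(driver.page_source)
    finally:
        driver.quit()
    return pages_html
```

Each returned HTML string can go through the same parsing as in the question, for example pd.read_html(StringIO(html)) and then picking the right table. If the pager only shows a window of page numbers (some pagers add a "..." link), this loop stops at the last visible number and would need an extra step for that link.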

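  • soup.find_element_by_xpath("(//a[contains(.,'2')])[2]").click() gives 'NoneType' object is not callable

  • @Pedro With BeautifulSoup you won't be able to run JavaScript. find_element_by_xpath is a Selenium WebDriver method, not a BeautifulSoup one, so it has to be called on the Selenium driver.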
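The error in the first comment comes from how BeautifulSoup handles unknown attributes: soup.something is shorthand for soup.find("something"). So soup.find_element_by_xpath searches for a tag with that name, finds none, returns None, and calling None raises the TypeError. A small reproduction with a throwaway document:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>1</td></tr></table>", "html.parser")

# Unknown attribute -> tag search: same as soup.find("find_element_by_xpath").
print(soup.find_element_by_xpath)  # None

try:
    soup.find_element_by_xpath("(//a[contains(.,'2')])[2]")
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```

On a Selenium driver the equivalent call in current Selenium 4 releases is driver.find_element(By.XPATH, ...); the older find_element_by_* helpers were deprecated and later removed.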