I’m trying to get the deputies’ speeches, which can be found here. The site has many pages (roughly 300), and each page has a table with a "summary" of the information, 50 rows per page. Each row has a link that opens the full text of that deputy’s speech. What I’m trying to do: save the "summary" table -> click the full text of speech X -> save the full text of speech X -> go back to the previous page with the "summaries" -> click the full text of speech Y -> save -> go back ... -> go to the next page and repeat the whole process until the last page.
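The workflow described above can be sketched as a plain loop, assuming hypothetical callables for the browser steps (`get_summary_html`, `get_row_count`, `open_speech`, `go_next`); none of these names come from Selenium or from the site. The sketch only shows the control flow: each page is processed once, every row on it is visited by index, and the loop stops when there is no next page.

```python
from typing import Callable, List, Tuple


def scrape_all_pages(
    get_summary_html: Callable[[], str],
    get_row_count: Callable[[], int],
    open_speech: Callable[[int], str],
    go_next: Callable[[], bool],
) -> Tuple[List[str], List[str]]:
    """Collect every summary table and every speech, one page at a time.

    The four callables are hypothetical stand-ins for browser actions:
      get_summary_html() -> HTML of the summary table on the current page
      get_row_count()    -> number of speech links on the current page
      open_speech(i)     -> open row i, return the speech HTML, and leave the
                            browser back on the same summary page
      go_next()          -> move to the next page; False when there is none
    """
    tables: List[str] = []
    speeches: List[str] = []
    while True:
        tables.append(get_summary_html())
        # Use the real row count: the last page may have fewer than 50 rows
        for i in range(get_row_count()):
            speeches.append(open_speech(i))
        if not go_next():
            break
    return tables, speeches
```

With Selenium, `open_speech` would wrap the click on the i-th `glyphicon-file` icon, the wait, the `outerHTML` read and the return to the summary page. Making that contract explicit (every call ends on the page where it started) shows where a loop can drift: if the pages are switched by JavaScript without changing the URL, `window.history.go(-1)` may not land back on the same results page.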
To do this, I tried the following loop:
    from selenium.common.exceptions import TimeoutException, WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `driver` is an already-open WebDriver showing the first page of results
    TABLE = "//*[@id='content']/div/table"
    TBODY = TABLE + "/tbody"
    NEXT_PAGE = "//*[@title='Próxima Página']"  # the "Next Page" button

    tabela = []        # outerHTML of each page's summary table
    html_element = []  # outerHTML of each full speech

    while True:
        try:
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.XPATH, TBODY)))
            # Count the speech links instead of assuming 50: the last page may have fewer
            row_count = len(driver.find_elements(By.CLASS_NAME, "glyphicon.glyphicon-file"))
            for i in range(row_count):
                # Open the i-th speech and save its full text
                driver.find_elements(By.CLASS_NAME, "glyphicon.glyphicon-file")[i].click()
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located((By.ID, "content")))
                html_element.append(
                    driver.find_element(By.XPATH, "//*[@id='content']").get_attribute("outerHTML"))
                # Go back to the summary page and wait for its table
                driver.execute_script("window.history.go(-1)")
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located((By.XPATH, TBODY)))

            # Save the table as HTML (a stored WebElement goes stale once the page changes)
            tabela.append(driver.find_element(By.XPATH, TABLE).get_attribute("outerHTML"))
            # Scroll to the "Next Page" button and click it
            next_button = WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.XPATH, NEXT_PAGE)))
            driver.execute_script("arguments[0].scrollIntoView(true);", next_button)
            next_button.click()
            print("Próxima página")  # "Next page"
        except (TimeoutException, WebDriverException):
            # Any failed wait, including a missing "Next Page" button, ends the loop
            print("Última página")  # "Last page"
            break
It partially works: I can get the full text of each deputy’s speech. However, it doesn’t move to the next page reliably: sometimes it doesn’t advance at all, and sometimes it skips two pages at once, which ends up producing wrong data.
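One possible cause of the skipped or repeated pages (an assumption, not a confirmed diagnosis): right after the "Próxima Página" click, the next wait for `//*[@id='content']/div/table/tbody` can succeed at once, because the old page's table is still in the DOM while the new one loads. The rows are then counted or clicked on a page that is still changing. A common remedy is to record something that identifies the current page before clicking and wait until it changes. Selenium's `expected_conditions.staleness_of(element)` covers the case where the old table element is removed; the sketch below shows the more general "wait until a marker changes" idea in plain Python, where `read_marker` is a hypothetical function that could return, for example, the text of the first table row or the highlighted page number.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_change(
    read_marker: Callable[[], T],
    old_value: T,
    timeout: float = 15.0,
    interval: float = 0.25,
) -> T:
    """Poll read_marker() until it returns a value different from old_value.

    read_marker is a hypothetical callable, e.g. one returning the text of the
    first row of the summary table or the current page number.
    Returns the new value; raises TimeoutError if it does not change in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_marker()
        if value != old_value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page marker did not change within {timeout} s")
        time.sleep(interval)
```

In the loop from the question, this would sit right after the next-page click: read the marker, click, then call `wait_for_change(read_marker, old_marker)` before processing the new rows. If the site replaces the table element on each page, `WebDriverWait(driver, 15).until(EC.staleness_of(old_tbody))` does the same job with Selenium alone; if it only rewrites the rows in place, the old `tbody` may never go stale, and a content-based marker like this one is the safer check.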