Web Scraping with WebDriver and Python Selenium

This program downloads Excel files to save time. Before each download, however, some filters must be filled in, including a start date and an end date. Both must be the same date, because the period cannot exceed 24 hours. As a result, I have to change the two dates in the code by hand before downloading each file, and this is becoming repetitive.

Here’s the part where I change the dates:

# select the date
start_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataIni")
element_start_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(start_date))
element_start_date.clear()
element_start_date.send_keys('31/01/2021')

end_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataFim")
element_end_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(end_date))
element_end_date.clear()
element_end_date.send_keys('31/01/2021')

Here’s the code in full:

import time
import requests
import pandas as pd
import json
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path=r'./chromedriver.exe')


## Address of the site to collect from
url = "https://gool.cittati.com.br/Login.aspx?ReturnUrl=%2f"

driver.get(url)
#time.sleep(5)

## Log in
login = driver.find_element_by_xpath("//div[@class='listaIcones']//ul//li//input[@id='ucTrocarModulo_btnIconeUrbano']")
login.click()

txt_username_locator = (By.ID, "ucLogarUsuario_txtLogin")       # enter the username
element_username = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(txt_username_locator)) 
element_username.send_keys("###")

txt_password_locator = (By.ID, "ucLogarUsuario_txtSenha")       # enter the password
element_password = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(txt_password_locator))
element_password.send_keys("###")
element_password.send_keys(Keys.ENTER)

## Open monitoring:

monitoring = (By.NAME, "item_menu_1")
element_monitoring =  WebDriverWait(driver, 20).until(EC.element_to_be_clickable(monitoring))
element_monitoring.click()

## Open the reports and the event history:

reports = (By.ID, "item_menu_2") 
element_reports = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(reports))
element_reports.click()

event_history = (By.ID, "50204")
element_event_history = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(event_history))
element_event_history.click()

## Search events

# select the date
start_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataIni")
element_start_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(start_date))
element_start_date.clear()
element_start_date.send_keys('31/01/2021')

end_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataFim")
element_end_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(end_date))
element_end_date.clear()
element_end_date.send_keys('31/01/2021')

# select the event type
select_event = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_ddlEvento")
element_select_event = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(select_event))
element_select_event.send_keys('Cumprimento de Viagem')

# select all lines
select_line = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_chkSelevionarTodosLinhas")
element_select_line = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(select_line))
element_select_line.click()

# download the Excel file containing the data
document_excel = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_btnExportarExcel")
element_document_excel = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(document_excel))
element_document_excel.click()

#driver.quit()

I tried to create a counter to change the dates, but I did not succeed. Could someone help me find a way to automate this process, so that I don't have to close the program, change the date manually, and run it again?

  • Samuel, I don’t understand your question. Please be more specific: what do you want to do, and what is not working?

  • I want to download the file for each day, but I don’t want to change the date and rerun the project every time. I would like to download all the files for a month, or even a year, by running the project only once.

1 answer

EDIT

Did you manage to put the download part inside the loop?

Hello, I don’t have enough reputation to comment, so I have to post this as an answer.

If I understand correctly, you want to develop a way to assemble the dates.

I thought of a for loop:

import calendar

dias = list(range(1, 32))
meses = list(range(1, 13))
anos = list(range(2020, 2022))

for ano in anos:
    for mes in meses:
        for dia in dias:
            # stop at the last day of the month (monthrange accounts for leap years)
            if dia > calendar.monthrange(ano, mes)[1]:
                break
            # zero-padded dd/mm/yyyy, the same format as '31/01/2021' in the question
            data = f'{dia:02d}/{mes:02d}/{ano}'

            # fill in the start and end dates with the same day
            start_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataIni")
            element_start_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(start_date))
            element_start_date.clear()
            element_start_date.send_keys(data)

            end_date = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_txtDataFim")
            element_end_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(end_date))
            element_end_date.clear()
            element_end_date.send_keys(data)

            # select the event type
            select_event = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_ddlEvento")
            element_select_event = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(select_event))
            element_select_event.send_keys('Cumprimento de Viagem')

            # select all lines
            select_line = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_chkSelevionarTodosLinhas")
            element_select_line = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(select_line))
            element_select_line.click()

            # download the Excel file containing the data
            document_excel = (By.ID, "ContentPlaceHolder1_contentFiltroPesquisa_btnExportarExcel")
            element_document_excel = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(document_excel))
            element_document_excel.click()

In the loop ranges you can enter the dates you need, add a step, or build a list of the days you want.
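
As one possible way to build such a list of days, here is a minimal sketch using only the standard library. The function name daily_date_strings and the dd/mm/yyyy format are assumptions taken from the date strings in the question, not something the site is confirmed to require.

```python
from datetime import date, timedelta


def daily_date_strings(first_day, last_day, fmt="%d/%m/%Y"):
    """Return one formatted string per calendar day, both ends included."""
    total_days = (last_day - first_day).days
    return [
        (first_day + timedelta(days=offset)).strftime(fmt)
        for offset in range(total_days + 1)
    ]


# Crossing a month boundary needs no special cases.
dates = daily_date_strings(date(2021, 1, 30), date(2021, 2, 2))
print(dates)  # ['30/01/2021', '31/01/2021', '01/02/2021', '02/02/2021']
```

A single loop over the returned list could then replace the nested year, month, and day loops, since month lengths and leap years are handled by the date arithmetic.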

Finally, if you can’t perform the search again, you can call driver.refresh() to reload the page and redo the operation.
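
The refresh-and-retry idea could be sketched roughly as follows. The helper run_with_refresh and its parameters are hypothetical; the only Selenium call it relies on is the driver's refresh() method, and the broad except clause is a placeholder that should be narrowed to the exceptions you actually see.

```python
def run_with_refresh(driver, action, attempts=3):
    """Call action(); if it raises, refresh the page and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:  # assumption: narrow this to the expected Selenium errors
            last_error = error
            driver.refresh()  # reload the page before the next attempt
    raise last_error


# Hypothetical usage inside the date loop, where download_day is a function
# of your own that fills in the filters and clicks the export button:
# run_with_refresh(driver, lambda: download_day(driver, data))
```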

  • Thank you very much for your reply. I added the loop, but it does not stop to download each file: it runs through to the last date and downloads only the file for that date.

  • I’d like it to first download the file for day 01 and only then go back into the loop.

  • Yes, I put the download part inside the loop, but it only downloaded when the loop reached 31, the last date. Then I made a list as you suggested and put a sleep in the loop, and it worked. Thank you so much for your help!
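
A fixed sleep can be too short for large files and too long for small ones. As an alternative, here is a minimal sketch that polls the download folder until a new, finished file appears. The helper names and the folder argument are hypothetical; it assumes Chrome, which gives unfinished downloads the .crdownload suffix.

```python
import time
from pathlib import Path

PARTIAL_SUFFIXES = (".crdownload",)  # Chrome's suffix for unfinished downloads


def finished_files(folder):
    """Return the names of files in folder that are not partial downloads."""
    return {
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix not in PARTIAL_SUFFIXES
    }


def wait_for_new_file(folder, known_files, timeout=60.0, poll=0.5):
    """Wait until a finished file that is not in known_files appears."""
    deadline = time.monotonic() + timeout
    while True:
        new_files = finished_files(folder) - set(known_files)
        if new_files:
            return sorted(new_files)[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no new file in {folder} after {timeout} s")
        time.sleep(poll)
```

Inside the loop, one would take a snapshot with finished_files(download_dir) before clicking the export button and call wait_for_new_file(download_dir, snapshot) right after it, where download_dir stands for whatever folder your browser saves to.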
