I have a small bot written in Python that scrapes data from Tesouro Direto: it logs into my account, opens the statement (extrato), collects the data and returns it as JSON, which my Laravel project then processes. However, errors sometimes occur, because of my internet connection or some other problem.
This is where the screenshot comes in: whenever an error occurs, a screenshot would be taken, making it easier to tell whether the error comes from a change on the site or from some other internal or external problem.
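A minimal sketch of that idea, assuming `driver` is a Selenium WebDriver. Only `save_screenshot()` and the `page_source` property come from Selenium; the helper name `save_error_artifacts`, the `screenshots` directory and the file naming are hypothetical choices. Saving the HTML next to the image is an extra that often helps when the layout changed but the page looks the same.

```python
import io
import os
from datetime import datetime


def save_error_artifacts(driver, directory="screenshots", prefix="tesouro_direto_extrato"):
    """Save a screenshot and the page HTML of `driver` for later inspection.

    `driver` is assumed to be a Selenium WebDriver; only save_screenshot()
    and the page_source property are used. Returns (png_path, html_path).
    """
    if not os.path.isdir(directory):
        os.makedirs(directory)
    # Timestamped names so a new failure does not overwrite an earlier one
    # (two failures within the same second would still collide).
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    png_path = os.path.join(directory, "%s_%s.png" % (prefix, stamp))
    html_path = os.path.join(directory, "%s_%s.html" % (prefix, stamp))
    # Writes a PNG screenshot; Selenium returns False if the file can't be written
    driver.save_screenshot(png_path)
    # The HTML can reveal what the image hides, e.g. a renamed element ID
    with io.open(html_path, "w", encoding="utf-8") as f:
        f.write(driver.page_source)
    return png_path, html_path
```

In `tesouro_direto_extrato.py`, everything after `firefox = webdriver.Firefox(...)` could go inside `try:`, with `except Exception: save_error_artifacts(firefox); raise` and `firefox.quit()` moved to `finally:`. That way the browser is always closed and the original traceback still reaches the log. For a network failure the screenshot shows Firefox's own error page, which is itself a hint that the problem was the connection rather than the site.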
I'll use one of the scripts as an example, in this case tesouro_direto_extrato.py. The script is below:
# -*- coding: utf-8 -*-
# =========== IMPORTS ===========
from datetime import datetime
import dateutil.relativedelta
from time import sleep
import sys
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
# ===============================
print('[ {"inicio": "%s"},' % str(datetime.now()))
# needed to run remotely (headless, no display on the server)
opts = FirefoxOptions()
opts.add_argument("--headless")
# options= replaces the deprecated firefox_options= keyword
firefox = webdriver.Firefox(options=opts)
# ============================================
# parameters (login and password are passed on the command line)
user_login = sys.argv[1]
user_pass = sys.argv[2]
wait_time = 10
# =====================================
# LOGIN PAGE
firefox.get('https://tesourodireto.bmfbovespa.com.br/portalinvestidor/')
# fill in the login form
login = WebDriverWait(firefox, wait_time).until(EC.presence_of_element_located((By.ID, 'BodyContent_txtLogin')))
password = WebDriverWait(firefox, wait_time).until(EC.presence_of_element_located((By.ID, 'BodyContent_txtSenha')))
login.send_keys("", user_login)
password.send_keys("", user_pass)
login_attempt = WebDriverWait(firefox, wait_time).until(EC.presence_of_element_located((By.ID, 'BodyContent_btnLogar')))
login_attempt.click()
# ====================================
# statement query page
firefox.get('https://tesourodireto.bmfbovespa.com.br/portalinvestidor/extrato.aspx')
btn_consultar = WebDriverWait(firefox, wait_time).until(EC.presence_of_element_located((By.ID, 'BodyContent_btnConsultar')))
btn_consultar.click()
# =====================================
# find_element(By.XPATH, ...) works in Selenium 3 and 4 (find_element_by_xpath was removed in 4.x)
representantes = firefox.find_elements(By.XPATH, "//div[contains(@class, 'section-container')]")
# print(vars(representantes))
for representante in representantes:
    # the link text looks like "<code> - <broker name>"; keep the name part
    nome_representante = representante.find_element(By.XPATH, './section/p/a').text.split(' - ')
    table_rows = representante.find_elements(By.XPATH, './section/div/table/tbody/tr')
    nome_representante = nome_representante[1]
    for table_row in table_rows:
        titulo = table_row.find_element(By.XPATH, './td[1]').text
        vencimento = datetime.strptime(table_row.find_element(By.XPATH, './td[2]').text, '%d/%m/%Y')
        # Brazilian number format: drop thousands dots, then decimal comma -> dot
        valor_investido = (table_row.find_element(By.XPATH, './td[3]').text).replace('.', '').replace(',', '.')
        valor_bruto_atual = (table_row.find_element(By.XPATH, './td[4]').text).replace('.', '').replace(',', '.')
        valor_liquido_atual = (table_row.find_element(By.XPATH, './td[5]').text).replace('.', '').replace(',', '.')
        quant_total = (table_row.find_element(By.XPATH, './td[6]').text).replace(',', '.')
        quant_bloqueado = (table_row.find_element(By.XPATH, './td[7]').text).replace(',', '.')
        print('{ "nome_representante": "%s", "titulo": "%s", "vencimento": "%s", "valor_investido": "%s", "valor_bruto_atual": "%s", "valor_liquido_atual": "%s", "quant_total": "%s", "quant_bloqueado": "%s" },' % (nome_representante, titulo, vencimento, valor_investido, valor_bruto_atual, valor_liquido_atual, quant_total, quant_bloqueado))
# Close the browser
firefox.quit()
print('{"fim": "%s"} ]' % str(datetime.now()))
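The per-row conversions in the loop could also be pulled into plain functions, which makes them testable without a browser, and the output could be built with `json.dumps` instead of `%` formatting. With `%` formatting, a double quote inside a title or broker name produces invalid JSON on the Laravel side; `json.dumps` escapes it. This is a sketch: the function names `br_number`, `parse_row` and `dump_statement` are hypothetical, while the field names and conversions are the ones the script already uses.

```python
import json
from datetime import datetime


def br_number(text):
    """'1.234,56' -> '1234.56': drop thousands dots, then decimal comma to dot."""
    return text.replace('.', '').replace(',', '.')


def parse_row(nome_representante, cells):
    """Build one statement entry from the 7 cell texts of a table row."""
    titulo, vencimento, investido, bruto, liquido, total, bloqueado = cells
    return {
        "nome_representante": nome_representante,
        "titulo": titulo,
        # str(datetime) gives 'YYYY-MM-DD HH:MM:SS', as the original print does
        "vencimento": str(datetime.strptime(vencimento, '%d/%m/%Y')),
        "valor_investido": br_number(investido),
        "valor_bruto_atual": br_number(bruto),
        "valor_liquido_atual": br_number(liquido),
        # Quantities only swap the decimal comma, as in the original script
        "quant_total": total.replace(',', '.'),
        "quant_bloqueado": bloqueado.replace(',', '.'),
    }


def dump_statement(inicio, rows, fim):
    """Serialize the same [inicio, rows..., fim] array the script prints."""
    return json.dumps([{"inicio": inicio}] + rows + [{"fim": fim}])
```

Inside the inner loop, `cells = [td.text for td in table_row.find_elements(By.XPATH, './td')][:7]` would feed `parse_row`, and the collected rows would be printed once, after both loops, with `print(dump_statement(...))`.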
Currently, when an error occurs, Selenium raises an exception and I save its traceback to a log for later inspection, for example:
[ {"inicio": "2019-01-18 10:00:02.026618"},
Traceback (most recent call last):
  File "/var/www/MoneyGuard/pythonGuard/tesouro_direto/tesouro_direto_extrato.py", line 32, in <module>
    firefox.get('https://tesourodireto.bmfbovespa.com.br/portalinvestidor/')
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=dnsNotFound&u=https%3A//tesourodireto.bmfbovespa.com.br/portalinvestidor/&c=UTF-8&f=regular&d=N%C3%A3o%20conseguimos%20conectar%20com%20o%20servidor%20em%20tesourodireto.bmfbovespa.com.br.
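The log above also shows the traceback following `[ {"inicio": ...},` directly, which suggests stdout and stderr are captured into the same output. In that case the JSON array is left unterminated and Laravel cannot parse it. One option, sketched here with the standard `logging` module, is to send errors to stderr (or a file via `logging.FileHandler`) so stdout carries only JSON. The names `build_logger` and `run_step` are hypothetical.

```python
import logging
import sys


def build_logger(stream=None, name="tesouro_direto"):
    """Return a logger that writes to `stream` (stderr by default), never stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate lines
    logger.propagate = False     # keep messages away from the root logger
    return logger


def run_step(logger, description, func, *args):
    """Call func(*args); on failure log the full traceback, then re-raise."""
    try:
        return func(*args)
    except Exception:
        # logger.exception() logs at ERROR level and appends the traceback
        logger.exception("failed while %s", description)
        raise
```

For example, `run_step(log, "opening the login page", firefox.get, 'https://tesourodireto.bmfbovespa.com.br/portalinvestidor/')`. The caller can then read stdout as the JSON and stderr as the log separately, and the description makes it obvious which step failed.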
With this error, for example, I can't tell whether the URL changed or whether my server simply couldn't resolve the address because of an internet problem.
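For this particular doubt, the exception message already carries a hint: Firefox reports network failures by loading an `about:neterror` page, and `e=dnsNotFound` means the host name could not be resolved, which points to connectivity or DNS rather than a changed URL. A changed page on the same host would more likely show up as a `TimeoutException` from `WebDriverWait` or a `NoSuchElementException`. A rough classifier, as a sketch: the function name and category strings are hypothetical, and it checks class names instead of importing Selenium so it stays self-contained (with Selenium available, `isinstance` checks would be more robust).

```python
def classify_error(exc):
    """Return a rough, hypothetical category for an exception raised while scraping."""
    name = type(exc).__name__
    message = str(exc)
    # Firefox signals network failures (DNS, refused connection, offline)
    # by navigating to an about:neterror page; Selenium puts that in the message.
    if "about:neterror" in message:
        if "e=dnsNotFound" in message:
            return "network: DNS lookup failed"
        return "network"
    # An element that never appears usually means the page changed
    # (or it loaded more slowly than wait_time allows).
    if name in ("TimeoutException", "NoSuchElementException"):
        return "page: element missing or too slow"
    return "unknown"
```

The category could be written to the log next to the traceback, so a failed run is triaged before opening the screenshot.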
Any help improving this script is appreciated. Thank you.
Link to all scripts: https://github.com/bulfaitelo/Tesouro-Direto-Scraper
The actual question is missing. You said what you want to do, but not what is stopping you from doing it... What is your difficulty?
– nosklo