Web scraping with BeautifulSoup + Python: export to txt and check with a shell script

Greetings, everyone. I have a Python script that fetches the ping times (in milliseconds) for sending electronic tax notes, taken from the NFC-e status portal, as shown below:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import requests
import lxml.html

# Download the NFC-e status page and parse the HTML
resp = requests.get('http://www.nfce.se.gov.br/portal/ConStatusAuto?Origem=1')
doc = lxml.html.fromstring(resp.text)

# Each table row holds an authorizer name followed by its ping values
for tr in doc.xpath('//tr'):
    nome = tr[0].text_content().strip()
    print(nome.ljust(25), '|'.join('{: >7}'.format(td.text_content().strip())
        for td in tr.xpath('.//td')[1:]))

So far so good. What I have not managed to do is export the script's output to a text file named query.txt, and write a shell script that checks the first ping: if it is above 1s it should return the integer 1, and if it is below 1s it should return the integer 0. Below is the script output that the shell script needs to read:

:/usr/src/zabbixbot# ./sefaz.py
('SEFAZ Amazonas           ', '       |  530ms|  900ms|  415ms|    0ms|  900ms|  661ms|    0ms|  806ms|   50ms|  530ms|  859ms|  676ms|    0ms|    0ms|    0ms')
(u'SEFAZ S\xe3o Paulo          ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')
(u'SEFAZ Paran\xe1             ', '       |  354ms|1s808ms|  551ms|  433ms|1s808ms|  620ms|  429ms|  799ms|  495ms|  354ms|  880ms|  507ms|    0ms|    0ms|    0ms')
('SEFAZ Goias              ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')
('SEFAZ Mato Grosso        ', '       |  503ms|  1s4ms|  317ms|  609ms|  861ms|  301ms|    0ms|  896ms|  325ms|  503ms|  1s4ms|  341ms|    0ms|    0ms|    0ms')
('SEFAZ Rio Grande do Sul  ', '       |  844ms|1s886ms|  974ms|  852ms|1s166ms|  964ms|  871ms|1s886ms|  986ms|  844ms| 1s79ms|  969ms|    0ms|    0ms|    0ms')
('SEFAZ Virtual RS         ', '       |  761ms|5s854ms|  1s2ms|  830ms|1s245ms|  964ms|  761ms|5s854ms| 1s65ms|  837ms|1s329ms|  942ms|    0ms|    0ms|    0ms')
('SEFAZ Mato Grosso do Sul ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')
(u'SEFAZ Cear\xe1              ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')
('SEFAZ Minas Gerais       ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')
('SEFAZ Pernambuco         ', '       |    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms|    0ms')

I only need to check SEFAZ Virtual RS, which is the authorizer for my state; the others can be disregarded. The first ping is the first value that appears after the name SEFAZ Virtual RS, in this case 761ms.

1 answer

First, to get only the first ping of SEFAZ Virtual RS, just add an if:

for tr in doc.xpath('//tr'):
    nome = tr[0].text_content().strip()
    if nome == 'SEFAZ Virtual RS':
        valor_ping = tr[2].text_content().strip()
        break  # found it, leave the loop

Then you just check:

valor_ping = valor_ping[:-2]  # strip the trailing `ms`
if 's' in valor_ping:  # an `s` remains only when the value is 1 second or more
    print(1)
else:
    print(0)
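The substring check works because only values of one second or more contain an `s` before the final `ms`. If a numeric threshold is preferred, the value can be converted to milliseconds first. The following is a minimal sketch, assuming every ping looks like the ones in the question's output (`761ms`, `1s4ms`, `5s854ms`); the names `ping_to_ms` and `ping_flag` are hypothetical, and treating exactly 1000 ms as a 1 is an assumption that matches the check above.

```python
import re

# Assumed format, taken from the sample output: optional "<n>s" then "<n>ms".
PING_RE = re.compile(r'^(?:(\d+)s)?(\d+)ms$')


def ping_to_ms(text):
    """Convert a ping string such as '1s808ms' to whole milliseconds."""
    match = PING_RE.match(text.strip())
    if match is None:
        raise ValueError('unrecognised ping value: %r' % text)
    seconds = int(match.group(1) or 0)  # missing seconds part counts as 0
    millis = int(match.group(2))
    return seconds * 1000 + millis


def ping_flag(text, limit_ms=1000):
    """Return 1 when the ping is at or above limit_ms, otherwise 0."""
    return 1 if ping_to_ms(text) >= limit_ms else 0


print(ping_flag('761ms'))
print(ping_flag('5s854ms'))
```

With a numeric value the limit can later be changed (for example to 2000 ms) without touching the parsing.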
  • You saved my life, friend, thank you very much. It returned 0 correctly, but it is also printing all the authorizers and pings. How do I make it return only the value 1 or 0?

  • Then remove the other prints, @Rafaelxaviersuarez. You are the programmer; try modifying the code to do what you want.
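For the export part of the question, one option is to write the rows to a file instead of printing them, so that only the 0 or 1 reaches standard output. This is a rough sketch, not the answer's method: `save_rows` is a hypothetical helper, the file name `query.txt` and the temporary directory are assumptions, and the rows are hard-coded from the question's output rather than scraped.

```python
import io
import os
import tempfile


def save_rows(rows, path):
    """Write one 'name|ping|ping|...' line per row to a UTF-8 text file."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        for name, pings in rows:
            handle.write(u'|'.join([name] + list(pings)) + u'\n')


# Sample rows copied from the output shown in the question.
rows = [
    (u'SEFAZ Virtual RS', [u'761ms', u'5s854ms', u'1s2ms']),
    (u'SEFAZ Paran\xe1', [u'354ms', u'1s808ms', u'551ms']),
]

# Assumed location and file name; adjust to wherever the file should live.
path = os.path.join(tempfile.gettempdir(), 'query.txt')
save_rows(rows, path)
```

In the real script the rows would come from the `for tr in doc.xpath('//tr')` loop, and the only remaining `print` would be the one that emits 0 or 1.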
