Good afternoon, everyone. I'm having a problem using requests to scrape data.
The main function is as follows:
import time
import requests
from bs4 import BeautifulSoup

# trata_html, tipos and primeira_maiuscula are defined elsewhere in my code
def raspa_dados(lista_de_links, ministerio):
    links = []
    autores = []
    chamadas = []
    datas = []
    ementas = []
    for i in lista_de_links:
        time.sleep(0.01)
        html = trata_html(requests.get(i).text)
        soup = BeautifulSoup(html, 'lxml')
        orgao = soup.findAll('span', class_="orgao-dou-data")[0].get_text()
        chamada = soup.findAll('p', class_="identifica")[0].get_text()
        data = soup.findAll('span', class_="publicado-dou-data")[0].get_text()
        ementa = soup.findAll('p', class_="ementa")
        if ministerio in orgao:
            if chamada.split()[0] in tipos or chamada.split()[0] in primeira_maiuscula(tipos):
                autores.append(orgao)
                chamadas.append(chamada)
                datas.append(data)
                links.append(i)
                # keep ementas aligned with the other lists
                if len(ementa) >= 1:
                    ementas.append(ementa[0].get_text())
                else:
                    ementas.append(None)
    return links, autores, chamadas, datas, ementas
I recently added the line time.sleep(0.01) in the hope of solving the problem,
but I keep getting the following error message:
ConnectionResetError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/response.py in _error_catcher(self)
361 try:
--> 362 yield
363
18 frames
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
ProtocolError Traceback (most recent call last)
ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
During handling of the above exception, another exception occurred:
ChunkedEncodingError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/requests/models.py in generate()
752 yield chunk
753 except ProtocolError as e:
--> 754 raise ChunkedEncodingError(e)
755 except DecodeError as e:
756 raise ContentDecodingError(e)
ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Can someone tell me what’s wrong?
I would test a few things to narrow down the problem. If you pass only one value in the link list, does your web scraper work? If it does, print each link from the list as it is processed and check whether it is always the same link that causes the error. If it is not always the same link, I would wrap the request in a try/except so that the connection to the site is attempted more than once; if it fails 2-3 times, simply continue with the rest and report at the end which links failed.
– renatomt
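The retry idea from the comment above could be sketched roughly like this. It is only an illustration, not tested against the real site: fetch_all, fetch, attempts and delay are hypothetical names, and fetch stands for whatever callable downloads a page. It catches OSError because requests' exceptions (ChunkedEncodingError included) derive from IOError, as does ConnectionResetError.

```python
import time


def fetch_all(links, fetch, attempts=3, delay=1.0):
    """Try each link up to `attempts` times; return (pages, failed)."""
    pages = {}   # link -> downloaded HTML
    failed = []  # links that never succeeded
    for link in links:
        for attempt in range(1, attempts + 1):
            try:
                pages[link] = fetch(link)
                break  # success: stop retrying this link
            except OSError as exc:
                # requests.exceptions.RequestException subclasses IOError,
                # so this also covers ChunkedEncodingError.
                print(f"attempt {attempt} failed for {link}: {exc}")
                if attempt < attempts:
                    time.sleep(delay * attempt)  # wait a bit longer each time
        else:
            # the inner loop ended without a break: every attempt failed
            failed.append(link)
    return pages, failed
```

With requests, fetch could be something like lambda url: requests.get(url, timeout=10).text (the timeout value is an assumption). The links in failed can then be printed after the loop, and the successfully downloaded pages passed on to trata_html and BeautifulSoup as before.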
With only one value it gives the same error.
– Marcelo
Then the error may be in the trata_html function or in the website you are trying to scrape. There is a question on Stack Overflow in English explaining what "Connection reset by peer" means.
– renatomt