ChunkedEncodingError when making requests in Python


Good afternoon, everyone. I'm having a problem using requests to scrape data.

This is the main function:

import time

import requests
from bs4 import BeautifulSoup

# trata_html, tipos and primeira_maiuscula are defined elsewhere (not shown).

def raspa_dados(lista_de_links, ministerio):
    # Parallel result lists: link, issuing body, headline, publication date, summary
    links = []
    autores = []
    chamadas = []
    datas = []
    ementas = []
    for i in lista_de_links:
        time.sleep(0.01)  # short pause between requests
        html = trata_html(requests.get(i).text)
        soup = BeautifulSoup(html, 'lxml')
        orgao = soup.find_all('span', class_="orgao-dou-data")[0].get_text()
        chamada = soup.find_all('p', class_="identifica")[0].get_text()
        data = soup.find_all('span', class_="publicado-dou-data")[0].get_text()
        ementa = soup.find_all('p', class_="ementa")
        # Keep only acts from the requested ministry whose first word is an accepted type
        if ministerio in orgao:
            if chamada.split()[0] in tipos or chamada.split()[0] in primeira_maiuscula(tipos):
                autores.append(orgao)
                chamadas.append(chamada)
                datas.append(data)
                links.append(i)
                # The summary paragraph is optional on some pages
                if len(ementa) >= 1:
                    ementas.append(ementa[0].get_text())
                else:
                    ementas.append(None)
    return links, autores, chamadas, datas, ementas

I recently added the time.sleep(0.01) line, hoping it would solve the problem.
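For scale, time.sleep(0.01) pauses for only 10 milliseconds, so the loop still sends requests almost back to back. If the server is dropping connections because of the request rate (an assumption; the question does not establish the cause), a longer, enforced gap between requests is a common mitigation. A minimal sketch follows; the Throttle class and the 1-second interval in the usage comment are hypothetical, not taken from the question:

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive wait() calls.

    The interval is chosen by the caller; a suitable value for a given site
    is an assumption, not something this class can determine.
    """

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # time.monotonic() of the previous wait(), if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # sleep only for the part of the gap not yet elapsed
        self._last = time.monotonic()


# Hypothetical use inside the scraping loop:
#   throttle = Throttle(min_interval=1.0)
#   for i in lista_de_links:
#       throttle.wait()
#       html = trata_html(requests.get(i, timeout=30).text)
```

Unlike a fixed sleep, the throttle subtracts the time already spent on the previous request, so slow responses do not add extra delay on top of the interval.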

I keep getting the following error message:

ConnectionResetError                      Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/urllib3/response.py in _error_catcher(self)
    361             try:
--> 362                 yield
    363 

18 frames

ConnectionResetError: [Errno 104] Connection reset by peer


During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)

ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))


During handling of the above exception, another exception occurred:

ChunkedEncodingError                      Traceback (most recent call last)

/usr/local/lib/python3.6/dist-packages/requests/models.py in generate()
    752                         yield chunk
    753                 except ProtocolError as e:
--> 754                     raise ChunkedEncodingError(e)
    755                 except DecodeError as e:
    756                     raise ContentDecodingError(e)

ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

Can someone tell me what’s wrong?
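The traceback is one error wrapped twice: the socket-level ConnectionResetError (errno 104, meaning the other end reset the TCP connection) was re-raised by urllib3 as ProtocolError and then by requests as ChunkedEncodingError, inside requests/models.py while the response body was being read. The sketch below walks such a chain back to its original cause. ProtocolError and ChunkedEncodingError here are local stand-in classes, not the real urllib3/requests ones, so the example runs without either library; exception_chain and simulate_traceback are hypothetical helper names.

```python
from typing import List, Optional


class ProtocolError(Exception):
    """Stand-in for urllib3's ProtocolError (illustration only)."""


class ChunkedEncodingError(Exception):
    """Stand-in for requests' ChunkedEncodingError (illustration only)."""


def exception_chain(exc: BaseException) -> List[BaseException]:
    """Return exc, then the exception it was raised from or while handling, down to the first one."""
    chain: List[BaseException] = []
    seen = set()  # guards against cycles in the chain
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        chain.append(current)
        seen.add(id(current))
        if current.__cause__ is not None:
            current = current.__cause__  # explicit "raise ... from ..."
        elif current.__suppress_context__:
            current = None  # "raise ... from None" hides the context
        else:
            current = current.__context__  # implicit "During handling of the above exception"
    return chain


def simulate_traceback() -> None:
    """Raise the same three-level chain shown in the question's traceback."""
    try:
        try:
            raise ConnectionResetError(104, "Connection reset by peer")
        except ConnectionResetError as err:
            raise ProtocolError("Connection broken", err)
    except ProtocolError as err:
        raise ChunkedEncodingError(err)


if __name__ == "__main__":
    try:
        simulate_traceback()
    except ChunkedEncodingError as exc:
        for e in exception_chain(exc):
            print(type(e).__name__, repr(e))
```

The last element of the chain is the original ConnectionResetError, which is the error worth investigating; the two outer exceptions only describe where in the library stack it surfaced.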

  • I would test a few things to narrow down the problem. Does your web scraper work if you pass only one link in the list? If it does, print each link as it is processed and check whether it is always the same link that causes the error. If it is not always the same link, I would add a try/except to your code so that it tries to connect to the site more than once; if it fails 2 or 3 times, it simply moves on to the remaining links and tells you at the end which sites failed.

  • With only one link, I get the same error.

  • Then the error may be in the trata_html function or in the website you are trying to scrape. There is a question on the English-language Stack Overflow that explains what "Connection reset by peer" means.
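A minimal sketch of the retry approach suggested in the first comment: try each link up to three times, wait a little longer after each failure, skip links that keep failing, and report them at the end. The names fetch_all, fetch, attempts, backoff and retry_on are hypothetical. The fetch function is passed in so the sketch runs without network access; the requests wiring in the trailing comment is an assumption about how it could be plugged into raspa_dados.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple, Type


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    attempts: int = 3,
    backoff: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch each URL with retries; return (pages by URL, URLs that never succeeded).

    retry_on lists the exception types treated as transient. The default,
    OSError, covers ConnectionResetError.
    """
    pages: Dict[str, str] = {}
    failed: List[str] = []
    for url in urls:
        for attempt in range(1, attempts + 1):
            try:
                pages[url] = fetch(url)
                break  # success: go on to the next URL
            except retry_on as exc:
                print(f"{url}: attempt {attempt}/{attempts} failed: {exc!r}")
                if attempt == attempts:
                    failed.append(url)  # give up on this URL but keep the loop going
                else:
                    time.sleep(backoff * attempt)  # wait longer after each failure
    return pages, failed


# Hypothetical wiring with requests:
#   pages, failed = fetch_all(
#       lista_de_links,
#       fetch=lambda url: requests.get(url, timeout=30).text,
#       retry_on=(requests.exceptions.ChunkedEncodingError,
#                 requests.exceptions.ConnectionError,
#                 requests.exceptions.Timeout),
#   )
```

Catching the error per link keeps one bad link from aborting the whole run, and the failed list is the "which site it failed on" report the comment asks for. The parsing in raspa_dados would then run over pages instead of calling requests.get directly.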

