Encoding problem while extracting zip file - edited

1

A webhook calls my API by sending a POST request. The request body contains the URL of a ZIP file.

Using the requests library, I perform a GET on the file URL. I need to extract a few files from this zip and run several processing steps on them. The problem is that extraction fails with the following error message:

'ascii' codec can't encode character '\xa2' in position 45: ordinal not in range(128)

Request code and attempt to extract the file:

import io
import requests
from zipfile import ZipFile

response = requests.get(url)

with ZipFile(io.BytesIO(response.content)) as thezip:  # response.content is the zip file as bytes, hence io.BytesIO()
    thezip.extractall()

When I print out the list of file names:

with ZipFile(io.BytesIO(response.content)) as thezip:
    print(thezip.namelist())
['Nao_Consistido/', 'Nao_Consistido/Relat\xc2\xa2rio de Previs\xc3\x86o de Vaz\xc3\x86o - Limite Inferior - LI.xls', 'Nao_Consistido/Relat\xc2\xa2rio de Previs\xc3\x86o de Vaz\xc3\x86o - Limite Superior - LS.xls', 'Nao_Consistido/Relat\xc2\xa2rio_de_Previs\xc3\x86o de Vaz\xc3\xa4es_PMO_de_DEZEMBRO_2019-preliminar.xls', 'Nao_Consistido/Todos_LI.prv', 'Nao_Consistido/Todos_LS.prv', 'Nao_Consistido/Todos_VE.prv']

I have already set the PYTHONIOENCODING environment variable to utf-8, and it didn't work. EDIT: After some tests I realized that the problem occurs only on the server (a Linux system); it does not occur locally on Windows 10.
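A likely reason PYTHONIOENCODING had no effect: that variable only sets the encoding of stdin, stdout and stderr, while extractall() has to encode file names with the filesystem encoding, which Python derives from the process locale. The sketch below is a small diagnostic to run on both machines and compare. The function name describe_encodings is hypothetical, chosen only for illustration.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the settings that influence how file names and output are encoded."""
    return {
        # Used when Python turns a str file name into bytes for the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Used by print(); this is what PYTHONIOENCODING overrides.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

If "filesystem" comes back as ascii on the server, the process is probably running under a C/POSIX locale. Starting it with LANG or LC_ALL set to a UTF-8 locale (for example C.UTF-8, assuming that locale is installed on the server) is the usual way to change that.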

  • And what would be the "list of file names"?

  • The .namelist() method lists the names of the files inside the zip.

  • What namelist method? There is nothing like that in the code in your question. Could you check whether you posted the full code?

  • I edited the question.

  • Gustavo, I still can't understand your problem. Could you explain why you are using io.BytesIO to pass a path to ZipFile? Why don't you pass a normal string? Or, what is the content of response.content? \xa2 is the character ¢; if your file name contains this character, it may be the source of your problem. In any case, these are suggestions to improve your question; as it stands, it is difficult to pin down your problem.

  • I use io.BytesIO because response.content returns the file to me as bytes.

  • But with BytesIO you will still have bytes... I advise you to convert your bytes into a string; you can use str(response.content, encoding='utf-8') (use the correct encoding for your request; see the docs).

  • When I try the conversion I get a similar error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)


2 answers

0

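One solution is to use encode and decode:

import io
from pathlib import Path

import requests
from zipfile import ZipFile

# Not defined in the original answer: set this to the code page the archive
# was created with (cp850 here is only an assumption).
encoding = 'cp850'

response = requests.get(url)  # url comes from the webhook request body

with ZipFile(io.BytesIO(response.content)) as thezip:
    for name in thezip.namelist():
        # zipfile decodes names stored without the UTF-8 flag as cp437, so
        # re-encode to recover the raw bytes, then decode them properly.
        n = Path(name.encode('cp437').decode(encoding))
        print(n)
        if name.endswith('/'):
            # Directory entry: create it if it does not exist yet.
            n.mkdir(parents=True, exist_ok=True)
        else:
            # File entry: make sure the parent exists, then write the bytes.
            n.parent.mkdir(parents=True, exist_ok=True)
            with n.open('wb') as w:
                w.write(thezip.read(name))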
  • I believe part of the code is missing, some variables used are not defined.

  • The error has changed to: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 20: invalid start byte
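The UnicodeDecodeError in the comment above is what you would expect if the target encoding was 'utf-8': 0xa2 is a continuation byte and cannot start a UTF-8 sequence. The names printed in the question show ¢, Æ and ä where ó, ã and õ would make sense in Portuguese, which is what cp850 bytes look like when read as Latin-1. Assuming that is what happened (a guess from the printed names, not something the asker confirmed), a sketch of the repair could look like this. The helper names repair_name and extract_repaired are hypothetical.

```python
import io
from pathlib import Path
from zipfile import ZipFile


def repair_name(name, wrong="latin-1", right="cp850"):
    """Undo a wrong decode; return the name unchanged if that is not possible."""
    try:
        return name.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def extract_repaired(zip_bytes, dest):
    """Extract an in-memory zip into dest, writing entries under repaired names."""
    dest = Path(dest)
    written = []
    with ZipFile(io.BytesIO(zip_bytes)) as thezip:
        for info in thezip.infolist():
            target = dest / repair_name(info.filename)
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(thezip.read(info))
                written.append(target)
    return written


if __name__ == "__main__":
    broken = "Nao_Consistido/Relat\u00a2rio de Previs\u00c6o de Vaz\u00c6o - Limite Inferior - LI.xls"
    print(repair_name(broken))
```

Two caveats: writing the repaired names still requires a filesystem encoding that can represent ó, ã and õ, so this does not help on a server whose locale is ASCII-only. Also, unlike extractall(), this sketch does not sanitise entry names, so it should only be used on archives from a trusted source.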

-1

Python, like other languages, interprets \ as a string escape character. Try using / as the separator in the path to the zip file.

  • Could you give an example? Should I split the path before extracting?

  • A direct path such as C:/Users/Name/Desktop/File.zip. The module takes the path it is given and splits it into directories on the / character until it reaches the desired file.
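To illustrate the point about backslashes in path literals, here is a minimal sketch. The path C:\new\table.zip is made up for the example, and this concerns how string literals are written in source code, which may well be unrelated to the encoding error in the question.

```python
from pathlib import PureWindowsPath

# In a normal literal, \n and \t are escape sequences (newline and tab).
escaped = "C:\new\table.zip"
# A raw string keeps every backslash as written.
raw = r"C:\new\table.zip"
# Forward slashes need no escaping and are accepted in Windows paths.
forward = "C:/new/table.zip"

print("\n" in escaped)
print(PureWindowsPath(raw) == PureWindowsPath(forward))
print(PureWindowsPath(forward).parts)
```

PureWindowsPath is used here so the comparison behaves the same on any operating system.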
