UnicodeDecodeError: 'utf-8'

Viewed 2,514 times

5

I’m getting a UnicodeDecodeError: 'utf-8' in a Python file and I can’t solve it. This is the error:

Traceback (most recent call last):
  File "file.py", line 448, in <module>
    fileOriginal.sliceFile(url) # Splits the files to avoid MemoryError
  File "file.py", line 188, in sliceFile
    line = fileOriginal.readline()

  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

It occurs while reading a txt file. The file is encoded as UTF-8 without BOM, so I don’t understand why I get this error. It is raised on the line "line = fileOriginal.readline()" in the following code:

Code:

for(path, dirs, files) in os.walk(url):
        contDec = 0 # Counts the declarations
        contTempFiles = 0 # Counts the temporary files

        for file in files:
            fileOriginal = open(os.path.join(url,file),encoding = "utf8")

            endFile   = False
            contLines = 0
            contDec = 0
            cont    = 0
            line = ''
            while not 'ZZZZZ|' in line:
                if cont == 0:
                    contTempFiles += 1

                    tempFile = open(os.path.join('separados',str(contTempFiles)+'_'+str(self.getFileName(file))+'.txt'),'w', encoding='utf-8')
                line = fileOriginal.readline() # Error on this line
                if line[0:5] == '99999':
                    tempFile.write(line)
                    contDec += 1
                if contDec <= 200000:
                    tempFile.write(line)
                    cont += 1
                else:
                    contDec = 0
                    cont = 0
                    tempFile.close()
            fileOriginal.close()

Python version: 3.4.0. Can someone help me with this? Thank you!

  • Put the error in the question; it will make it easier to help.

  • I added the error, Ricardo.

  • I ran into this error when generating logs on a system. In my case the Linux environment variables were not set, although that was when writing. One thing that may be happening is that the files you are reading contain accented characters. Try reading one file with only plain text and one with accents, then post the result.

2 answers

1


Instead of the code line:

open(os.path.join(url,file), encoding = "utf8")

Try putting the following:

path = os.path.join(url,file).decode("utf8")
open(path, encoding = "utf-8")

Do not forget to also put at the beginning of the code:

# -*- coding: utf-8 -*-
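
A side note, offered as a sketch rather than a diagnosis: when url and file are str, os.path.join already returns a str in Python 3, and str has no decode method. The message "can't decode byte 0xd0" also suggests that the bytes in the file itself are not valid UTF-8, whatever the editor reports. One way to check is to read the file in binary mode and try a few encodings. The helper names and the candidate list (cp1252 and latin-1, common for Portuguese text on Windows) are assumptions for illustration.

```python
def find_decodable_encodings(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the candidate encodings that decode `raw` without errors."""
    usable = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        usable.append(name)
    return usable


def first_bad_offset(raw, encoding="utf-8"):
    """Return the byte offset where decoding fails, or None if it succeeds."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None
```

To use it, read the bytes with open(path, "rb") and pass them to both helpers. Note that latin-1 accepts any byte sequence, so its presence in the result only shows that decoding did not fail, not that the text is correct.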
  • Hi Rui, what’s the difference?

  • 1

    @Anthonyaccioly the only difference is the spelling: utf8 to utf-8

  • I also wondered what the difference would be, because this code was working normally.

  • @Ruilima, I did what you suggested, but the error persists.

  • @fredsilva I edited my answer; it may be that the path to the file contains special characters. Try it now.

1

I managed to fix it, building on @Rui Lima’s answer. Replace the line:

fileOriginal = open(os.path.join(url,file), encoding = "utf8")

with:

fileOriginal = open(url+file,encoding = "utf-8")

I don’t know why os.path.join would cause the error. Thank you all for your help! Best regards!
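
One possible explanation, as a guess rather than a confirmed cause: os.path.join(url, file) and url + file only name the same file when url already ends with a path separator, so the two versions may have been opening different files. In addition, the file names yielded by os.walk are relative to the directory currently being visited (the first element of each tuple), not to the top-level url. A minimal sketch that builds the paths from that first element, using a hypothetical helper iter_file_paths:

```python
import os


def iter_file_paths(top):
    """Yield the full path of every file found below `top`."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # Join with dirpath, not top: files in subdirectories live there.
            yield os.path.join(dirpath, name)
```

For comparison, os.path.join("data", "a.txt") inserts the separator, while "data" + "a.txt" produces "dataa.txt".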
