UnicodeDecodeError: 'utf-8'

Question


Asked 9 years, 2 months ago

Viewed 2,514 times

5

I’m getting a UnicodeDecodeError: 'utf-8' in a Python file and I haven’t been able to solve it. This is the error:

Traceback (most recent call last):
  File "file.py", line 448, in <module>
    fileOriginal.sliceFile(url) #Separa os arquivos para evitar MemoryError
  File "file.py", line 188, in sliceFile
    line = fileOriginal.readline()
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

It happens while reading a txt file. The file is encoded as UTF-8 without BOM, so I don’t understand why this error is raised. The error occurs on the line "line = fileOriginal.readline()" in the following code:

Code:

for (path, dirs, files) in os.walk(url):
        contDec = 0  # Counts the declarations
        contTempFiles = 0  # Counts the temporary files

        for file in files:
            fileOriginal = open(os.path.join(url,file),encoding = "utf8")

            endFile   = False
            contLines = 0
            contDec = 0
            cont    = 0
            line = ''
            while not 'ZZZZZ|' in line:
                if cont == 0:
                    contTempFiles += 1

                    tempFile = open(os.path.join('separados',str(contTempFiles)+'_'+str(self.getFileName(file))+'.txt'),'w', encoding='utf-8')
                line = fileOriginal.readline()  # Error on this line
                if line[0:5] == '99999':
                    tempFile.write(line)
                    contDec += 1
                if contDec <= 200000:
                    tempFile.write(line)
                    cont += 1
                else:
                    contDec = 0
                    cont = 0
                    tempFile.close()
            fileOriginal.close()

Python version: 3.4.0. Can someone help me with this? Thank you!
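For context on the message in the traceback: in UTF-8, 0xD0 is the lead byte of a two-byte sequence, so the next byte must fall in the range 0x80 to 0xBF. "invalid continuation byte" means it did not, which usually indicates that at least part of the data is not valid UTF-8. The sketch below reproduces the same message with two made-up bytes. That the real file contains Latin-1 or cp1252 text is only an assumption here, not something the post confirms.

```python
# Hypothetical sample: byte 0xD0 followed by the ASCII digit "1".
# In Latin-1 these two bytes spell "Ð1"; in UTF-8 they are invalid.
raw = b"\xd0" + b"1"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference for inspection after the except block
    print(exc.start, exc.reason)

# The same bytes decode without error under a single-byte encoding.
print(raw.decode("latin-1"))
```

If the file does turn out to be in another encoding, passing that name to `open(..., encoding=...)` is the usual remedy.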

Put the error in the question; it will make it easier to help.

– Ricardo

2016/05/23 at 21:11
I added the error, Ricardo.

– fredsilva

2016/05/24 at 11:30
I ran into this error when generating logs on the system. In my case the Linux variables were not set, although that was when writing. One thing that may be happening is that the files you are reading contain accented characters. Try reading one file with only plain text and one with accents, then post the result.

– Ricardo

2016/05/24 at 11:32
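One way to run the check suggested above without guessing which file is at fault is to read the file in binary mode and try to decode each line separately. This is only a sketch: `find_undecodable_lines` and its parameters are hypothetical names, not part of the original code. Splitting on newlines in binary mode is safe for UTF-8 because the byte 0x0A never appears inside a multi-byte sequence.

```python
def find_undecodable_lines(path, encoding="utf-8", limit=5):
    """Return up to `limit` tuples (line number, offset in line, offending byte)."""
    problems = []
    with open(path, "rb") as fh:  # binary mode: no decoding happens on read
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first byte that failed to decode
                problems.append((lineno, exc.start, raw[exc.start:exc.start + 1]))
                if len(problems) >= limit:
                    break
    return problems
```

Calling it on each file found by `os.walk` and printing the result shows whether the input really is UTF-8 throughout, and which lines to inspect if it is not.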

2 answers

1

Instead of this line of code:

open(os.path.join(url,file), encoding = "utf8")

Try putting the following:

path = os.path.join(url,file).decode("utf8")
open(path, encoding = "utf-8")

Also, don’t forget to put this at the beginning of the file:

# -*- coding: utf-8 -*-
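Two points about this suggestion can be checked in a Python 3 interpreter. "utf8" and "utf-8" are aliases for the same codec, and `os.path.join` given `str` arguments returns a `str`, which has no `.decode` method in Python 3. The coding comment declares the encoding of the source file itself and does not affect files opened at run time. The directory and file names below are made up for illustration.

```python
import codecs
import os

# Both spellings resolve to the same codec.
print(codecs.lookup("utf8").name == codecs.lookup("utf-8").name)

# Hypothetical directory and file name, for illustration only.
path = os.path.join("dados", "arquivo.txt")

print(isinstance(path, str))      # the joined path is already text
print(hasattr(path, "decode"))    # str has no .decode() in Python 3
```

On Python 3 this prints True, True and False, so calling `.decode("utf8")` on the joined path would raise AttributeError rather than change how the file is read.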

Hi Rui, what’s the difference?

– Anthony Accioly

2016/05/24 at 10:27
1

@Anthonyaccioly the only difference is the spelling: utf8 to utf-8

– Rui Lima

2016/05/24 at 10:31
I also wondered what the difference would be, because this code was working normally.

– fredsilva

2016/05/24 at 11:22
@Ruilima, I did what you suggested, but the error persists.

– fredsilva

2016/05/24 at 11:31
@fredsilva I edited my answer; the path to the file may contain special characters. Try now.

– Rui Lima

2016/05/24 at 11:50


by fredsilva • **358** points · Answer 1 · 2016-05-24T13:13:19+00:00

I managed to fix it, taking a cue from @Rui Lima’s reply. Replace the line:

fileOriginal = open(os.path.join(url,file), encoding = "utf8")

with:

fileOriginal = open(url+file,encoding = "utf-8")

I don’t know why join would cause the error. Thank you all for your help! Best regards!
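On the open question of why join would matter: the path string does not influence how the bytes of a file are decoded, so the two lines can only behave differently if they end up opening different files. A small sketch with made-up names shows when `os.path.join(url, file)` and `url + file` produce the same path and when they do not. Which case applied here cannot be determined from the post.

```python
import os

file = "dados.txt"                  # hypothetical file name

with_sep = "arquivos" + os.sep      # hypothetical directory, trailing separator
without_sep = "arquivos"            # same directory, no trailing separator

# With a trailing separator, both forms name the same file.
print(os.path.join(with_sep, file) == with_sep + file)

# Without it, concatenation glues the names together ("arquivosdados.txt").
print(os.path.join(without_sep, file) == without_sep + file)
```

This prints True and then False. Note also that `os.walk` yields the directory currently being visited as the first element of each tuple (`path` in the question's loop), so joining with that value rather than `url` is the usual way to reach files in subdirectories.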