Problem Reading Unicode python file

Question

Guys, I have this script:

import sys
search = sys.argv[1]
ref_arquivo = open('C:/Zabbix/RelatorioErros.txt','r').readlines()[11:]
for line in ref_arquivo:
    if search in line:
        print(line[30:66],line[66:77],line[92:99],line[100:110])

It only works on UTF-8 files. When I run it on a Windows machine, it fails to read RelatorioErros.txt, because that text file is saved as Unicode. What should I do?

Are you sure this is the error? Could you post the error message here?

– G. Bittencourt

2020/01/22 at 03:16
The open function has an encoding parameter that lets you choose which encoding is used to read the file; by default it is UTF-8.

– Woss

2020/01/22 at 11:06
@Woss the default is not always UTF-8. From the docs: "In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding." That assumption that it's UTF-8 by default has bitten me too!

– Pedro von Hertwig Batista

2020/01/22 at 12:17
@Pedrovonhertwigbatista Good point, thanks for the reminder.

– Woss

2020/01/22 at 13:24
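A quick way to check what these comments describe: the snippet below prints the encoding that open() falls back to on the current machine when encoding= is omitted, then writes and reads a small file with encoding="utf-8" passed explicitly, so the round trip no longer depends on that default. The file name exemplo.txt and its contents are made up for illustration.

```python
import locale
import os
import tempfile

# Encoding that open() uses in text mode when encoding= is omitted.
# Platform dependent: commonly 'cp1252' on Western-European Windows, 'UTF-8' on most Linux setups.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding on this machine:", default_encoding)

# Passing encoding= explicitly gives the same result on every platform.
text = "Relatório de erros: conexão falhou\n"
path = os.path.join(tempfile.mkdtemp(), "exemplo.txt")  # hypothetical sample file
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()
print(read_back == text)  # True, whatever default_encoding is
```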

1 answer

by jsbueno • **30,668** points · Answer 1 · 2020-01-23T17:31:21+00:00

If the file really is UTF-8, just say so explicitly when opening it on Windows. Otherwise Python will use the system's default encoding, which in this case is latin-1 (on Windows, more precisely cp1252), and the file's contents will be corrupted in memory: every character outside the ASCII range, including all accented letters, turns into 2 or more unrelated characters.
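To make that corruption concrete, here is a minimal sketch (the word "Relatório" is only a sample) that decodes UTF-8 bytes with cp1252, roughly what open() without encoding= does on a Western-European Windows machine. The garbled string is also one character longer than the original, which would shift fixed-width slices such as line[30:66] in the question's script.

```python
# Simulate reading a UTF-8 file with a cp1252 (latin-1-like) default.
original = "Relatório"
utf8_bytes = original.encode("utf-8")   # what is stored on disk: 'ó' becomes 2 bytes
garbled = utf8_bytes.decode("cp1252")   # each of those bytes becomes its own character
print(garbled)                          # RelatÃ³rio
print(len(original), len(garbled))      # 9 10

# Decoding with the correct codec recovers the text.
print(utf8_bytes.decode("utf-8") == original)  # True
```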

So, for a UTF-8 file, just do:

ref_arquivo = open('C:/Zabbix/RelatorioErros.txt','r', encoding="utf-8").readlines()[11:]
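If the file is not UTF-8 but was saved as "Unicode" in the Windows sense (for example with Notepad's older "Unicode" option, which writes UTF-16 LE with a byte-order mark), encoding="utf-8" fails with a UnicodeDecodeError, and reading it with the cp1252 default typically yields a NUL character between every letter, so search in line never matches. In that case encoding="utf-16" is needed. The sketch below assumes the file starts with a BOM; guess_encoding_from_bom is a hypothetical helper, not a standard-library function, and it only recognizes UTF-8 and UTF-16 BOMs.

```python
import codecs
import os
import tempfile

def guess_encoding_from_bom(path, default="utf-8"):
    """Hypothetical helper: choose an encoding from the file's byte-order mark (BOM).

    Only UTF-8 and UTF-16 BOMs are recognized; files without one fall back to `default`.
    """
    with open(path, "rb") as f:  # binary mode: inspect the raw bytes
        head = f.read(3)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    return default

# Demo: a temporary file written as UTF-16 with a BOM, like Notepad's "Unicode" option.
sample = "linha 1\nRelatório de erros\n"
path = os.path.join(tempfile.mkdtemp(), "RelatorioErros.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write(sample)

enc = guess_encoding_from_bom(path)
with open(path, "r", encoding=enc) as f:
    lines = f.readlines()
print(enc, lines)  # utf-16 ['linha 1\n', 'Relatório de erros\n']
```

If you already know the file is UTF-16, passing encoding="utf-16" directly to open() is enough; the helper only helps when the same script has to read files saved either way.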

(The other part involving accents, reading sys.argv[1], should be handled automatically in Python 3: it converts from the encoding used by the terminal to text.)
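Putting the fix into the question's script, one possible version is sketched below. It keeps the question's path, the 11 skipped header lines and the column slices unchanged. The function name search_report and its parameters are illustrative, and the with block simply ensures the file gets closed. The main part only runs when a search term is passed on the command line.

```python
import sys
from itertools import islice

def search_report(path, search, encoding="utf-8", skip=11):
    """Yield the four fixed-width fields of every matching line after the header."""
    with open(path, "r", encoding=encoding) as f:  # the with block closes the file afterwards
        for line in islice(f, skip, None):  # same lines as readlines()[11:]
            if search in line:
                yield line[30:66], line[66:77], line[92:99], line[100:110]

if __name__ == "__main__" and len(sys.argv) > 1:
    # In Python 3, sys.argv[1] is already a decoded str.
    for fields in search_report("C:/Zabbix/RelatorioErros.txt", sys.argv[1]):
        print(*fields)
```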