Reading accented CSV file in Python?


I am learning Python 3 and got stuck reading a simple CSV file that contains the character 'à'. I've tried code I found on the internet, but nothing seems to work: it is always printed as '\xc3\xa0'. Note that I use Sublime Text to edit and run the code.

import csv

with open('teste.csv', 'r') as ficheiro:
    reader = csv.reader(ficheiro, delimiter=';')
    for row in reader:
        print(row)

The teste.csv file:

batata;14;True
pàtato;19;False
papa;10;False

The error:

Traceback (most recent call last):
  File "/Users/Mine/Desktop/testando csv.py", line 5, in <module>
    for row in reader:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
[Finished in 0.1s with exit code 1]

I’m waiting for help.


It depends on which encoding the .csv file was saved in.

Note: in Python 2, the csv module does not support Unicode input.
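
To see why the encoding matters, here is a small sketch (not part of the original answer) showing that the same character 'à' is stored as different bytes in UTF-8 and Latin-1, and which encoding open() falls back to when none is given. The '\xc3\xa0' from the question is exactly the UTF-8 form of 'à'.

```python
import locale

# The same character becomes different bytes depending on the encoding
# the .csv file was saved with.
utf8_bytes = 'à'.encode('utf-8')      # two bytes: 0xc3 0xa0
latin1_bytes = 'à'.encode('latin-1')  # one byte: 0xe0
print(utf8_bytes, latin1_bytes)

# When open() is called without encoding=..., Python 3 (outside UTF-8 mode)
# uses the locale's preferred encoding. If that is ASCII, the byte 0xc3
# cannot be decoded, which matches the traceback in the question.
print(locale.getpreferredencoding(False))
```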

UTF-8

If the .csv file is saved as UTF-8, you can do as the Python 3 documentation shows:

import csv

# newline='' is what the csv docs recommend for files passed to csv.reader
with open('teste.csv', encoding='utf-8', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
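
A self-contained version of the same idea, for checking it end to end: it first writes the question's sample data to a temporary teste.csv saved explicitly as UTF-8 (the temporary path is an assumption, used so no real file is overwritten), then reads it back with the matching encoding.

```python
import csv
import os
import tempfile

# Hypothetical setup: recreate the question's sample file in a temporary
# directory, saved explicitly as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'teste.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    f.write('batata;14;True\npàtato;19;False\npapa;10;False\n')

# Read it back, telling open() which encoding the bytes are in.
with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f, delimiter=';'))

for row in rows:
    print(row)
```

Each row comes back as a list of strings, with 'pàtato' intact.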

If the .csv file is not in UTF-8, an error similar to this will occur:

C:\Users\guilherme\Desktop>python testcsv.py
Traceback (most recent call last):
  File "testcsv.py", line 5, in <module>
    for row in reader:
  File "C:\Python\Python36-32\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: invalid continuation byte
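
The error above can be reproduced without any file. In Latin-1, 0xe1 is 'á', but in UTF-8 it is the first byte of a three-byte sequence, so a plain ASCII byte right after it is an "invalid continuation byte". The word 'água' below is only an illustrative assumption; the contents of the file behind that traceback are not shown.

```python
# 'água' saved as Latin-1 starts with the single byte 0xe1.
latin1_bytes = 'água'.encode('latin-1')

message = None
try:
    # Decoding Latin-1 bytes as UTF-8 fails on the first byte.
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)

# Decoding with the encoding the bytes were actually written in works.
print(latin1_bytes.decode('latin-1'))
```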

If everything is right, the output looks like this:

[Screenshot: the script's output in a Windows console]

If the problem occurs in a terminal in a Unix-like environment (for example macOS and Linux), use this (I believe the script file itself must also be saved as UTF-8 without BOM):

# -*- coding: utf-8 -*-

import csv

# newline='' is what the csv docs recommend for files passed to csv.reader
with open('teste.csv', encoding='utf-8', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
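
The coding declaration only tells Python how the .py file itself is encoded; it does not change how teste.csv is read or how print() writes to the terminal. As a diagnostic sketch (not part of the original answer), this prints the two encodings that matter here. If either one is ASCII (for example 'US-ASCII' or 'ANSI_X3.4-1968'), that would be consistent with the ascii codec error in the question; whether the asker's Sublime Text build runs with such a locale is an assumption the thread does not confirm.

```python
import locale
import sys

# Encoding open() falls back to when no encoding= argument is given
# (outside Python's UTF-8 mode this comes from the locale settings).
default_encoding = locale.getpreferredencoding(False)

# Encoding print() uses; if it is ASCII, printing 'à' can fail as well.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('default file encoding:', default_encoding)
print('stdout encoding:', stdout_encoding)
```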

Latin1

If the file is saved as ANSI, Latin-1, windows-1252 or ISO-8859-1 (these are largely "compatible" for accented letters like 'à'), you can set encoding='latin-1' (although with Python 3 on Windows this was not necessary). It should look like this:

import csv

# newline='' is what the csv docs recommend for files passed to csv.reader
with open('teste.csv', encoding='latin-1', newline='') as f:
    reader = csv.reader(f, delimiter=';')
    for row in reader:
        print(row)
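
When it is not known in advance which of the two encodings a file uses, one option is to try UTF-8 first and fall back to Latin-1. The helper name read_rows and the encoding order are hypothetical choices for this sketch, not something from the answer. Latin-1 must come last because it maps every byte to a character and never raises, so it would also "succeed" on a UTF-8 file and produce garbled text.

```python
import csv
import os
import tempfile


def read_rows(path, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper: return the rows decoded with the first
    encoding in `encodings` that reads the whole file without errors."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding, newline='') as f:
                return list(csv.reader(f, delimiter=';'))
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
    raise ValueError('none of the encodings could decode ' + path)


# Sample file saved as Latin-1, where 'à' is the single byte 0xe0.
path = os.path.join(tempfile.mkdtemp(), 'teste.csv')
with open(path, 'w', encoding='latin-1', newline='') as f:
    f.write('batata;14;True\npàtato;19;False\npapa;10;False\n')

rows = read_rows(path)
print(rows)
```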

  • I've tried it before and it's still the same error.

  • @Lucasfabio I updated the answer, see if that solves it.

  • It's really strange that this isn't working. Guilherme, try also adding this suggestion, which doesn't use the csv module: http://ideone.com/D8LF9o. I doubt that's it, but Lucas loses nothing by trying.
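
The linked ideone snippet is not reproduced here; the following is only an assumed equivalent of reading the file without the csv module. Note that the decode error in the question is raised by the file object, not by the csv module, so the encoding still has to be given to open() either way. The temporary path is again an assumption to keep the sketch self-contained.

```python
import os
import tempfile

# Hypothetical sample file, saved as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'teste.csv')
with open(path, 'w', encoding='utf-8') as f:
    f.write('batata;14;True\npàtato;19;False\npapa;10;False\n')

rows = []
with open(path, encoding='utf-8') as f:
    for line in f:
        # A plain split is enough for this file, but unlike csv.reader it
        # does not handle quoted fields that contain ';' or line breaks.
        rows.append(line.rstrip('\n').split(';'))

print(rows)
```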
