Read a CSV file using the "|" character as delimiter - Python

I tried to create a DataFrame with the pandas library from a file that is sent to me in the following format:

--------------------------------
|Indice|Preço|Quantidade|Cidade|
--------------------------------
|1|1000|2|São Paulo|
.
.
.

I used the read_csv method with the delimiter "|" and got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 3: invalid continuation byte

I tried some other encodings, but I couldn't find a way to separate the data correctly. For now I use Excel to split the columns and delete the dashed lines (------).

I appreciate the help you can give me.

1 answer

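I believe the error occurs because of the accented characters, such as those in São Paulo. Byte 0xe7 from the error message is ç in ISO-8859-1 (as in Preço), which suggests the file is not encoded as UTF-8. In text fields, the value should ideally be enclosed in double quotes ("). But you can also try setting the encoding in the function call. For example: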

import pandas as pd
df = pd.read_csv('arquivo.csv', encoding='iso-8859-1', delimiter='|')
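
To see why the encoding argument matters, here is a minimal sketch that reproduces the same kind of error without any file. It assumes the header line from the question was saved as ISO-8859-1; the variable names are only for illustration.

```python
# Encode the header from the question the way an ISO-8859-1 file would store it.
header = "|Indice|Preço|"
raw = header.encode("iso-8859-1")

# In ISO-8859-1 the letter "ç" is the single byte 0xe7.
print(hex(raw[header.index("ç")]))

# Decoding those bytes as UTF-8 fails, because 0xe7 starts a multi-byte
# sequence in UTF-8 and the next byte ("o") is not a continuation byte.
error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding with the encoding the bytes were written in gives the text back.
decoded = raw.decode("iso-8859-1")
print(decoded)
```

If ISO-8859-1 produces wrong characters for your file, 'cp1252' is another common candidate for files produced on Windows, but that is a guess about where the file comes from.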

Function documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Related question: https://stackoverflow.com/questions/30462807/encoding-error-in-panda-read-csv

Note: If you open the file in Excel to remove the dashed lines, I imagine that is a manual operation rather than an automated routine. Try Notepad++, which can replace text in multiple files at once, and use it to replace the pipes with semicolons. You can also record macros to edit the text automatically.
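
The cleanup can also be done in Python itself, before the data reaches pandas. The following is only a sketch: clean_pipe_lines is a hypothetical helper (not part of pandas), and it assumes the separator lines consist solely of dashes and that every data line starts and ends with a pipe, as in the question.

```python
def clean_pipe_lines(lines):
    """Hypothetical helper: drop dashed separator lines and the outer pipes."""
    cleaned = []
    for line in lines:
        text = line.strip()
        # Skip blank lines and lines made only of dashes (e.g. "--------").
        if not text or set(text) == {"-"}:
            continue
        # Remove one leading and one trailing "|" so that pandas does not
        # create empty, unnamed columns at both ends of the table.
        if text.startswith("|"):
            text = text[1:]
        if text.endswith("|"):
            text = text[:-1]
        cleaned.append(text)
    return "\n".join(cleaned)


# Sample lines in the format shown in the question.
sample = [
    "--------------------------------\n",
    "|Indice|Preço|Quantidade|Cidade|\n",
    "--------------------------------\n",
    "|1|1000|2|São Paulo|\n",
]
result = clean_pipe_lines(sample)
print(result)

# Usage with pandas (assumes pandas is installed and the file is ISO-8859-1):
# import io
# import pandas as pd
# with open('arquivo.csv', encoding='iso-8859-1') as handle:
#     df = pd.read_csv(io.StringIO(clean_pipe_lines(handle)), sep='|')
```

Filtering the lines first keeps the read_csv call simple and avoids the manual step in Excel or Notepad++.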
