UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position xx: unexpected end of data


I can't open a whitespace-separated CSV file (spaces/tabs) in Python.

The file has 1.7 million lines and is encoded in UTF-8. From what I found on the English-language Stack Overflow, it seems as if the byte 0xc3 does not belong to UTF-8.
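For context, 0xc3 is in fact a valid UTF-8 lead byte: the characters U+00C0 to U+00FF (for example é, ã, ç) are encoded as 0xC3 followed by one continuation byte. The minimal sketch below uses only the Python standard library and made-up byte strings (the helper `decode_error_reason` is hypothetical). It suggests that, when a complete byte string is decoded, "unexpected end of data" appears when the data stops right after such a lead byte, while a lead byte followed by a non-continuation byte is reported as "invalid continuation byte" instead.

```python
def decode_error_reason(data: bytes):
    """Return the UnicodeDecodeError reason for data, or None if it is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.reason
    return None


# 'é' (U+00E9) is the two-byte sequence 0xC3 0xA9 in UTF-8
print("é".encode("utf-8"))
print(decode_error_reason(b"caf\xc3\xa9"))  # complete sequence: no error
print(decode_error_reason(b"caf\xc3"))      # data ends after the lead byte
print(decode_error_reason(b"caf\xc3 bar"))  # lead byte followed by a space
```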

import chardet
import pandas as pd

path = '/content/gdrive/My Drive/Colab Notebooks/arquivo.csv'
# chardet guesses the encoding from the raw bytes (this reads the whole file into memory)
with open(path, 'rb') as f:
    result = chardet.detect(f.read())
# sep=r'\s+' treats any run of spaces and/or tabs as a single delimiter
df = pd.read_csv(path, sep=r'\s+', encoding=result['encoding'])
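On the separator: according to the pandas documentation, a `sep` longer than one character (other than `'\s+'`) is interpreted as a regular expression and forces the Python parsing engine. A literal two-space separator such as `sep='  '` would therefore match exactly two spaces and never a tab, whereas `r'\s+'` matches any run of whitespace. The sketch below uses the standard library's `re.split` as a stand-in to illustrate the difference on a made-up line; it approximates the regex-separator behaviour and is not pandas itself.

```python
import re

# Hypothetical line mixing two spaces, four spaces and a tab between fields
line = "id  nome    cidade\tuf"

# Literal two-space separator: four spaces yield an empty field and the tab is not split
print(re.split("  ", line))
# Any run of whitespace counts as a single delimiter
print(re.split(r"\s+", line))
```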

Does anyone know how to fix this? As far as I can tell, the file should contain only UTF-8 characters, but this strange character shows up.
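To see what the "strange character" actually is, it may help to locate the first undecodable byte and inspect the bytes around it. A minimal standard-library sketch follows; the helper name `find_utf8_error` is hypothetical, and it assumes the whole file fits in memory (which the `chardet` call in the question already requires).

```python
def find_utf8_error(data: bytes, context: int = 20):
    """Locate the first invalid UTF-8 sequence in data.

    Returns None if data decodes cleanly; otherwise a dict with the byte
    offset, the 1-based line number, the decoder's reason and nearby bytes.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return {
            "offset": err.start,
            "line": data.count(b"\n", 0, err.start) + 1,
            "reason": err.reason,
            "context": data[max(0, err.start - context):err.start + context],
        }


# Made-up sample: the second line ends with a lone lead byte 0xC3
sample = b"ok\nline2 caf\xc3"
print(find_utf8_error(sample))
```

For the real file, one would open the path from the question in `'rb'` mode and pass `f.read()` to the helper. If the reported offset is at the very end of the file, the file was probably cut off in the middle of a character; if it is somewhere in the middle, that region may have been written in a different encoding.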

  • The file really is UTF-8, but the error persists. A workaround that did work was changing the encoding from "UTF-8" to "ISO-8859-1". It is not the "correct" solution, but it gets around the error. Source: https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python
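On the ISO-8859-1 workaround: that codec maps each of the 256 byte values to a character, so decoding never fails, but every genuine multi-byte UTF-8 character comes out as two characters (mojibake, e.g. "é" read as "Ã©"). Decoding as UTF-8 with `errors="replace"` keeps valid characters intact and substitutes U+FFFD only for the broken bytes; pandas 1.3 and later expose the same choice in `read_csv` through the `encoding_errors` parameter (e.g. `encoding_errors='replace'`). The standard-library sketch below compares the two on a made-up tab-separated fragment.

```python
# Made-up fragment: 'José<TAB>Bel' followed by a lone lead byte 0xC3
raw = b"Jos\xc3\xa9\tBel\xc3"

# Latin-1 never raises, but 'é' turns into 'Ã©' and the stray byte into 'Ã'
as_latin1 = raw.decode("iso-8859-1")
# UTF-8 with replacement keeps 'é' and turns only the stray byte into U+FFFD
as_utf8_replaced = raw.decode("utf-8", errors="replace")

print(repr(as_latin1))
print(repr(as_utf8_replaced))

# The Latin-1 text still carries the original bytes, so it can be repaired later
print(as_latin1.encode("iso-8859-1") == raw)
```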
