Error opening a .csv file with Python/pandas


I am new to the language and I am using Python 3 in Jupyter Notebook inside Anaconda. I followed the steps below, but the last one raises an error that I can't decipher. Please help me.

setting the working directory

import os

os.chdir('C:/Users/FUNDEPAG/Desktop/caracterizacao/socioecoomica')
print(os.getcwd())

checking the files

os.listdir()
['captura_especie.csv',
 'captura_especie_ponto.csv',
 'captura_especie_quadrante.csv',
 'caracterizacao_socioeconomica.csv',
 'caracterizacao_socioeconomica_beneficio_politica_publica.csv',
 'caracterizacao_socioeconomica_destino_producao.csv',
 'caracterizacao_socioeconomica_entidade.csv',
 'caracterizacao_socioeconomica_especie_ambiente.csv',
 'caracterizacao_socioeconomica_forma_comercializacao.csv',
 'caracterizacao_socioeconomica_modalidade_pesca.csv',
 'caracterizacao_socioeconomica_pescador.csv',
 'dados_1610651413.zip',
 'membro_familia_atividade_pesca.csv']

importing libraries

import pandas as pd
import numpy as np

loading data frame

socio = pd.read_csv( 'caracterizacao_socioeconomica.csv', sep=',', header=0)
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-77-9b2e3fbc6ca4> in <module>
      1 #carregando primeiro dataframe
----> 2 socio = pd.read_csv( 'caracterizacao_socioeconomica.csv', sep=',', header=0, encoding='UTF-8')

~\anaconda3\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    686     )
    687 
--> 688     return _read(filepath_or_buffer, kwds)
    689 
    690 

~\anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     # Create the parser.
--> 454     parser = TextFileReader(fp_or_buf, **kwds)
    455 
    456     if chunksize or iterator:

~\anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    946             self.options["has_index_names"] = kwds["has_index_names"]
    947 
--> 948         self._make_engine(self.engine)
    949 
    950     def close(self):

~\anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1178     def _make_engine(self, engine="c"):
   1179         if engine == "c":
-> 1180             self._engine = CParserWrapper(self.f, **self.options)
   1181         else:
   1182             if engine == "python":

~\anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   2008         kwds["usecols"] = self.usecols
   2009 
-> 2010         self._reader = parsers.TextReader(src, **kwds)
   2011         self.unnamed_cols = self._reader.unnamed_cols
   2012 

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 1: invalid continuation byte
  • 2

    It looks like you are trying to read a file as UTF-8 that was not encoded in UTF-8.

1 answer

The file being loaded is probably not encoded in UTF-8 (the encoding pandas uses by default when you don't specify one).
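To illustrate what the error message means, here is a minimal sketch using a made-up byte string (an assumption for illustration, not the actual contents of the file). In ISO-8859-1 the byte 0xf3 is the letter "ó", but in UTF-8 it announces a multi-byte sequence, so the ordinary letter that follows it is rejected:

```python
# Hypothetical sample: the word "código" as stored in ISO-8859-1 (Latin-1).
raw = b"c\xf3digo"  # the byte 0xf3 sits at position 1

# Decoding as UTF-8 fails, because 0xf3 must be followed by continuation bytes.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as ISO-8859-1 works, because every byte maps to one character.
decoded = raw.decode("iso-8859-1")

print(utf8_error)
print(decoded)
```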

Try specifying a different charset in your call, something like:

socio = pd.read_csv( 'caracterizacao_socioeconomica.csv', sep=',', header=0, encoding = "ISO-8859-1")

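or

socio = pd.read_csv( 'caracterizacao_socioeconomica.csv', sep=',', header=0, encoding='cp1252')

Note that encoding='utf8' is the same codec that already fails in your traceback, so it will not help here. Also, encoding='latin1' and encoding='iso-8859-1' are two names for the same codec, so there is no need to try both.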
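If you would rather not edit the call by hand for each guess, the trial and error can be sketched as a small helper. The function name `first_working_encoding`, the candidate list, and the sample file are all assumptions made for this illustration; they are not part of pandas. ISO-8859-1 goes last on purpose, because it accepts any byte and therefore never raises:

```python
import tempfile
from pathlib import Path


def first_working_encoding(path, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate encoding that decodes the whole file."""
    data = Path(path).read_bytes()
    for encoding in candidates:
        try:
            data.decode(encoding)  # strict decoding: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"none of {candidates} could decode {path}")


# Demonstration on a made-up file, not on the real data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    # The byte 0xf3 followed by a plain letter is not valid UTF-8.
    sample.write_bytes(b"id,descricao\n1,c\xf3digo\n")
    detected = first_working_encoding(sample)

print(detected)
```

The returned name can then be passed as the encoding argument of pd.read_csv. Keep in mind that decoding without an error does not prove the guess is right, so it is still worth checking the accented characters in the resulting data frame.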

  • Thanks Alexandre, it worked with encoding iso-8859-1. The strange thing is that all the other files from the same dataset opened without it, and this same file even opened in R as UTF-8. The other 10 files in the same directory come out garbled when opened with this encoding... but the problem is solved.
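One possible explanation for the garbled output, assuming the other files really are UTF-8 (the thread does not confirm this): ISO-8859-1 never raises an error, it simply turns each byte of a multi-byte UTF-8 character into a separate character. A minimal sketch with a made-up string:

```python
# Hypothetical sample: the word "código" stored as UTF-8.
utf8_bytes = "c\u00f3digo".encode("utf-8")  # "ó" becomes two bytes: 0xc3 0xb3

# Reading UTF-8 bytes as ISO-8859-1 succeeds, but each byte becomes
# its own character, so the accented letter turns into two symbols.
garbled = utf8_bytes.decode("iso-8859-1")

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(repaired)
```

So a file that loads without error under iso-8859-1 but shows pairs of odd symbols where accents should be is more likely a UTF-8 file read with the wrong encoding.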
