Importing data using pandas in Python

Good afternoon, everyone!

I am trying to import a CSV file using the pandas package in Python:

import pandas as pd

names_col = [
    'AnoInfracao',
    'TrimestreInfracao',
    'CodigoInfracao',
    'DescricaoAbreviadaInfracao',
    'Gravidade',
    'DescricaoTipoVeiculo',
    'DescricaoEspecie',
    'UF',
    'Municipio',
    'BR',
    'KM',
    'NacionalidadeVeiculo',
]

data = pd.read_csv(
    "C:\\Pasta\\pasta1\\Documents\\PRF_DADOS_ABERTOS_INFRACOES_2015_T4\\PRF_DADOS_ABERTOS_INFRACOES_2015_T4.csv",
    delimiter=';',
    header=None,
    names=names_col,
    skiprows=1,
    dtype={'AnoInfracao': 'category'},
)

The command runs without errors and the column names come out correctly, but every data row shows only NaN:

 AnoInfracao  TrimestreInfracao  CodigoInfracao  DescricaoAbreviadaInfracao
0         NaN                NaN             NaN                         NaN   
1         NaN                NaN             NaN                         NaN   
2         NaN                NaN             NaN                         NaN   
3         NaN                NaN             NaN                         NaN   
4         NaN                NaN             NaN                         NaN 

Does the pandas package only import numerical values? This file has columns of quantitative and qualitative data.

Does anyone have any idea what it might be?

The data can be downloaded from this link: http://www1.prf.gov.br/arquivos/index.php/s/sRa6yPSftGN7BMP/download (infraction data recorded by the PRF).

Thank you very much!

Leo

  • If this data is numerical and the pandas package "only imports numerical values", maybe it is better to go another way.

  • If you look closely, that is a question and not a statement, friend. :)

  • I'm sorry, I didn't notice; I was distracted.

  • No problem, it happens. :)

1 answer

When I tried to run your code, I first got an error related to dtype={'AnoInfracao':'category'}, so I removed that argument in order to proceed. In the end, I got this:

File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)
File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: EOF inside string starting at line 35
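
An "EOF inside string" error of this kind usually means the parser found an opening quote character that is never closed before the end of the file. The sketch below is only an illustration with made-up inline data (the `raw` string and the helper `try_default` are hypothetical, not taken from the PRF file). It tries the default quoting first and then reads the same text with quoting disabled, so that a stray quote is kept as ordinary text.

```python
import csv
import io

import pandas as pd

# Hypothetical sample, not the PRF file: the second data row opens a quote
# that is never closed.
raw = 'a;b\n1;x\n2;"y\n3;z\n'


def try_default(text):
    """Parse with default quoting; return the DataFrame or the error raised."""
    try:
        return pd.read_csv(io.StringIO(text), sep=';')
    except Exception as exc:  # the unclosed quote is expected to end up here
        return exc


result_default = try_default(raw)
print(type(result_default).__name__)

# With quoting disabled, the quote character is treated as plain text,
# so every line is split on ';' only.
df_noquote = pd.read_csv(io.StringIO(raw), sep=';', quoting=csv.QUOTE_NONE)
print(df_noquote)
```

Disabling quoting is a diagnostic step rather than a general fix: if some fields legitimately contain a quoted ';', they would be split incorrectly.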

I opened the .csv file in Excel and saw that it is badly formatted. It already has a row with the column names, it has a blank line, and the data only starts at line 4, if I'm not mistaken.
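
A layout like the one described can be checked without Excel by printing the first raw lines and then telling read_csv how many of them to skip. This is a minimal sketch under assumptions: the `sample` text below is a hypothetical stand-in (header row, blank line, one stray line, data from line 4), and the real file may be laid out differently, so the `skiprows` value has to be adjusted to whatever the inspection shows.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: header row, blank line, a stray line,
# then data starting at line 4. The real file's layout may differ.
sample = (
    "Ano;Trimestre;UF\n"
    "\n"
    "gerado automaticamente\n"
    "2015;4;SP\n"
    "2015;4;RJ\n"
)

# Step 1: look at the first raw lines before parsing anything.
for number, line in enumerate(io.StringIO(sample), start=1):
    print(number, repr(line))
    if number >= 5:
        break

# Step 2: skip the three lines before the data and supply the names explicitly.
df = pd.read_csv(
    io.StringIO(sample),
    sep=';',
    header=None,
    names=['AnoInfracao', 'TrimestreInfracao', 'UF'],
    skiprows=3,
)
print(df)
```

With a file on disk, the same inspection works by opening the path with `open(...)` in place of `io.StringIO(sample)`.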

If you work through the errors one step at a time, you should arrive at a solution. But to answer your questions:

  • No, pandas does not import only numerical values; it reads text columns as well.
  • I think the problem is the badly formatted .csv file.

  • Thank you very much! I will check. :)

  • Excellent answer. +1
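
On the first point of the answer, a short sketch can confirm that read_csv handles text and numeric columns in the same file. The rows below are made up and only loosely modelled on the question's column names; they are not taken from the PRF data.

```python
import io

import pandas as pd

# Made-up rows mixing qualitative (text) and quantitative (numeric) columns.
sample = (
    "AnoInfracao;Gravidade;UF;KM\n"
    "2015;Grave;SP;12.5\n"
    "2015;Leve;RJ;340.0\n"
)

df = pd.read_csv(io.StringIO(sample), sep=';')

# Each column gets its own dtype: integers, text, and floats side by side.
print(df.dtypes)
print(df)
```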
