In pandas with unidecode, how do I avoid the warning "a copy of a slice from a DataFrame"?

In Python 3 with pandas, I am reading CSV files to create dataframes. In some columns I need to remove the accents from the text. I do it with unidecode.

But for some files a warning message appears:

import pandas as pd
import unidecode

def f(text):
    # Transliterate accented characters to their closest ASCII equivalents.
    return unidecode.unidecode(text)

candidatos_2014 = pd.read_csv("candidatos_2014.csv", sep=',', encoding='utf-8', converters={'cpf': str, 'sequencial': str})

candidatos_2014.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26245 entries, 0 to 26244
Data columns (total 9 columns):
Unnamed: 0         26245 non-null int64
uf                 26245 non-null object
cargo              26245 non-null object
nome_completo      26245 non-null object
sequencial         26245 non-null object
cpf                26245 non-null object
nome_urna          26245 non-null object
partido_eleicao    26245 non-null object
situacao           26245 non-null object
dtypes: int64(1), object(8)
memory usage: 1.8+ MB

eleitos = candidatos_2014[(candidatos_2014['situacao'] == 'ELEITO POR QP') | (candidatos_2014['situacao'] == 'ELEITO POR MÉDIA') | (candidatos_2014['situacao'] == 'ELEITO')]

eleitos_d_2014 = eleitos[(eleitos['cargo'] == 'DEPUTADO FEDERAL')]

eleitos_d_2014["nome_completo"] = eleitos_d_2014["nome_completo"].apply(f)

/home/reinaldo/Documentos/Code/seguranca/lib/python3.6/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """

eleitos_d_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 513 entries, 144 to 26209
Data columns (total 9 columns):
Unnamed: 0         513 non-null int64
uf                 513 non-null object
cargo              513 non-null object
nome_completo      513 non-null object
sequencial         513 non-null object
cpf                513 non-null object
nome_urna          513 non-null object
partido_eleicao    513 non-null object
situacao           513 non-null object
dtypes: int64(1), object(8)
memory usage: 40.1+ KB

The accents seem to have been removed. But is there any risk that it failed on some rows? How can I avoid this warning message, and how should I use .loc here?
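For reference, two commonly used ways to avoid this warning can be sketched as follows. The warning appears because pandas cannot tell whether a frame produced by filtering is a view or a copy of the original. This is only a sketch under assumptions: the small DataFrame is made-up stand-in data for candidatos_2014.csv, and f is written with the standard-library unicodedata module as a stand-in for unidecode.unidecode so that the example runs without extra packages.

```python
import unicodedata

import pandas as pd

# Hypothetical stand-in for candidatos_2014.csv (the rows are made up).
candidatos = pd.DataFrame({
    "cargo": ["DEPUTADO FEDERAL", "SENADOR", "DEPUTADO FEDERAL", "DEPUTADO FEDERAL"],
    "nome_completo": ["JOSÉ DA CONCEIÇÃO", "MARIA ANTÔNIA", "JOÃO LUÍS", "ANDRÉ GONÇALVES"],
    "situacao": ["ELEITO POR QP", "ELEITO", "NÃO ELEITO", "ELEITO POR MÉDIA"],
})


def f(text):
    # Stand-in for unidecode.unidecode (assumption): decompose accented
    # characters and keep only the ASCII part.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


# Build both conditions as one boolean mask instead of filtering twice.
mask = (
    candidatos["situacao"].isin(["ELEITO POR QP", "ELEITO POR MÉDIA", "ELEITO"])
    & (candidatos["cargo"] == "DEPUTADO FEDERAL")
)

# Option 1: take an explicit copy of the filtered rows, then modify the copy.
eleitos_d = candidatos[mask].copy()
eleitos_d["nome_completo"] = eleitos_d["nome_completo"].apply(f)

# Option 2: modify the selected rows of a frame in place with .loc.
# (A copy is used here only so that both options can be compared.)
candidatos_loc = candidatos.copy()
candidatos_loc.loc[mask, "nome_completo"] = (
    candidatos_loc.loc[mask, "nome_completo"].apply(f)
)

print(eleitos_d["nome_completo"].tolist())
print(candidatos_loc["nome_completo"].tolist())
```

Option 1 fits when the filtered frame is meant to be an independent table. Option 2 fits when the original frame itself should be updated.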

1 answer

I solved the same problem (with the same database, by the way) without using unidecode.

from bs4 import BeautifulSoup
import requests
import pandas as pd

candidatosal2014 = pd.read_csv("candidatos_alagoas_2014.csv", encoding="latin1", delimiter=";", header=None, usecols=[9, 10, 14, 43, 44])

candidatosal2014[10] = candidatosal2014[10].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')



display(candidatosal2014.loc[candidatosal2014[43].isin([1,2,3])])  # 1 = elected, 2 = elected by parliamentary quotient, 3 = elected by average
  • Hello, thank you! But I get the impression that str.encode('ascii', errors='ignore') simply ignores the errors: any character that cannot be converted is silently dropped. I wanted an (almost) foolproof way to convert :) but I don't think there is one, so it is worth checking everything afterwards (https://stackoverflow.com/questions/21472809/python-what-does-encodeascii-ignore-do)
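The concern in this comment can be illustrated with a small sketch that uses only the standard-library unicodedata module. It applies the same NFKD normalization and ASCII encoding as the answer, but to plain strings. The helper names strip_accents_ascii and lost_characters and the sample strings are hypothetical, chosen only for illustration. Accented letters that decompose into a base letter plus a combining mark keep their base letter, while characters with no such decomposition disappear entirely.

```python
import unicodedata


def strip_accents_ascii(text):
    # Same idea as the pandas chain in the answer, applied to one string.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


def lost_characters(text):
    # Characters that leave no ASCII remnant at all after conversion.
    return [
        ch for ch in text
        if not unicodedata.normalize("NFKD", ch).encode("ascii", errors="ignore")
    ]


for sample in ["JOSÉ DA CONCEIÇÃO", "ØYSTEIN STRAßE"]:
    print(sample, "->", strip_accents_ascii(sample), "| lost:", lost_characters(sample))
```

Running a check like lost_characters over a column after conversion is one way to find rows where something was dropped rather than transliterated.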
