duplicated(): how do I compare more than one column?

I have two CSV files and want to compare them on more than one field using duplicated(). Is there a way to do that, or can I only pass one column at a time?

I followed Clayton Tosatti's guidance and got this far, but now I have run into this question.

import pandas as pd
dados = pd.read_csv('gestantes_prenatal.csv')
dados2 = pd.read_csv('cidade_social.csv')
print(dados[['CNS','CNS','CPF','PIS','NASCIMENTO','NOME_DA_MAE']])
print(dados2[['NOME','CNS','CNS','CPF','PIS','NASCIMENTO','NOME_DA_MAE']])
df_aux = pd.concat([dados['CPF'],dados2['CPF']])

Everything works up to the last line. But I wanted something like:

df_aux = pd.concat([dados['NOME','PIS','CPF'],dados2['NOME','PIS','CPF']]) 
df_aux[df_aux.duplicated()]

This generates the following error:

KeyError                                  Traceback (most recent call last)
/home/hudson/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('NOME', 'PIS', 'CPF')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-10-5a532f25327c> in <module>()
----> 1 df_aux = pd.concat([dados['NOME','PIS','CPF'],dados2['NOME','PIS','CPF']])

/home/hudson/.local/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

/home/hudson/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('NOME', 'PIS', 'CPF')

I used these files.

1 answer

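Hudson, you need to change your concat line. Writing dados['NOME','PIS','CPF'] passes a single tuple as the key, and pandas looks that tuple up as one column name, which is why you get KeyError: ('NOME', 'PIS', 'CPF'). To select several columns, pass a list of names, which means double brackets.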

Try it this way:

df_aux = pd.concat([dados[['NOME','PIS','CPF']],dados2[['NOME','PIS','CPF']]]) 
df_aux.loc[df_aux.duplicated()]
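
To make the difference between the two forms concrete, here is a minimal sketch. The two small DataFrames are made-up stand-ins for the CSV files (the values are invented), and the names cols, tuple_key_failed and repetidos are only illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (made-up values).
dados = pd.DataFrame({
    "NOME": ["Ana", "Bia", "Carla"],
    "PIS": ["111", "222", "333"],
    "CPF": ["001", "002", "003"],
})
dados2 = pd.DataFrame({
    "NOME": ["Bia", "Dora"],
    "PIS": ["222", "444"],
    "CPF": ["002", "004"],
})

# Single brackets with commas build a tuple key, which pandas
# treats as ONE column label, so the lookup fails.
try:
    dados["NOME", "PIS", "CPF"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# Double brackets pass a list of labels and return a DataFrame.
cols = ["NOME", "PIS", "CPF"]
df_aux = pd.concat([dados[cols], dados2[cols]], ignore_index=True)

# duplicated() with no arguments compares all columns of each row
# and marks every repeat after the first occurrence.
repetidos = df_aux[df_aux.duplicated()]
print(repetidos)
```

The ignore_index=True argument is an addition of this sketch, not part of the line above; it only renumbers the rows so that index labels from the two files do not collide.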
  • What a difference brackets make, huh?! Again, thank you very much, Clayton Tosatti.

  • Haha, they really do help. I don't know if you caught the concept, but basically the inner brackets pass the columns as a list. It should also work to store the columns in a list variable first and then pass that variable to the DataFrame: lista = ['NOME', 'PIS', 'CPF'] | df_aux = pd.concat([dados[lista],dados2[lista]])

  • Perfect, really great! Beautiful language.
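
Building on the list-variable idea from the comments, the sketch below shows one possible way to limit the comparison to certain columns with the subset parameter of duplicated() and to see every copy with keep=False. The data is invented, and the ORIGEM column and the names todos and em_ambos are hypothetical helpers introduced here, not something from the question.

```python
import pandas as pd

# Made-up data; NASCIMENTO differs for Ana so that a full-row
# comparison would NOT flag her, while a subset comparison does.
dados = pd.DataFrame({
    "NOME": ["Ana", "Bia", "Bia"],
    "PIS": ["111", "222", "222"],
    "CPF": ["001", "002", "002"],
    "NASCIMENTO": ["1990-01-01", "1985-05-05", "1985-05-05"],
})
dados2 = pd.DataFrame({
    "NOME": ["Ana", "Dora"],
    "PIS": ["111", "444"],
    "CPF": ["001", "004"],
    "NASCIMENTO": ["1990-01-02", "1970-07-07"],
})

lista = ["NOME", "PIS", "CPF"]

# Tag each row with its source file (ORIGEM is a hypothetical helper column).
df_aux = pd.concat(
    [dados.assign(ORIGEM="gestantes"), dados2.assign(ORIGEM="cidade")],
    ignore_index=True,
)

# subset= restricts the comparison to the listed columns;
# keep=False flags every copy instead of only the later ones.
todos = df_aux[df_aux.duplicated(subset=lista, keep=False)]

# Keep only the key combinations that appear in BOTH sources,
# since concat + duplicated also flags repeats inside one file.
origens = todos.groupby(lista)["ORIGEM"].nunique()
em_ambos = origens[origens > 1]
print(em_ambos)
```

Whether repeats inside a single file should count depends on what is being checked; the ORIGEM step is only needed when they should not.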
