Identify matching values in two dataframes

Dataframe D1:

   Ano/Mês Referência  Ano/Mês Competência  UF  Código Município SIAFI  Nome Município SIAFI  NIS Beneficiário  Nome Beneficiário             Valor Benefício
0  201301              201202               AL  2785                    MACEIO                16035155015       ADRIANA MACEDO BALBINO        102,00
1  201301              201202               AL  2785                    MACEIO                16411759287       LENILDA NAZARENA DE OLIVEIRA  70,00

Dataframe D2:

   Ano/Mês Referência  Ano/Mês Competência  UF  Código Município SIAFI  Nome Município SIAFI  NIS Beneficiário  Nome Beneficiário             Valor Benefício
0  202001              201202               AL  2785                    MACEIO                16035155015       ADRIANA MACEDO BALBINO        200,00
1  202001              201202               AL  2785                    MACEIO                12347759287       MARCELO PEREIRA DE OLIVEIRA   340,00

I need to identify the values that appear in both dataframes in the 'NIS Beneficiário' column.

Should I create another column? How do I match them?

1 answer

From what I understand, you want to find the intersection between the values of one column in two different dataframes.

To find the intersection, you can convert the columns you want to compare into sets.

In Python, sets are unordered collections with no duplicate elements; they are represented by the built-in class set.

The intersection of two sets can be found with the method intersection().
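
A minimal illustration with plain Python sets, using made-up numbers: converting a list to a set removes the duplicates, and intersection() (or the equivalent & operator) keeps only the elements present in both sets.

```python
# Made-up values, only to illustrate how sets behave
a = [1, 2, 2, 3, 4]
b = [3, 4, 4, 5]

set_a = set(a)  # duplicates removed: {1, 2, 3, 4}
set_b = set(b)  # {3, 4, 5}

common = set_a.intersection(set_b)

# The & operator gives the same result when both operands are sets
assert common == (set_a & set_b)

print(sorted(common))  # [3, 4]
```

Sorting is only there to get a predictable printout, since a set has no guaranteed order.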

To keep the example simple, I computed the intersection of the age columns of two dataframes I found on the internet:

import pandas as pd

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'], 
        'age': [42, 52, 36, 24, 73], 
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df1 = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])


raw_data_2 = {'first_name': ['Sarah', 'Gueniva', 'Know', 'Sara', 'Cat'], 
        'last_name': ['Mornig', 'Jaker', 'Alom', 'Ormon', 'Koozer'], 
        'age': [53, 26, 72, 73, 24], 
        'preTestScore': [13, 52, 72, 26, 26],
        'postTestScore': [82, 52, 56, 234, 254]}
df2 = pd.DataFrame(raw_data_2, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])

intercessão = set(df1['age']).intersection(set(df2['age']))

print(intercessão)

# {24, 73}

Example in Repl.it: https://repl.it/repls/MutedJuicyTraining
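
The same result can also be obtained without leaving pandas. As an alternative (not the approach used in this answer), Series.isin() checks, for each value of one column, whether it appears in another column, and unique() drops repeated values. A sketch using only the age values from the example above:

```python
import pandas as pd

# Only the 'age' columns from the example above
df1 = pd.DataFrame({'age': [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({'age': [53, 26, 72, 73, 24]})

# Boolean mask: True where an age in df1 also appears in df2
mask = df1['age'].isin(df2['age'])

# Keep the matching values, without duplicates
common_ages = df1.loc[mask, 'age'].unique()

print(sorted(common_ages.tolist()))  # [24, 73]
```

Unlike the set version, the mask also tells you in which rows of df1 the matches occur.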

In your case, assuming d1 and d2 are your dataframes and 'NIS Beneficiário' is the name of the column in question, it would look like this:

intercessão = set(d1['NIS Beneficiário']).intersection(set(d2['NIS Beneficiário']))
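
To try this on the rows shown in the question, the sketch below rebuilds d1 and d2 by hand with only some of the columns. That is an assumption for illustration: in practice the dataframes come from your own loading code, and the NIS values may be read as integers or strings depending on how the file is read. Since the question also asks about creating another column, the sketch adds a boolean column with the hypothetical name in_both using Series.isin().

```python
import pandas as pd

# Hand-built copies of the rows shown in the question (only some columns).
# NIS values are kept as strings here, which is an assumption.
d1 = pd.DataFrame({
    'UF': ['AL', 'AL'],
    'Nome Município SIAFI': ['MACEIO', 'MACEIO'],
    'NIS Beneficiário': ['16035155015', '16411759287'],
    'Nome Beneficiário': ['ADRIANA MACEDO BALBINO', 'LENILDA NAZARENA DE OLIVEIRA'],
})
d2 = pd.DataFrame({
    'UF': ['AL', 'AL'],
    'Nome Município SIAFI': ['MACEIO', 'MACEIO'],
    'NIS Beneficiário': ['16035155015', '12347759287'],
    'Nome Beneficiário': ['ADRIANA MACEDO BALBINO', 'MARCELO PEREIRA DE OLIVEIRA'],
})

# Same set-based approach as in the answer
common_nis = set(d1['NIS Beneficiário']).intersection(set(d2['NIS Beneficiário']))
print(common_nis)  # {'16035155015'}

# Optional extra column (hypothetical name 'in_both'):
# True when the NIS of that row also appears in d2
d1['in_both'] = d1['NIS Beneficiário'].isin(d2['NIS Beneficiário'])
print(d1[['NIS Beneficiário', 'in_both']])
```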

  • Thank you. But I need to see in which rows the intersection occurs, because I will need to look up the name, municipality and state. Ideally the results would appear in a third dataframe.

  • @vivape identifying the rows was not specified in the question.
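
For what the comment asks (the matching rows, with name, municipality and state, collected in a third dataframe), one option not covered by the answer is an inner join with pandas.merge on the NIS column: rows whose NIS is missing from either dataframe are dropped, and columns that exist in both (other than the key) receive the suffixes given in suffixes. The sketch rebuilds a reduced version of the question's data by hand; the names d3, '_d1' and '_d2' are arbitrary choices, and it assumes the NIS column has the same type in both dataframes.

```python
import pandas as pd

# Reduced, hand-built copies of the question's data (column names as shown there)
d1 = pd.DataFrame({
    'Ano/Mês Referência': [201301, 201301],
    'UF': ['AL', 'AL'],
    'Nome Município SIAFI': ['MACEIO', 'MACEIO'],
    'NIS Beneficiário': ['16035155015', '16411759287'],
    'Nome Beneficiário': ['ADRIANA MACEDO BALBINO', 'LENILDA NAZARENA DE OLIVEIRA'],
    'Valor Benefício': ['102,00', '70,00'],
})
d2 = pd.DataFrame({
    'Ano/Mês Referência': [202001, 202001],
    'UF': ['AL', 'AL'],
    'Nome Município SIAFI': ['MACEIO', 'MACEIO'],
    'NIS Beneficiário': ['16035155015', '12347759287'],
    'Nome Beneficiário': ['ADRIANA MACEDO BALBINO', 'MARCELO PEREIRA DE OLIVEIRA'],
    'Valor Benefício': ['200,00', '340,00'],
})

# Inner join: keep only rows whose NIS appears in both dataframes.
# Non-key columns present in both get the '_d1' / '_d2' suffixes.
d3 = pd.merge(d1, d2, on='NIS Beneficiário', how='inner', suffixes=('_d1', '_d2'))

print(d3[['NIS Beneficiário', 'Nome Beneficiário_d1', 'Nome Município SIAFI_d1',
          'UF_d1', 'Valor Benefício_d1', 'Valor Benefício_d2']])
```

If the name, municipality and state are guaranteed to be identical for the same NIS, on also accepts a list of columns, which avoids the duplicated _d1/_d2 copies of those columns.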