Compare two DataFrames and create a new DataFrame


Viewed 489 times


I have a question and would like your help. I have two DataFrames and I need to check whether some of their columns are equal and, if they are, store the matching records in another DataFrame. In other words, I need to create a new DataFrame from the comparison of the other two. The example below uses df1 and df2 (four criteria must be compared: 'x', 'y', 'z' and 'w'); after the comparison, dfNovo should contain the records for which the comparison was true. In this example, dfNovo would be formed by the record at index 0 of df1 and the record at index 0 of df2, since they are equal in the criteria mentioned.

    import pandas as pd

    df1 = pd.DataFrame({"x": ['f','m','f'],
                        "y": [11,22,39],
                        "z": ['C','nC','nC'],
                        "w": ['F','S','M'],
                        "var1":["no", "yes", "no"],
                        "var2":["yes", "yes", "yes"],
                        "var3":["no", "no", "no"],
                        "classe":["yes", "yes", "no"]})

    df2 = pd.DataFrame({"x": ['f','f','m'],
                        "y": [11,22,40],
                        "z": ['C','C','nC'],
                        "w": ['F','M','M'],
                        "var1":["yes", "yes", "no"],
                        "var2":["no", "no", "yes"],
                        "var3":["no", "no", "yes"],
                        "classe":["no", "yes", "yes"]})


2 answers


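Based on the example given, the solution is to use pandas' merge function, joining on the four criteria columns. By default merge performs an inner join, so only the rows whose x, y, z and w values appear in both DataFrames are kept: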

>>> dfNovo = pd.merge(df1, df2, on=['x', 'y', 'z', 'w'])

>>> dfNovo
   x   y  z  w var1_x var2_x var3_x classe_x var1_y var2_y var3_y classe_y
0  f  11  C  F     no    yes     no      yes    yes     no     no       no
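To see not only the matches but also which records failed the comparison, the same merge can be run as an outer join with indicator=True, which adds a _merge column whose values are 'both', 'left_only' or 'right_only'. A minimal sketch, reusing the question's data and comparing only the four key columns (the names keys and comparison are chosen for this example):

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({"x": ['f', 'm', 'f'],
                    "y": [11, 22, 39],
                    "z": ['C', 'nC', 'nC'],
                    "w": ['F', 'S', 'M'],
                    "var1": ["no", "yes", "no"],
                    "var2": ["yes", "yes", "yes"],
                    "var3": ["no", "no", "no"],
                    "classe": ["yes", "yes", "no"]})

df2 = pd.DataFrame({"x": ['f', 'f', 'm'],
                    "y": [11, 22, 40],
                    "z": ['C', 'C', 'nC'],
                    "w": ['F', 'M', 'M'],
                    "var1": ["yes", "yes", "no"],
                    "var2": ["no", "no", "yes"],
                    "var3": ["no", "no", "yes"],
                    "classe": ["no", "yes", "yes"]})

keys = ['x', 'y', 'z', 'w']

# Outer join keeps every key combination from both frames; indicator=True
# adds a "_merge" column telling where each combination was found.
comparison = pd.merge(df1[keys], df2[keys], on=keys, how='outer', indicator=True)
print(comparison)

# The rows marked 'both' are the ones the default inner merge returns.
matches = comparison[comparison['_merge'] == 'both']
print(matches)
```

Filtering on 'left_only' instead gives the records of df1 that have no counterpart in df2.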

You can also choose the suffixes appended to the overlapping column names (the defaults are _x and _y) with the suffixes parameter:

>>> dfNovo = pd.merge(df1, df2, on=['x', 'y', 'z', 'w'], suffixes=['_left', '_right'])

>>> dfNovo
   x   y  z  w var1_left var2_left var3_left classe_left var1_right var2_right var3_right classe_right
0  f  11  C  F        no       yes        no         yes        yes         no         no           no

>>> dfNovo = pd.merge(df1, df2, on=['x', 'y', 'z', 'w'], suffixes=['_df1', '_df2'])

>>> dfNovo
   x   y  z  w var1_df1 var2_df1 var3_df1 classe_df1 var1_df2 var2_df2 var3_df2 classe_df2
0  f  11  C  F       no      yes       no        yes      yes       no       no         no
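If dfNovo should instead hold the matching records one below the other (the df1 row and the df2 row as separate rows, as the question describes), one possible approach, an alternative to merge rather than what this answer does, is to test each frame's key tuples against the other's with MultiIndex.isin and concatenate the matching rows. This is a sketch under that assumption; the source column is a hypothetical name added only to label where each row came from:

```python
import pandas as pd

# Same sample data as in the question.
df1 = pd.DataFrame({"x": ['f', 'm', 'f'],
                    "y": [11, 22, 39],
                    "z": ['C', 'nC', 'nC'],
                    "w": ['F', 'S', 'M'],
                    "var1": ["no", "yes", "no"],
                    "var2": ["yes", "yes", "yes"],
                    "var3": ["no", "no", "no"],
                    "classe": ["yes", "yes", "no"]})

df2 = pd.DataFrame({"x": ['f', 'f', 'm'],
                    "y": [11, 22, 40],
                    "z": ['C', 'C', 'nC'],
                    "w": ['F', 'M', 'M'],
                    "var1": ["yes", "yes", "no"],
                    "var2": ["no", "no", "yes"],
                    "var3": ["no", "no", "yes"],
                    "classe": ["no", "yes", "yes"]})

keys = ['x', 'y', 'z', 'w']

# Turn the key columns of each frame into a MultiIndex of (x, y, z, w) tuples.
keys1 = pd.MultiIndex.from_frame(df1[keys])
keys2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean masks: does this row's key tuple also occur in the other frame?
in_df1 = keys1.isin(keys2)
in_df2 = keys2.isin(keys1)

# Stack the matching rows, keeping their original indices.
# "source" is a hypothetical column name used only for illustration.
dfNovo = pd.concat([df1[in_df1].assign(source='df1'),
                    df2[in_df2].assign(source='df2')])
print(dfNovo)
```

Unlike merge, this keeps the original column names without suffixes, at the cost of the two records no longer sitting side by side.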

I hope I’ve helped
