Compare Dataframes and show different information between them

1

I have two dataframes, df_a and df_b.

How do I compare them and show only the information in df_b that is not contained in df_a?

I tried the drop_duplicates method; however, the output shows the values that differ in both dataframes, not only those from df_b.

Here is the example:

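import pandas as pd

a = [1, 2, 3]
b = [1, 2, 5]

df_a = pd.DataFrame(a)
df_b = pd.DataFrame(b)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
df_c = pd.concat([df_a, df_b])

# keep=False drops every row that occurs more than once
df_d = df_c.drop_duplicates(keep=False)
df_d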

This is the output:

    0
2   3
2   5

I need the output to show only the row containing the value 5, which is the value in df_b that is not in df_a.

2 answers

1

If you need to select the values that differ between the columns at the same index position, you can use (1) .loc and (2) the 'not equal' comparison, .ne:

df_b.loc[df_b[0].ne(df_a[0])]
# output
    0
2   5

If you want to select the elements of df_b that do not appear at any position in df_a, use .isin and negate the selection with ~:

df_b.loc[~df_b[0].isin(df_a[0])]
# output
    0
2   5
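
The two selections only give different results when the same values sit at different positions. A minimal sketch of that, assuming hypothetical data (df_x and df_y are not from the question) in which df_y holds two values of df_x in swapped order plus one new value:

```python
import pandas as pd

# Hypothetical data: 1 and 2 are swapped, 5 is new
df_x = pd.DataFrame([1, 2, 3])
df_y = pd.DataFrame([2, 1, 5])

# Positional comparison: each row is compared with the row at the same index
by_position = df_y.loc[df_y[0].ne(df_x[0])]

# Membership test: a value is kept only if it appears nowhere in df_x
by_membership = df_y.loc[~df_y[0].isin(df_x[0])]

print(by_position[0].tolist())    # [2, 1, 5]
print(by_membership[0].tolist())  # [5]
```

With the data from the question both selections return the same row, because the shared values 1 and 2 are at the same positions in df_a and df_b.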

0

In this case, you can first duplicate df_a, as in the code below:

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_c = pd.concat([df_a, df_a])

df_c now contains df_a twice. Calling drop_duplicates(keep=False) on it would return only the dataframe header, that is, an empty dataframe.

To get only the information unique to df_b, simply append df_b to df_c:

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_d = pd.concat([df_c, df_b])

When drop_duplicates(keep=False) is applied to df_d, every row of df_a is removed, including the rows it shares with df_b, so only the rows unique to df_b remain.

df_e = df_d.drop_duplicates(keep=False)

The result of df_e will be:

    0
2   5

Then you’ll have the information you want.
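
The steps above can be collected into one helper. This is only a sketch, assuming pandas 2.x, where DataFrame.append is no longer available and pd.concat takes its place; the function name rows_only_in_b is hypothetical. One caveat: because keep=False drops every duplicated row, a value that occurs twice in df_b and never in df_a would also be removed.

```python
import pandas as pd

def rows_only_in_b(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df_b that do not occur in df_a (hypothetical helper)."""
    # df_a is stacked twice so that each of its rows is guaranteed to be a duplicate
    stacked = pd.concat([df_a, df_a, df_b])
    # keep=False removes every row that occurs more than once
    return stacked.drop_duplicates(keep=False)

df_a = pd.DataFrame([1, 2, 3])
df_b = pd.DataFrame([1, 2, 5])

result = rows_only_in_b(df_a, df_b)
print(result)
```

For the data in the question, result holds the single row with index 2 and value 5.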

  • 1

    Worked perfectly
