How to get only the records not duplicated with pandas

Question

Asked 6 years, 1 month ago

Viewed 1,123 times

How can I get only the rows of a DataFrame that are not duplicated? I don't want one copy of each record, so df.unique() would not fit here; I want only the rows that occur exactly once. I tried it this way, but I don't know if it's right.

df2 = DF
df2.drop_duplicates('userId', keep=False, inplace=True)

Then I would use df2, where only the rows that are not duplicated would remain. Is this approach correct?

1 answer

by GBrandt • **1,052** points · Answer 1 · 2019-06-16T12:57:06+00:00

Almost.

df2 = DF does not create a copy of DF; it just gives the same object one more name.

When you call drop_duplicates(..., inplace=True), the modifications happen directly in the DataFrame (i.e. your DataFrame loses the duplicates). The way you did it, the duplicates would be removed from DF as well as from df2 (because they are in fact the same object).
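To make the aliasing concrete, here is a minimal sketch. The sample data and the score column are made up for illustration; only userId comes from the question.

```python
import pandas as pd

# Hypothetical sample data: userId 2 appears twice
DF = pd.DataFrame({"userId": [1, 2, 2, 3], "score": [10, 20, 30, 40]})

df2 = DF                  # a second name for the same object, not a copy
same_object = df2 is DF   # True: both names point at one DataFrame

# The in-place drop through df2 therefore changes DF as well
df2.drop_duplicates("userId", keep=False, inplace=True)

rows_in_DF = len(DF)      # DF lost the duplicated rows too
rows_in_df2 = len(df2)
```

After this runs, DF and df2 both contain only the rows for userId 1 and 3, so the original data is gone.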

The correct way would simply be:

df2 = DF.drop_duplicates('userId', keep=False)

This creates a new DataFrame containing only the rows whose userId occurs exactly once and stores it in df2, leaving DF unchanged.
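As a quick check of what keep=False does compared with the default, here is a small sketch on made-up data (the item column and the values are assumptions for illustration). It also shows an equivalent boolean-mask form using duplicated(), in case that reads more clearly to you.

```python
import pandas as pd

# Hypothetical sample data: userId 2 appears twice, userId 3 three times
DF = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3, 3, 4],
    "item": ["a", "b", "c", "d", "e", "f", "g"],
})

# keep=False drops every row whose userId is repeated anywhere
only_once = DF.drop_duplicates("userId", keep=False)

# The default (keep="first") keeps one row per userId instead
first_of_each = DF.drop_duplicates("userId")

# Equivalent selection with a boolean mask
mask_version = DF[~DF.duplicated("userId", keep=False)]

rows_in_DF = len(DF)  # without inplace=True, DF itself is untouched
```

Here only_once holds the rows for userId 1 and 4, while first_of_each holds one row each for 1, 2, 3 and 4.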