Python pandas: Drop rows where one column is duplicated and another column is null

Viewed 778 times

0

I would like to drop the rows where Col1 is duplicated and Col2 is null. For example, given this input:

Col1  Col2
A     123
A     NaN
B     NaN
B     456

Expected output:

Col1  Col2
A     123
B     456

I tried something more or less like this. Is it possible to drop rows this way?

df2 = df[df.duplicated('Col1')] and df[df['Col2'].isnull()]

2 answers

2


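What you did is almost right. It only lacked a small detail in the call to duplicated, and the two conditions have to be combined element-wise with & (not and) into a single mask. That mask selects the rows you want to drop, so negate it with ~ to keep the rest:

df2 = df[~(df.duplicated('Col1', keep=False) & df['Col2'].isnull())]

With the parameter keep=False, every occurrence of a duplicated value is flagged, not just the second and later ones.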
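A minimal sketch of this approach on the sample data from the question. The mask marks the rows to remove, so it is inverted with ~ before indexing. The intermediate names dup, null and to_drop are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B"],
    "Col2": [123, np.nan, np.nan, 456],
})

# keep=False flags every row whose Col1 value occurs more than once
dup = df.duplicated("Col1", keep=False)
# True where Col2 is missing
null = df["Col2"].isnull()

# Rows that are both duplicated in Col1 and null in Col2
to_drop = dup & null

# Keep everything else
df2 = df[~to_drop]
print(df2)
```

On this data the two remaining rows are A/123 and B/456, which matches the expected output in the question.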

1

According to the documentation, you can use pandas' own dropna.

df = df.dropna(how='any')

or modify the existing DataFrame in place, using inplace:

df.dropna(how='any', inplace=True)

You can also use the axis parameter to choose whether to drop rows or columns (0 or 1, respectively).

If you want to restrict the null check to specific columns, use the subset parameter.
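A short sketch of what subset and axis do, assuming the question's data plus one extra row (C, NaN) that is hypothetical and added only to show the effect on a non-duplicated value. Note that dropna looks only at nulls, so it removes every row with a null Col2 regardless of whether Col1 is duplicated.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical, non-duplicated row "C" with a null Col2
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col2": [123, np.nan, np.nan, 456, np.nan],
})

# Drop rows where Col2 is null; Col1 is not considered at all
by_subset = df.dropna(subset=["Col2"])

# axis=1 drops columns containing any null instead of rows
by_axis = df.dropna(axis=1)

print(by_subset)
print(by_axis)
```

Here by_subset loses the C row as well, and by_axis loses the whole Col2 column, so dropna alone does not express the "duplicated in Col1" condition.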

Edit: I spent some time trying to find a solution that works with NaN, but without success. Even testing with pd.isnan() raises an error.

If it works for you, here is a solution for the case where the value is 0 instead of NaN:

df2 = df[~(df['Col1'].duplicated(keep=False) & (df['Col2']=='0'))]
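One likely reason the NaN attempt is awkward: NaN never compares equal to anything, including itself, so an equality test like the one above cannot find it. pandas exposes isna (alias isnull) for this instead. The sketch below reuses the same mask shape with isna; the extra row (C, NaN) is a hypothetical addition to show that a non-duplicated null is kept.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical, non-duplicated row "C" with a null Col2
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col2": [123, np.nan, np.nan, 456, np.nan],
})

# Equality against NaN is always False, so this mask finds nothing
eq_mask = df["Col2"] == np.nan

# isna() is the null test that actually detects NaN
na_mask = df["Col2"].isna()

# Same structure as the 0-based solution, with isna() in place of == '0'
df2 = df[~(df["Col1"].duplicated(keep=False) & na_mask)]
print(df2)
```

With this mask the rows A/NaN and B/NaN are removed, while C/NaN stays because C is not duplicated in Col1.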
  • 1

    I can’t drop all the values null of Col2, only those that are repeated in Col1. For there may be some value in Col1 which is not duplicated and which is null in Col2, these I wish to keep
