Python pandas: Drop rows with duplicate column and another column with null value

Question

Asked 6 years, 2 months ago

Viewed 778 times

0

I would like to drop the rows where Col1 is duplicated and Col2 is null. For example, given this input:

Col1  Col2
A     123
A     NaN
B     NaN
B     456

Expected output

Col1  Col2
A     123
B     456

I tried something like the following. Is it possible to drop rows this way?

df2 = df[df.duplicated('Col1')] and df[df['Col2'].isnull()]

2 answers

2

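What you did is almost right. Two details were missing: duplicated needs keep=False, and the two conditions must be combined element-wise with & rather than and. The combined mask marks the rows to drop, so negate it with ~ to keep the rest:

df2 = df[~(df.duplicated('Col1', keep=False) & df['Col2'].isnull())]

With keep=False, every occurrence of a duplicated value is flagged, not only the later ones.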
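As a minimal sketch of this approach, the snippet below rebuilds the sample data from the question and applies the mask step by step. The intermediate names is_dup, is_null and to_drop are only illustrative; they do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B"],
    "Col2": [123, np.nan, np.nan, 456],
})

# True for every row whose Col1 value appears more than once
is_dup = df.duplicated("Col1", keep=False)
# True for every row whose Col2 is missing
is_null = df["Col2"].isnull()

# Rows that satisfy both conditions are the ones to drop
to_drop = is_dup & is_null

# Negate the mask to keep everything else
df2 = df[~to_drop]
print(df2)
```

Selecting df[to_drop] instead would return the rows being removed, which can be handy for checking the mask before dropping anything.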

by mzavarez • **111** points · Answer 1 · 2019-06-07T12:50:21+00:00

According to the documentation, you can use pandas' own dropna method. Note that it removes every row containing a null, regardless of whether Col1 is duplicated.

df = df.dropna(how='any')

or modify the existing DataFrame in place with inplace=True:

df.dropna(how='any', inplace=True)

You can also use the axis parameter to choose whether rows or columns are dropped (0 or 1, respectively).

To restrict the null check to specific columns, use the subset parameter.
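To illustrate the subset parameter, and how dropna differs from a duplicate-aware filter, here is a small sketch. The extra row "C" is a hypothetical addition to the question's data, included only to make the difference visible.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical non-duplicated row "C" with a null Col2
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "B", "C"],
    "Col2": [123, np.nan, np.nan, 456, np.nan],
})

# dropna restricted to Col2: removes every row with a null Col2, including "C"
by_dropna = df.dropna(subset=["Col2"])

# Duplicate-aware mask: removes a null row only when its Col1 value is duplicated
mask = df.duplicated("Col1", keep=False) & df["Col2"].isnull()
by_mask = df[~mask]

print(by_dropna["Col1"].tolist())
print(by_mask["Col1"].tolist())
```

On the question's exact data both approaches give the same result. They diverge only when a Col1 value occurs once and its Col2 is null, as with "C" here.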

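Edit: I spent some time trying to find a solution that handles NaN, without success. Even testing with pd.isnan() raises an error (pandas has no such function; the missing-value tests are pd.isna() and pd.isnull()).

If it suits your case, here is a solution that works when Col2 holds '0' instead of NaN: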

df2 = df[~(df['Col1'].duplicated(keep=False) & (df['Col2'] == '0'))]
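Regarding the NaN case mentioned in the edit above, a likely reason the equality test fails is that NaN never compares equal to anything, itself included. The sketch below uses a made-up Series s to show this, along with the isna method that pandas provides for the purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a number, a missing value, and a zero
s = pd.Series([123, np.nan, 0])

# Equality against NaN is False for every element, even the NaN itself
eq = s == np.nan

# isna (alias: isnull) is the element-wise missing-value test
na = s.isna()

# pandas exposes pd.isna and pd.isnull at the top level, but not pd.isnan
has_isnan = hasattr(pd, "isnan")

print(eq.tolist(), na.tolist(), has_isnan)
```

With isna or isnull in place of the == '0' comparison, the same filter works directly on NaN values.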