Copy part of a DataFrame where a column is null or NaN

Viewed 825 times

1

I have the following question.

I have the following sample dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A' : [4,5,13,18], 'B' : [10,np.nan,np.nan,40], 'C' : [np.nan,50,25,np.nan], 'D' : [-30,-50,10,16], 'E' : [-40,-50,7,12]})

df

[image: the DataFrame df as displayed]

What I intend to do is:

  • I want to find the rows in which column B is NaN and create another DataFrame with the same columns as the current one (df), but containing only those rows (index 1 and 2 in this case).

To better illustrate, the result should be:

df2

[image: df2, containing only the rows of df with index 1 and 2]

I initially tried using loc:

df2 = df.loc[:]

However, I could not work out how to select only the np.nan values. Is there a way to do this?

I also tested with empty fields to see how pandas' null check behaves.

import pandas as pd
df = pd.DataFrame({'A' : [4,5,13,18], 'B' : [10,'','',40], 'C' : ['',50,25,''], 'D' : [-30,-50,10,16], 'E' : [-40,-50,7,12]})

And using the syntax:

df2 = df[pd.isnull(df).any(axis=1)]

This command works, but it looks for missing values in any column. How could I change it to check a single column?

  • I don’t know much about numpy, but I found a question that might help you: https://stackoverflow.com/questions/6736590/fast-check-for-nan-in-numpy

1 answer

1


You can use one of the three options:

op_a = df[df['B'].isnull()] # same result with isna()
# or
op_b = df.loc[df['B'].isnull()] # same result with isna()
# or
op_c = df.query('B != B') # NaN is the only value that is not equal to itself

All three produce the output:

   A    B     C   D   E
1  5  NaN  50.0 -50 -50

Dataframe used as example:

{
  'A' : [4,5,13,18],
  'B' : [10,np.nan,'',40],
  'C' : [np.nan,50,25,np.nan],
  'D' : [-30,-50,10,16],
  'E' : [-40,-50,7,12]
}
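
Note that column B in this example holds both np.nan and an empty string, and only the np.nan row (index 1) shows up in the output. That is because isnull() does not treat '' as missing. If empty strings should also count as missing, one possible approach is to combine two masks. This is a minimal sketch, and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Same example DataFrame as above: column B mixes np.nan and ''
df = pd.DataFrame({
    'A': [4, 5, 13, 18],
    'B': [10, np.nan, '', 40],
    'C': [np.nan, 50, 25, np.nan],
    'D': [-30, -50, 10, 16],
    'E': [-40, -50, 7, 12],
})

# isnull() flags only real missing values (np.nan / None), not ''
only_nan = df[df['B'].isnull()]

# Assumption: empty strings should be treated as missing as well
nan_or_empty = df[df['B'].isnull() | (df['B'] == '')]

print(only_nan.index.tolist())      # [1]
print(nan_or_empty.index.tolist())  # [1, 2]
```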

You can see it working on Repl.it.
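
For reference, here is a self-contained sketch that applies the three options to the DataFrame from the question (the one where column B contains only numbers and np.nan). With that data, all three should return the rows with index 1 and 2:

```python
import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    'A': [4, 5, 13, 18],
    'B': [10, np.nan, np.nan, 40],
    'C': [np.nan, 50, 25, np.nan],
    'D': [-30, -50, 10, 16],
    'E': [-40, -50, 7, 12],
})

mask = df['B'].isna()          # True where column B is NaN

op_a = df[mask]                # plain boolean indexing
op_b = df.loc[mask]            # the same selection through loc
op_c = df.query('B != B')      # NaN is never equal to itself

# All three selections contain the same rows and values
assert op_a.equals(op_b) and op_a.equals(op_c)
print(op_a.index.tolist())     # [1, 2]
```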

  • Excellent! It worked very well, thanks. I just have one question: what is the difference between using loc and not using it?
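
On the loc question in the comment above: as far as I know, with a boolean mask both forms select the same rows, and the practical difference is that loc also accepts a column selector in the same call. A minimal sketch on the question's DataFrame (the choice of columns 'A' and 'B' is just an illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [4, 5, 13, 18],
    'B': [10, np.nan, np.nan, 40],
    'C': [np.nan, 50, 25, np.nan],
    'D': [-30, -50, 10, 16],
    'E': [-40, -50, 7, 12],
})

mask = df['B'].isna()

without_loc = df[mask]       # rows where B is NaN, all columns
with_loc = df.loc[mask]      # same rows, same columns

# loc can also restrict the columns in the same call
subset = df.loc[mask, ['A', 'B']]

print(without_loc.equals(with_loc))  # True
print(subset.columns.tolist())       # ['A', 'B']
```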
