Removing rows with NaN values from a DataFrame

I joined two tables with pd.concat and ran into a problem: the result has several NaN values.

It turns out that some values are missing from one of the DataFrames. To make my Data Science study easier, I want to remove every row that contains NaN values.

Other suggestions are welcome.

Data:

       Ano Country Name  Pobreza  Population
1     1960  Afghanistan      NaN   8996351.0
265   1961  Afghanistan      NaN   9166764.0
529   1962  Afghanistan      NaN   9345868.0
793   1963  Afghanistan      NaN   9533954.0
1057  1964  Afghanistan      NaN   9731361.0
1321  1965  Afghanistan      NaN   9938414.0
1585  1966  Afghanistan      NaN  10152331.0
1849  1967  Afghanistan      NaN  10372630.0
2113  1968  Afghanistan      NaN  10604346.0
2377  1969  Afghanistan      NaN  10854428.0

1 answer

Use DataFrame.dropna:

import pandas as pd
import numpy as np

df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
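
Before choosing which axis to drop along, it can help to see where the NaNs actually are. A minimal sketch using DataFrame.isna on the same example frame (the variable names nan_per_column and rows_with_nan are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Number of NaN values in each column: A 2, B 1, C 3, D 0
nan_per_column = df.isna().sum()
print(nan_per_column)

# True for every row that contains at least one NaN (rows 0, 1 and 2 here)
rows_with_nan = df.isna().any(axis=1)
print(rows_with_nan)
```

Column C is NaN in every row, which is why dropping rows with any NaN removes everything in this example.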

Drop columns in which all elements are NaN:

df.dropna(axis=1, how='all')
     A    B  D
0  NaN  2.0  0
1  3.0  4.0  1
2  NaN  NaN  5

Drop columns in which any element is NaN:

df.dropna(axis=1, how='any')
   D
0  0
1  1
2  5

Drop rows in which all elements are NaN (there are none here, so nothing is dropped):

df.dropna(axis=0, how='all')
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
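
Dropping every row that contains at least one NaN, which is what the question asks for, is the default behaviour: dropna() uses axis=0 and how='any'. On this example it returns an empty frame, because column C is NaN in every row. A sketch that first drops the all-NaN column and then the rows that are still incomplete:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5]],
                  columns=list('ABCD'))

# Default axis=0, how='any': every row has a NaN in C, so no rows are left
no_nan_rows = df.dropna()
print(no_nan_rows.empty)  # True

# Drop the all-NaN column C first, then the rows that still contain a NaN
cleaned = df.dropna(axis=1, how='all').dropna()
print(cleaned)
#      A    B  D
# 1  3.0  4.0  1
```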

Keep only rows with at least 2 non-NaN values:

df.dropna(thresh=2)
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
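
Applied to the data in the question, a minimal sketch built from only the first three rows shown there (the column names Ano, Country Name, Pobreza and Population come from the question; the rest of the real frame is not shown, and the names complete and with_population are illustrative). dropna() returns a new DataFrame rather than modifying the original, so the result has to be assigned; subset= limits the NaN check to the listed columns.

```python
import numpy as np
import pandas as pd

# First three rows from the question, keeping their original index labels
df = pd.DataFrame(
    {
        "Ano": [1960, 1961, 1962],
        "Country Name": ["Afghanistan"] * 3,
        "Pobreza": [np.nan, np.nan, np.nan],
        "Population": [8996351.0, 9166764.0, 9345868.0],
    },
    index=[1, 265, 529],
)

# Remove every row with a NaN in any column; all three rows go,
# because Pobreza is missing for each of them
complete = df.dropna()

# Only look at Population when deciding which rows to drop
with_population = df.dropna(subset=["Population"])

# Renumber the surviving rows 0, 1, 2, ... instead of 1, 265, 529
with_population = with_population.reset_index(drop=True)
```

On the full frame, df = df.dropna() keeps only the rows where every column, including Pobreza, has a value.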
