Counting NaN and null values in a pandas DataFrame

Viewed 6,913 times

1

Imagine we have a CSV file with the following data:

col1    col2    col3    col4
1       2       3       4
5       6       7       8
9       10      11      12
13      14      15
33      44

import numpy as np
import pandas as pd

po = pd.read_csv('/dados.csv')

My goal is to better understand how to identify NaN/null data in a dataset.

Questions:

1. How do I count how many NaN values are in the above dataset?

2. How do I count how many null values are in the above dataset?

3. How do I count how many non-NaN values are in the above dataset?

4. How do I count how many non-null values are in the above dataset?

And how do I answer the same questions per column?

For example, I tried:

po[po['col4'].isna()].count()

expecting it to count how many NaN values exist in column 4, but the result was:

col1    2
col2    2
col3    1
col4    0
dtype: int64

What is wrong? How to answer the above questions?

  • @Noobsaibot: I don't agree. I asked 4 questions about the above dataset (CSV), and each answer would be a single line of code!

  • @Noobsaibot: My main question is how to count how many null/NaN values there are, and how many are not NaN/null, so I can apply this to a larger dataset...

  • @Noobsaibot: I tried applying it here but did not understand the output!

2 answers

3


What is wrong?

The function count() counts non-null values (for each column or row), so filtering the rows first and then calling count() does not count the missing values. The correct way to use it is:

  • Non-null value count for all columns

    print(po.count())
    

    the output will be:

    col1    5
    col2    5
    col3    4
    col4    3
    dtype: int64

  • Non-null value count for a specific column

    print(po.col4.count())
    

    the output will be:

    3
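To see why the attempt in the question returned those numbers, here is a minimal sketch. It assumes the file is comma-separated and uses an inline string (`csv_text`, a stand-in that is not part of the original post) instead of reading the file from disk. The boolean mask keeps only the rows where col4 is missing, and count() then reports the non-null values in each column of those rows.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file (assumed to be comma-separated).
csv_text = """col1,col2,col3,col4
1,2,3,4
5,6,7,8
9,10,11,12
13,14,15,
33,44,,
"""
po = pd.read_csv(io.StringIO(csv_text))

# Boolean Series: True where col4 is missing.
mask = po['col4'].isna()

# The original attempt: keep only the rows where col4 is missing,
# then count the non-null values in each column of those rows.
attempt = po[mask].count()
print(attempt)

# What was actually wanted: the number of True values in the mask.
missing_in_col4 = int(mask.sum())
print(missing_in_col4)
```

Summing the mask works because True is counted as 1 and False as 0.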


To count missing data, you can use the function isna() or the function isnull(); both return the same result:

  • Missing data count from all columns

    # isna
    print(po.isna().sum())
    
    # isnull
    print(po.isnull().sum())
    

    the output of both will be:

    col1    0
    col2    0
    col3    1
    col4    2
    dtype: int64

  • Count of missing data from a specific column

    # isna
    print(po.col4.isna().sum())
    
    # isnull
    print(po.col4.isnull().sum())
    

    the output of both will be:

    2
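The per-column counts above can be extended to the whole dataset and to the non-missing values, which covers the remaining questions. This is a minimal sketch under the same assumption as before: the data is comma-separated and loaded from an inline string (`csv_text`, a stand-in for the real file).

```python
import io

import pandas as pd

# Inline stand-in for the CSV file (assumed to be comma-separated).
csv_text = """col1,col2,col3,col4
1,2,3,4
5,6,7,8
9,10,11,12
13,14,15,
33,44,,
"""
po = pd.read_csv(io.StringIO(csv_text))

# Missing values in the whole dataset: sum per column, then sum the sums.
total_missing = int(po.isna().sum().sum())

# Non-missing values per column (equivalent to po.count()).
present_per_column = po.notna().sum()

# Non-missing values in the whole dataset.
total_present = int(po.notna().sum().sum())

print(total_missing)       # 3 for this data
print(present_per_column)
print(total_present)       # 17 for this data
```

As a sanity check, the missing and non-missing totals should add up to po.size, the total number of cells.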


0

1 and 2: how many NaN values are in the above dataset -> po.isna().sum().sum()
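Questions 1 and 2 share one answer because pandas does not distinguish between NaN and null when detecting missing values. A small sketch, using a made-up Series rather than the data from the question, illustrates this:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one real value, one None and one NaN.
s = pd.Series(['a', None, np.nan], dtype=object)

# isna() and isnull() both flag None and NaN as missing.
n_na = int(s.isna().sum())
n_null = int(s.isnull().sum())

print(n_na, n_null)  # 2 2
```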

  • Use the edit link on your question to add further information. The Post Answer button should only be used for complete answers to the question. - From Review

  • @Perozzo: but this is an answer!
