How do I count the number of NAs in each variable?

5

Suppose I'm working with the following data frame:

df <- data.frame(v  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
                 v2 = c(11, NA, NA, 14, NA, 16, NA, NA, 19, NA),
                 v3 = c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
                 v4 = c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA))

I need to know how many NAs each variable contains. In the example: v = 2, v2 = 6, v3 = 0, v4 = 5.

I could run the command below for each variable:

sum(is.na(df$v))

But with a large data frame this is not practical.

Another option is summary(df), but since it returns many other results, it is hard to see the NA counts for each variable.

Is there a way to return only the number of NAs in each variable of the data frame?

3 answers

4


Use sapply to apply a function to each column of the data frame:

df
    v v2 v3   v4
1   1 11 21    a
2   2 NA 22    b
3  NA NA 23    c
4   4 14 24 <NA>
5  NA NA 25 <NA>
6   6 16 26 <NA>
7   7 NA 27    g
8   8 NA 28    h
9   9 19 29 <NA>
10 10 NA 30 <NA>

sapply(df, function(x) sum(is.na(x)))
 v v2 v3 v4 
 2  6  0  5 
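As a possible alternative (a sketch, not part of the original answer): is.na() applied to a whole data frame returns a logical matrix, so base R's colSums() can count the TRUE values in each column. The data frame below simply reproduces the example from the question, and na_counts is a name chosen here for illustration.

```r
# Example data frame from the question
df <- data.frame(
  v  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  v2 = c(11, NA, NA, 14, NA, 16, NA, NA, 19, NA),
  v3 = c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
  v4 = c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA)
)

# is.na(df) is a logical matrix with one column per variable;
# colSums() adds up the TRUE values (the NAs) in each column
na_counts <- colSums(is.na(df))
print(na_counts)
```

The result is a named numeric vector, so a single count can be picked out by name, for example na_counts["v2"].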

3

You can use colwise from the plyr package to turn a function into one that is applied to each column of a data frame:

Defining the function:

library(plyr)
quantos.na <- colwise(function(x) sum(is.na(x)))

Applying the function:

quantos.na(df)
  v v2 v3 v4
1 2  6  0  5

1

Try this:

table(date$VAR1, useNA = "always")

The result looks like this:

    1     2     3     4     5     6     7  <NA> 
10484   518  4389  3639   272   522   836 18291 

The argument useNA = "always" (lowercase) is used so that R does not omit the missing values from the table.
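To try the same idea on the data from the question, here is a minimal sketch using the v4 column. The accepted values of useNA are "no", "ifany" and "always"; the names counts and n_missing are chosen here for illustration only.

```r
# The v4 column from the question's example data
v4 <- c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA)

# useNA = "always" adds an <NA> entry to the table even when there are no NAs
counts <- table(v4, useNA = "always")
print(counts)

# The NA count is the entry whose name is itself NA
n_missing <- counts[is.na(names(counts))]
print(n_missing)
```

Note that this approach tabulates one variable at a time, so for a per-column overview of a whole data frame it would still need to be applied to each column.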

  • 1

    It is always good to explain the reasoning behind your answer.
