0
Hello,
I’m working with a large data frame - 1000 variables and 60,000 lines - and I need to calculate the percentage of NA and whitespace for each of the variables separately.
What’s the best way to do it in R?
3
To count NAs by column, you can use the colSums() function:
# total number of rows
n <- nrow(df)
# percentage of NA per column
round(colSums(is.na(df)) * 100 / n, 2)
Alternatively, you can use the apply() function:
# function to count NAs
sum_NA <- function(dados) {
  sum(is.na(dados))
}
# total number of rows
n <- nrow(df)
# apply the function to each column
round(apply(df, 2, sum_NA) * 100 / n, 2)
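As a quick check of the approach above, here is a minimal sketch on a made-up data frame (`toy` and its columns are hypothetical, not taken from the question). It also shows `vapply()` over the columns as a possible alternative, assuming you would rather avoid the conversion to a matrix that `apply()` performs on a data frame.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(x = c(1, NA, 3, NA), y = c("a", "b", NA, "d"))

# colSums() approach: percentage of NA per column
pct_colsums <- round(colSums(is.na(toy)) * 100 / nrow(toy), 2)

# vapply() approach: goes column by column without converting to a matrix
pct_vapply <- vapply(
  toy,
  function(col) round(mean(is.na(col)) * 100, 2),
  numeric(1)
)

print(pct_colsums)
print(pct_vapply)
```

Both calls should return a named numeric vector with one percentage per column, so either result can be sorted or filtered afterwards.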
0
One way to do it is to write a loop that goes through your data frame column by column.
I created a data frame to illustrate:
df <- data.frame(A = c(NA, 2, '', 1), B = c('', 4, 4, 2), C = c(5, '', '', ''), D = c(7, 7, 5, 4), E = c('', '', NA, NA), F = c(9, 9, 0, 6))
Note that some of the columns contain blank values and NA.
for (i in seq_len(ncol(df))) {
    # percentage of values in column i that are NA or an empty string
    print(sum(is.na(df[, i]) | df[, i] == "") / length(df[, i]) * 100)
}
This loop walks through each column and calculates the percentage you need. For my data frame, it prints the following results:
[1] 50
[1] 25
[1] 75
[1] 0
[1] 100
[1] 0
Want something simpler and maybe faster? Try:
print(colMeans(is.na(df) | df == "") * 100)
That gives the following output:
  A   B   C   D   E   F 
 50  25  75   0 100   0 
Note that is.na() is the R function that finds all the NAs, and I combined it with an OR (|) to also find all the empty values (== ""). I think this last option is faster because it only uses vectorized functions that are compiled natively in R.
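The question also mentions whitespace, and comparing with "" only matches strictly empty strings, so a cell holding " " would not be counted. Below is a minimal sketch of one way to cover that case, assuming "blank" means a string that is empty after trimming. The names `pct_missing` and `toy` are hypothetical and introduced here only for illustration.

```r
# Hypothetical helper: percentage of NA or blank (empty or whitespace-only)
# values in each column of a data frame
pct_missing <- function(data) {
  vapply(data, function(col) {
    # trimws() strips leading and trailing whitespace;
    # as.character() lets the comparison work for any column type
    blank <- trimws(as.character(col)) == ""
    # For NA entries, is.na(col) is TRUE and TRUE | NA evaluates to TRUE
    mean(is.na(col) | blank) * 100
  }, numeric(1))
}

# Made-up data: column A has one NA and one whitespace-only cell
toy <- data.frame(
  A = c(NA, "2", " ", "1"),
  B = c("", "4", "4", "  "),
  C = c(5, 7, 5, 4)
)

print(pct_missing(toy))
```

Working column by column keeps memory use modest on a wide data frame, but whether it is faster than the colMeans() version on 1000 columns by 60,000 rows would need to be measured.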
Daniel and Eder, thank you so much! This is valuable help for those who, like me, are just starting out in R!
Proportion of NA per column:
colMeans(is.na(df)). (For a percentage, multiply by 100.) – Rui Barradas
Exactly, I forgot that detail.
– Thiago Fernandes
Thank you very much, Fernandes and Rui Barradas. I’m still taking my first steps in R, and your help was very valuable!
– BJones