Hello,
I’m working with a large data frame - 1000 variables and 60,000 rows - and I need to calculate the percentage of NA and whitespace values for each variable separately.
What’s the best way to do it in R?
To count NA values by column, you can use the function colSums():
# total number of rows
n <- nrow(df)
# percentage of NA per column
round(colSums(is.na(df)) * 100 / n, 2)
Or you can use the function apply():
# function to count the NAs in a vector
sum_NA <- function(dados) {
  sum(is.na(dados))
}
# total number of rows
n <- nrow(df)
# apply the function to each column (MARGIN = 2)
round(apply(df, 2, sum_NA) * 100 / n, 2)
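A minimal sketch to check that both approaches give the same numbers, assuming a small made-up data frame called toy (it is not from the question). It also shows vapply() over the columns, which avoids the conversion to a matrix that apply() performs on a data frame.

```r
# Hypothetical example data; the column names are made up for illustration
toy <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, "d"),
  z = c(TRUE, FALSE, TRUE, TRUE)
)

n <- nrow(toy)

# Approach 1: colSums() on the logical matrix returned by is.na()
pct_colsums <- round(colSums(is.na(toy)) * 100 / n, 2)

# Approach 2: apply() over columns (converts the data frame to a matrix first)
pct_apply <- round(apply(toy, 2, function(col) sum(is.na(col))) * 100 / n, 2)

# Approach 3: vapply() iterates over the columns directly, keeping their types
pct_vapply <- round(
  vapply(toy, function(col) sum(is.na(col)), numeric(1)) * 100 / n,
  2
)

print(pct_colsums)
```

With 1000 columns of mixed types, the vapply() form may be the safer choice, since each column is inspected as it is stored.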
Well, one way to do it is to write a loop that goes through your data frame column by column.
I created a data frame to illustrate:
df <- data.frame(A=c(NA,2,'',1),B=c('',4,4,2),C=c(5,'','',''),D=c(7,7,5,4),E=c('','',NA,NA),F=c(9,9,0,6))
Note that some of the columns contain blank values and NA.
for (i in seq_len(ncol(df))) {
  col <- df[[i]]
  # percentage of values in this column that are NA or an empty string
  print(sum(is.na(col) | col == "") / length(col) * 100)
}
This loop walks through each column and calculates the percentage you need. For my data frame, the for loop prints the following results:
[1] 50
[1] 25
[1] 75
[1] 0
[1] 100
[1] 0
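The loop above only prints the values, without the column names. A small sketch, assuming you want to keep the results for later use, stores them in a named numeric vector instead (the name pct is made up here):

```r
df <- data.frame(A = c(NA, 2, "", 1), B = c("", 4, 4, 2), C = c(5, "", "", ""),
                 D = c(7, 7, 5, 4), E = c("", "", NA, NA), F = c(9, 9, 0, 6))

# Pre-allocate a named numeric vector, one slot per column
pct <- setNames(numeric(ncol(df)), names(df))

for (name in names(df)) {
  col <- df[[name]]
  # mean() of a logical vector is the proportion of TRUE values
  pct[name] <- mean(is.na(col) | col == "") * 100
}

print(pct)
```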
Want something simpler and maybe faster? Try:
print(colMeans(is.na(df) | df == "")*100)
That gives the following output:
A B C D E F
50 25 75 0 100 0
Note that is.na is an R function that finds all the NAs. I combined it with an or (|) to also find all the empty strings (== ""). I think this last option is faster because it only uses functions that are compiled natively in R.
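One caveat: the comparison == "" only matches empty strings, so a cell that holds just spaces (" ") would not be counted. A hedged sketch of one way to also count whitespace-only values, using base R's trimws(); the helper name pct_missing and the sample data are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: percentage of NA, empty, or whitespace-only values per column
pct_missing <- function(data) {
  vapply(
    data,
    function(col) {
      # trimws() strips leading/trailing whitespace, so "  " becomes ""
      blank <- trimws(as.character(col)) == ""
      # NA == "" yields NA, but TRUE | NA is TRUE, so NAs are still counted
      mean(is.na(col) | blank) * 100
    },
    numeric(1)
  )
}

# Made-up sample data with NA, an empty string, and a whitespace-only string
sample_df <- data.frame(
  A = c(NA, "2", "  ", "1"),
  B = c("", "4", "4", "2"),
  C = c(5, 6, 7, 8)
)

print(pct_missing(sample_df))
```

This works column by column, so it trades some of the speed of colMeans() for handling cells that only look empty.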
Daniel and Eder, thank you so much! Valuable help for those who are starting in R like me!
Proportion of NA per column: colMeans(is.na(df)). (For a percentage, multiply by 100.) – Rui Barradas
Exactly, I forgot that detail.
– Thiago Fernandes
Thank you very much, Fernandes and Rui Barradas. I’m still taking my first steps with R, and your help was very valuable!
– BJones