2
I have two columns, A and B. I want to calculate the mean of column A using only the rows where the corresponding value in column B is greater than 10, for example.
6
This is a matter of selecting rows from a data frame by a logical condition:
set.seed(6480) # for reproducible results
n <- 50
dados <- data.frame(A = runif(n, 0, 100), B = runif(n, 0, 40))
mean(dados[dados$B > 10, "A"]) # logical index
#[1] 51.62713
mean(dados$A[dados$B > 10]) # equivalent
#[1] 51.62713
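To make the logical index itself visible, here is a minimal sketch on a tiny hand-made data frame. The name `toy` and its values are assumptions chosen for illustration and are not part of the data above.

```r
# Hypothetical four-row data frame, small enough to check by hand
toy <- data.frame(A = c(10, 20, 30, 40), B = c(5, 15, 25, 8))

# The condition produces one TRUE/FALSE per row
mask <- toy$B > 10
mask
# [1] FALSE  TRUE  TRUE FALSE

# Only the A values in rows 2 and 3 are kept, then averaged
toy_mean <- mean(toy$A[mask])
toy_mean
# [1] 25
```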
But if column B contains NA values, the logical index returns NA for those positions and the mean becomes NA; we have to use which.
is.na(dados$B) <- sample(n, 10) # set some B values to NA
mean(dados$A[dados$B > 10]) # look at what dados$B > 10 returns
#[1] NA
mean(dados$A[which(dados$B > 10)])
#[1] 52.17357
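The reason can be seen on a three-element vector. This is only a sketch; `x` and `idx` are made-up names and values used for illustration.

```r
x   <- c(10, 20, 30)
idx <- c(TRUE, NA, FALSE)   # what a comparison returns when one value is NA

# An NA in a logical index yields an NA element in the result
x[idx]
# [1] 10 NA

# which() returns only the positions that are TRUE, dropping NA and FALSE
which(idx)
# [1] 1

x[which(idx)]
# [1] 10
```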
Edit.
As Flávio Silva says in the comments, you can also use the argument na.rm.
mean(dados$A[dados$B > 10], na.rm = TRUE)
#[1] 52.17357
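Both approaches give the same number here because column A has no missing values. The sketch below, on a hypothetical data frame `toy` with made-up values, suggests where they differ: which only removes the NA positions coming from the condition on B, while na.rm = TRUE removes every NA that reaches mean, including NA values in A itself.

```r
# Hypothetical data: A is NA in row 2, B is NA in row 3
toy <- data.frame(A = c(10, NA, 30, 40), B = c(15, 20, NA, 25))

# which() keeps rows 1, 2 and 4, but A[2] is NA, so the mean is still NA
with_which <- mean(toy$A[which(toy$B > 10)])
with_which
# [1] NA

# na.rm = TRUE drops the NA from the index (row 3) and the NA in A (row 2)
with_na_rm <- mean(toy$A[toy$B > 10], na.rm = TRUE)
with_na_rm
# [1] 25

# Combining both gives the same result as na.rm alone
with_both <- mean(toy$A[which(toy$B > 10)], na.rm = TRUE)
with_both
# [1] 25
```

Whether NA values in A should be dropped silently depends on the analysis, so treating the two as interchangeable is an assumption worth checking on real data.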
0
When the data is a data.frame and I need the means of several columns, I use colMeans, typing the condition inside the function:
colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01",])
This returns the column means of only the rows that meet the criterion. However, nothing after the comma in ,] says which columns enter the calculation, so all of them are used. That is a problem when some columns hold characters ("A", "NAME"), because means cannot be computed for character or factor columns. To avoid this, list the numbers of the columns you want with c(). Example:
nova.tabela.contendo.as.médias <- colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01", c(4,1,3,2,5,7)])
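As an alternative to typing column numbers by hand, the numeric columns can be detected with is.numeric. This is a sketch under assumptions: the data frame `toy` and its column names `group`, `x` and `y` are hypothetical and not taken from the answer.

```r
# Hypothetical data frame with one character column and two numeric ones
toy <- data.frame(
  group = c("A", "A", "B"),
  x = c(1, 3, 100),
  y = c(10, 20, 300),
  stringsAsFactors = FALSE
)

# TRUE for each column that holds numbers, FALSE otherwise
numeric_cols <- sapply(toy, is.numeric)

# Rows where group is "A", numeric columns only;
# drop = FALSE keeps a data frame even if a single column is selected
col_means <- colMeans(toy[toy$group == "A", numeric_cols, drop = FALSE])
col_means
# x is 2 and y is 15
```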
Thank you very much! – Afonso O. Lenzi
In the function mean(dados$A[dados$B > 10]), wouldn't adding na.rm = T be enough, giving mean(dados$A[dados$B > 10], na.rm = T)? – Flavio Silva
@Flaviosilva Right, thanks for the suggestion, that works too. But the problem I was referring to is different. When a logical index has NA values, one should use which. I will edit the answer with your suggestion. – Rui Barradas
@Ruibarradas I asked mostly because I was trying to understand the difference between na.rm and when to use which. – Flavio Silva