In R, how do I calculate the average of a column based on a criterion in another column?

2

I have two columns (A and B). I want to calculate the average of column A using only the rows where the corresponding value in column B is greater than 10, for example.

2 answers

6

This is a matter of selecting rows from a data frame by a logical condition:

set.seed(6480)    # for reproducible results

n <- 50
dados <- data.frame(A = runif(n, 0, 100), B = runif(n, 0, 40))

mean(dados[dados$B > 10, "A"])    # logical index
#[1] 51.62713

mean(dados$A[dados$B > 10])       # equivalent
#[1] 51.62713

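But if column B contains NA values, the logical index does not work as intended: the comparison gives NA for those rows, indexing with NA gives NA elements, and the mean comes out as NA. In that case use which, which returns only the positions where the condition is TRUE.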

is.na(dados$B) <- sample(n, 10)        # set some values of B to NA

mean(dados$A[dados$B > 10])            # see what dados$B > 10 gives
#[1] NA

mean(dados$A[which(dados$B > 10)])
#[1] 52.17357
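
To see where the NA comes from, it may help to look at a tiny made-up example. The vectors a and b below are hypothetical and are not part of the data above; this is only a sketch of how R treats NA in a logical index.

```r
# Hypothetical toy vectors, chosen only to illustrate the behaviour
a <- c(10, 20, 30, 40)
b <- c(5, NA, 15, 25)

idx <- b > 10          # FALSE NA TRUE TRUE: comparing NA gives NA
a[idx]                 # NA 30 40: the NA in the index yields an NA element
which(idx)             # 3 4: only the positions where the condition is TRUE

mean(a[idx])           # NA, because of the NA element
mean(a[which(idx)])    # 35
```

Since which drops both the FALSE and the NA positions, the NA never reaches mean.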

EDIT.

As Flávio Silva says in the comments, you can also use the na.rm argument.

mean(dados$A[dados$B > 10], na.rm = TRUE)
#[1] 52.17357
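
The two approaches agree here because column A itself has no missing values. A small sketch with made-up vectors (a and b are hypothetical, not the data above) suggests where they can differ: which only discards the rows where the condition is NA, while na.rm = TRUE also discards missing values in the column being averaged.

```r
# Hypothetical toy vectors: a has its own NA, in a row where b > 10
a <- c(10, NA, 30, 40)
b <- c(5, 20, NA, 25)

# which keeps rows 2 and 4, so the NA in a[2] still reaches mean
mean(a[which(b > 10)])                  # NA

# na.rm = TRUE removes every NA from the selected values
mean(a[b > 10], na.rm = TRUE)           # 40

# combining both gives the same result here
mean(a[which(b > 10)], na.rm = TRUE)    # 40
```

So which answers "which rows satisfy the condition", and na.rm answers "what to do with missing values among the numbers being averaged".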
  • thank you very much!

  • In mean(dados$A[dados$B > 10]), wouldn't adding na.rm = T be enough, giving mean(dados$A[dados$B > 10], na.rm = T)?

  • @Flaviosilva Right, thanks for the suggestion, that works too. But the problem I was referring to is different. When a logical index contains NA values, one should use which. I will edit the answer with your suggestion.

  • @Ruibarradas I asked mainly because I was trying to understand the difference between na.rm and when to use which.

0


When the data is in a data.frame and I need the means of several columns, I use colMeans with the row condition inside the brackets:

colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01",])

This returns the column means of only the rows that meet the criterion. However, since nothing follows the comma in ,] no columns are singled out, so the calculation is attempted on all of them. That is a problem when some columns hold characters or factors ("A", "NAME"), because colMeans requires numeric columns. To avoid it, pass a vector with the numbers of the columns you want the results for. Example:

nova.tabela.contendo.as.médias<-colMeans(dados[dados$coluna1=="A" & dados$coluna2=="0.01",c(4,1,3,2,5,7)])
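
As a minimal sketch of this idea, assuming a small made-up data frame (dados2 and its columns are hypothetical), the columns can be selected by name, or all numeric columns can be picked with sapply and is.numeric, which is an alternative to listing column numbers by hand.

```r
# Hypothetical data frame; names and values are illustrative only
dados2 <- data.frame(
  coluna1 = c("A", "A", "B", "A"),
  coluna2 = c(0.01, 0.01, 0.01, 0.05),
  x = c(1, 3, 5, 7),
  y = c(10, 20, 30, 40)
)

# Rows that meet both criteria (rows 1 and 2 here)
linhas <- dados2$coluna1 == "A" & dados2$coluna2 == 0.01

# Select the columns to average by name
medias <- colMeans(dados2[linhas, c("x", "y")])
medias                                  # x = 2, y = 15

# Alternative: keep every numeric column, whatever its position
numericas <- sapply(dados2, is.numeric)
colMeans(dados2[linhas, numericas])     # coluna2 = 0.01, x = 2, y = 15
```

Selecting by name or by type may be less fragile than column numbers if the column order changes later.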
