R - average of one variable grouped by the values of another variable in a data frame, counting missing records as 0 instead of NA

1

I have a multi-column data frame. How do I calculate the average of one variable based on the values of another variable? I have the frequency of several species found in 4 campaigns, and I want to calculate the average of each species recorded. For this I must divide the sum of the observed frequencies by the number of campaigns performed at each location, but the function I used

dadomean = dcast(dados, local  ~ especie, mean)

calculates the average based only on the campaigns in which the species was recorded and ignores the campaigns in which the count was 0. The same happens with

dadomean = dados %>%
  group_by(local, especie) %>%
  summarise(mean(frequencia))

I also tried

dadomean = dcast(dados, local  ~ especie, mean, subset = .(campanha == 4))

but R did not accept it and gave this error:

Error in .(campanha == 4) : could not find function "."

I also tried the following and it didn’t work.

dadomean = dcast(dados, local  ~ especie, mean, na.rm=TRUE, margins = "campanha")

In addition, the result always has NA in the cells where the value should be 0, and I couldn’t turn those into 0.

campanha	local	especie	frequencia
1	         A	    aa	      1
1	         A	    bb	      2
1	         A	    cc	      1
1	         B	    bb	      1
1	         B	    dd	      7
2	         A	    aa	      50
2	         A	    bb	      1
2	         A	    dd	      8
3	         A	    aa	      2
3	         B	    aa	      3
3	         B	    dd	      3
4	         A	    aa	      33
4	         A	    bb	      5
4	         A	    cc	      1
4	         A	    dd	      1
4	         B	    aa	      18
4	         B	    bb	      10
4	         B	    dd	      6

2 answers

1

I don’t know if that’s exactly what you want.

The average of each species in each location.

library(dplyr)

dados %>%
  group_by(especie, local) %>%
  summarise(Total = mean(frequencia))
  • Unfortunately it didn’t work. The values doubled

  • 1

    But didn’t you want it for each species in each location? If you want it by species only, regardless of location, just remove the local variable from group_by().

1


The question is rather confusing. It asks for means of frequencia grouped by campanha, but then only gives code examples where the grouping is by local and especie.

I’ll group first by campanha.

aggregate(frequencia ~ campanha, dados, mean, na.rm = TRUE)
#  campanha frequencia
#1        1   2.400000
#2        2  19.666667
#3        3   2.666667
#4        4  10.571429

Now I’ll group by local and especie, both with the package reshape2 and with the base function tapply. As you can see, the results are identical; the only difference is that one assigns the value NaN when the mean cannot be calculated and the other assigns NA.
Replacing those values with 0 is done in exactly the same way in both cases.

library(reshape2)

dadomean1 <- dcast(dados, local  ~ especie, mean, value.var = "frequencia")
dadomean1[is.na(dadomean1)] <- 0
dadomean1
#  local   aa       bb  cc       dd
#1     A 21.5 2.666667   1 4.500000
#2     B 10.5 5.500000   0 5.333333


dadomean2 <- with(dados, tapply(frequencia, list(local, especie), mean))
dadomean2[is.na(dadomean2)] <- 0
dadomean2
#    aa       bb cc       dd
#A 21.5 2.666667  1 4.500000
#B 10.5 5.500000  0 5.333333
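As a small illustration of why the same replacement works in both cases: the mean of zero values is NaN, and is.na() is TRUE for NaN as well as for NA. This is only a sketch with a made-up vector m (not part of the data above).

```r
# Mean of an empty vector: there is nothing to average, so R returns NaN
empty_mean <- mean(numeric(0))
is.nan(empty_mean)   # TRUE
is.na(empty_mean)    # TRUE: is.na() also catches NaN
is.nan(NA)           # FALSE: NA is not NaN

# Hypothetical vector mixing both kinds of missing value
m <- c(aa = 10.5, cc = NaN, dd = NA)
m[is.na(m)] <- 0     # one replacement handles NaN and NA alike
m
```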

EDIT.

To calculate the averages grouped by especie and local, but taking into account all campaigns and not only those in which the species was recorded, it is best to define a function, mediaCamp, that makes these calculations.
Then you use it with aggregate.

# Sum of the group divided by the total number of campaigns in dados
mediaCamp <- function(x){
  ncamp <- length(unique(dados$campanha))
  sum(x)/ncamp
}

dadomean3 <- aggregate(frequencia ~ especie + local, dados, mediaCamp)
dadomean3
#  especie local frequencia
#1      aa     A      21.50
#2      bb     A       2.00
#3      cc     A       0.50
#4      dd     A       2.25
#5      aa     B       5.25
#6      bb     B       2.75
#7      dd     B       4.00
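If the wide layout with 0 in the empty cells is preferred, one possible variant (a sketch only, not necessarily the best way) is to sum the frequencies with the base function xtabs, which puts 0 in combinations that have no rows, and divide by the number of campaigns. It assumes, as above, that every location counts all campaigns present in the data; the names totais and media_todas are made up for this example.

```r
# Same data as in the question, rebuilt here so the sketch runs on its own
dados <- data.frame(
  campanha   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
  local      = c("A", "A", "A", "B", "B", "A", "A", "A", "A",
                 "B", "B", "A", "A", "A", "A", "B", "B", "B"),
  especie    = c("aa", "bb", "cc", "bb", "dd", "aa", "bb", "dd", "aa",
                 "aa", "dd", "aa", "bb", "cc", "dd", "aa", "bb", "dd"),
  frequencia = c(1, 2, 1, 1, 7, 50, 1, 8, 2, 3, 3, 33, 5, 1, 1, 18, 10, 6)
)

# Total number of campaigns in the data (assumed to apply to every location)
ncamp <- length(unique(dados$campanha))

# Sum of frequencia per local x especie; empty combinations get 0, not NA
totais <- xtabs(frequencia ~ local + especie, data = dados)

# Mean over all campaigns, including those without a record
media_todas <- totais / ncamp
media_todas
```

The cell for species aa at location B is 21/4 = 5.25, and the cell for cc at B is 0 rather than NA.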

Data in dput format.

dados <-
structure(list(campanha = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), local = structure(c(1L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L), .Label = c("A", "B"), class = "factor"), especie = structure(c(1L, 
2L, 3L, 2L, 4L, 1L, 2L, 4L, 1L, 1L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
4L), .Label = c("aa", "bb", "cc", "dd"), class = "factor"), frequencia = c(1L, 
2L, 1L, 1L, 7L, 50L, 1L, 8L, 2L, 3L, 3L, 33L, 5L, 1L, 1L, 18L, 
10L, 6L)), .Names = c("campanha", "local", "especie", "frequencia"
), class = "data.frame", row.names = c(NA, -18L))
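A side note on the error quoted in the question: `.` is not a base R function. As far as I know it comes from the plyr package, so the subset argument of dcast only works when that function can be found. The sketch below assumes plyr is installed (reshape2 depends on it) and calls it as plyr::.() without attaching the package; so_camp4 is a made-up name. Note that this only restricts the data to campaign 4. It does not give the average over all campaigns.

```r
library(reshape2)

# Same data as in the question, rebuilt here so the sketch runs on its own
dados <- data.frame(
  campanha   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4),
  local      = c("A", "A", "A", "B", "B", "A", "A", "A", "A",
                 "B", "B", "A", "A", "A", "A", "B", "B", "B"),
  especie    = c("aa", "bb", "cc", "bb", "dd", "aa", "bb", "dd", "aa",
                 "aa", "dd", "aa", "bb", "cc", "dd", "aa", "bb", "dd"),
  frequencia = c(1, 2, 1, 1, 7, 50, 1, 8, 2, 3, 3, 33, 5, 1, 1, 18, 10, 6)
)

# plyr::.() quotes the expression so dcast can evaluate it inside dados
so_camp4 <- dcast(dados, local ~ especie, mean,
                  value.var = "frequencia",
                  subset = plyr::.(campanha == 4))
so_camp4
```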
  • I’m sorry if I was confusing, Rui. I’m still learning how to ask questions here, and I find it hard to put the question into writing properly. I appreciate your help, but this approach (I tested it here) still has the same problem: it averages over the campaigns in which the species was recorded, not over all campaigns. For example, the average frequency of species aa at location B should be 5.25, because it is 21 individuals / 4 campaigns, but with this function or the ones in my question the value is 10.5, which is 21 individuals / 2 campaigns (those in which the species appears).

  • I don’t know how to make it compute the average frequency over all the campaigns performed, rather than only over the campaigns in which the species was recorded.

  • 1

    @Jussara Now it’s clearer. You want the average based on all campaigns, even when there is no record of the species.

  • 1

    @Jussara See if this is it.
