r - average of one variable relative to the values of another variable in a data frame within each grouping

This question follows up on one I asked recently, titled "r - average of one variable relative to the values of another variable in a data frame".

I have a data frame with several columns. How do I calculate the average of one variable based on the values of another variable, within a grouping defined by one of the columns? Concretely, I have the frequency of several species recorded in 4 campaigns divided into 2 stages, and I want the average frequency of each species at each location within each stage, computed over the campaigns of that stage. In other words, the denominator should be the number of campaigns carried out in that stage, not only the campaigns in which the species was recorded, and not ALL campaigns regardless of stage. The script I'm using, based on your help, is this:

# Sum all records of each species at each location in each campaign.

dados_anura = dados_sapo %>%
  group_by(etapa, campanha,  local,  especie) %>%
  summarise(sum(frequencia))
## I then open the table and rename the column "sum(frequencia)" to frequencia
write.table(dados_anura, 'dados_anura.csv', sep = ';', row.names = F)


# Save it and read it back in here

dados_anuras <- read.csv("dados_anura.csv", header = TRUE, sep=";")

# Mean based on all campaigns, even when the species has no record.
# Compute the campaign means grouped by species and location, using all campaigns and not only those in which the species is recorded.
# Define a function mediaCamp that does this calculation. Then use aggregate once more.

mediaCamp <- function(x){
  ncamp <- length(unique(dados_anuras$campanha))
  sum(x)/ncamp
}

dadomean4 <- aggregate(frequencia ~ etapa + local + especie, dados_anuras, mediaCamp)
### To replace the NAs with 0
dadomean4[is.na(dadomean4)] <- 0

But the result is wrong: the average is computed over ALL campaigns rather than only over the campaigns of that stage, even though each value is reported in the row for its stage. Example data:

etapa  campanha  local  especie  frequencia
A1     1         A      aa       1
A1     1         A      bb       2
A1     1         A      cc       1
A1     1         B      bb       1
A1     1         B      dd       7
A1     2         A      aa       50
A1     2         A      bb       1
A1     2         A      dd       8
A2     3         A      aa       2
A2     3         B      aa       3
A2     3         B      dd       3
A2     4         A      aa       33
A2     4         A      bb       5
A2     4         A      cc       1
A2     4         A      dd       1
A2     4         B      aa       18
A2     4         B      bb       10
A2     4         B      dd       6
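
The discrepancy can be reproduced directly from the table above. The following is a minimal sketch, assuming the table is the complete data set; dados_anuras matches the script's object name, while n_total and n_por_etapa are hypothetical names introduced here. It compares the campaign count that mediaCamp uses with the campaign count per stage.

```r
# Load the example table (base R only, no extra packages needed)
dados_anuras <- read.table(header = TRUE, text = "
etapa campanha local especie frequencia
A1 1 A aa 1
A1 1 A bb 2
A1 1 A cc 1
A1 1 B bb 1
A1 1 B dd 7
A1 2 A aa 50
A1 2 A bb 1
A1 2 A dd 8
A2 3 A aa 2
A2 3 B aa 3
A2 3 B dd 3
A2 4 A aa 33
A2 4 A bb 5
A2 4 A cc 1
A2 4 A dd 1
A2 4 B aa 18
A2 4 B bb 10
A2 4 B dd 6
")

# Denominator used inside mediaCamp: distinct campaigns in the WHOLE data set
n_total <- length(unique(dados_anuras$campanha))
n_total      # 4

# Denominator that is actually wanted: distinct campaigns within each stage
n_por_etapa <- tapply(dados_anuras$campanha, dados_anuras$etapa,
                      function(x) length(unique(x)))
n_por_etapa  # A1: 2, A2: 2
```

Because mediaCamp looks up dados_anuras$campanha globally instead of using the campaigns of the group being aggregated, every sum is divided by 4 rather than by 2.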

  • What output do you expect?

1 answer

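The average of each species recorded at each location within each stage:

library(dplyr)  # provides %>%, group_by() and summarise()
# `data` is the data frame shown in the question
group_by(data, especie, local, etapa) %>% summarise(Total = mean(frequencia))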
# A tibble: 13 x 4
# Groups:   especie, local [?]
#    especie local etapa Total
#    <fct>   <fct> <fct> <dbl>
#  1 aa      A     A1     25.5
#  2 aa      A     A2     17.5
#  3 aa      B     A2     10.5
#  4 bb      A     A1      1.5
#  5 bb      A     A2      5  
#  6 bb      B     A1      1  
#  7 bb      B     A2     10  
#  8 cc      A     A1      1  
#  9 cc      A     A2      1  
# 10 dd      A     A1      8  
# 11 dd      A     A2      1  
# 12 dd      B     A1      7  
# 13 dd      B     A2      4.5
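
One caveat on the result above: mean(frequencia) divides each group's total by the number of rows in the group, that is, by the campaigns in which the species was actually recorded (bb at A in A2 gives 5 from a single record). If the intended denominator is the number of campaigns carried out in the stage, a minimal base R sketch follows. It assumes the question's table is the full data set and that every campaign of a stage surveyed every location; the names dados, ncamp, soma and res are illustrative and do not come from the original posts.

```r
# Example data from the question, rebuilt as a data frame
dados <- data.frame(
  etapa      = rep(c("A1", "A2"), times = c(8, 10)),
  campanha   = rep(1:4, times = c(5, 3, 3, 7)),
  local      = c("A", "A", "A", "B", "B", "A", "A", "A", "A",
                 "B", "B", "A", "A", "A", "A", "B", "B", "B"),
  especie    = c("aa", "bb", "cc", "bb", "dd", "aa", "bb", "dd", "aa",
                 "aa", "dd", "aa", "bb", "cc", "dd", "aa", "bb", "dd"),
  frequencia = c(1, 2, 1, 1, 7, 50, 1, 8, 2, 3, 3, 33, 5, 1, 1, 18, 10, 6)
)

# Number of distinct campaigns carried out in each stage
ncamp <- aggregate(campanha ~ etapa, dados, function(x) length(unique(x)))
names(ncamp)[2] <- "n_camp"

# Total frequency of each species at each location within each stage
soma <- aggregate(frequencia ~ etapa + local + especie, dados, sum)

# Attach the per-stage campaign count and divide by it
res <- merge(soma, ncamp, by = "etapa")
res$media <- res$frequencia / res$n_camp

res[order(res$especie, res$local, res$etapa),
    c("especie", "local", "etapa", "media")]
```

With the example data both stages have 2 campaigns, so bb at A in A2 becomes 5 / 2 = 2.5 instead of 5. Combinations with no record at all in a stage (for example aa at B in A1) do not appear in the result and would have to be added as zeros separately.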
