r - sum of a variable relative to the values of another variable in a data frame

3

I have a multi-column data frame. How do I sum the values of one column within each level of another variable? I want to do this to summarize the data for each species within each campaign. I tried the summarise function of the plyr package, but it didn’t work. It may be because I passed the factors to the function incorrectly.

campanha   especie   frequencia
   1          A         2
   1          A         1
   1          A         3
   1          A         5
   1          B         1
   1          B         2
   1          B         1
   1          B         6
   1          B         1
   1          C         3
   1          C         1
   1          C         8
   1          C         4
   2          A         2
   2          A         8
   2          A         4
   2          A         5
   2          B         4
   2          B         2
   2          B         6
   2          B         1
   2          C         3
   2          C         1
   2          C         9

2 answers

4


Another way to do this is with the package dplyr:

library(dplyr)
dados %>%
  group_by(campanha, especie) %>%
  summarise(sum(frequencia))
# A tibble: 6 x 3
# Groups:   campanha [?]
  campanha especie `sum(frequencia)`
     <int> <fct>               <int>
1        1 A                      11
2        1 B                      11
3        1 C                      16
4        2 A                      19
5        2 B                      13
6        2 C                      13

Note that I grouped the data with the function group_by, passing it the grouping variables. Then I used summarise to indicate that I wanted to sum the variable frequencia within each of the groups created.
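The summary column in the output above gets the default name sum(frequencia), which is awkward to refer to later. A minimal sketch of the same pipeline with a named result column follows. The names totais and total are illustrative choices, not part of the original answer, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Rebuild the question's data so the example runs on its own
dados <- data.frame(
  campanha   = rep(c(1L, 2L), times = c(13, 11)),
  especie    = factor(rep(c("A", "B", "C", "A", "B", "C"),
                          times = c(4, 5, 4, 4, 4, 3))),
  frequencia = c(2L, 1L, 3L, 5L, 1L, 2L, 1L, 6L, 1L, 3L, 1L, 8L, 4L,
                 2L, 8L, 4L, 5L, 4L, 2L, 6L, 1L, 3L, 1L, 9L)
)

# Sum frequencia within each campanha/especie pair and name the result
# column "total" (an illustrative name); .groups = "drop" returns an
# ungrouped tibble
totais <- dados %>%
  group_by(campanha, especie) %>%
  summarise(total = sum(frequencia), .groups = "drop")

totais
```

With a named column, later steps can refer to totais$total instead of a backtick-quoted expression.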

  • Thanks @Marcus Nunes! It worked. But is there a difference between using the dplyr package and plyr? When I used plyr, it collapsed all the data into a single value, but after I cleaned everything up and used dplyr, it worked.

  • 1

    I don’t know how to use plyr, so unfortunately I can’t comment on it. What I do know is that dplyr is newer, and if both packages are loaded at the same time, errors may occur in the R session.

  • @Marcusnunes Building on this answer, what if I wanted to sum all the rows, that is, add the results for A, B, and C together? 11+11+16+19....
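
The two points raised in these comments can be illustrated together. A likely explanation for the single value is that plyr was attached after dplyr, so that plyr's summarise masked dplyr's and ignored the grouping. This is an assumption about the asker's session, not something confirmed in the thread. The sketch below calls the dplyr functions with an explicit dplyr:: prefix so that masking cannot matter, then computes the grand total directly. The names por_grupo, total and total_geral are illustrative.

```r
library(dplyr)

# Rebuild the question's data so the example runs on its own
dados <- data.frame(
  campanha   = rep(c(1L, 2L), times = c(13, 11)),
  especie    = factor(rep(c("A", "B", "C", "A", "B", "C"),
                          times = c(4, 5, 4, 4, 4, 3))),
  frequencia = c(2L, 1L, 3L, 5L, 1L, 2L, 1L, 6L, 1L, 3L, 1L, 8L, 4L,
                 2L, 8L, 4L, 5L, 4L, 2L, 6L, 1L, 3L, 1L, 9L)
)

# The dplyr:: prefix selects dplyr's versions even if another attached
# package (for example plyr) defines functions with the same names
por_grupo <- dados %>%
  dplyr::group_by(campanha, especie) %>%
  dplyr::summarise(total = sum(frequencia), .groups = "drop")

# Grand total over all rows; no grouping is needed for this
total_geral <- sum(dados$frequencia)
total_geral
# [1] 83

# The six group totals add up to the same value
sum(por_grupo$total) == total_geral
# [1] TRUE
```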

2

This can be solved with the function aggregate.

res <- aggregate(frequencia ~ campanha + especie, dados, sum)
res
#  campanha especie frequencia
#1        1       A         11
#2        2       A         19
#3        1       B         11
#4        2       B         13
#5        1       C         16
#6        2       C         13
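
The long result above can also be laid out as a campaign-by-species table. What follows is a small base R sketch using xtabs, offered as an alternative view rather than as part of the original answer. The name tab is illustrative.

```r
# Rebuild the question's data so the example runs on its own
dados <- data.frame(
  campanha   = rep(c(1L, 2L), times = c(13, 11)),
  especie    = factor(rep(c("A", "B", "C", "A", "B", "C"),
                          times = c(4, 5, 4, 4, 4, 3))),
  frequencia = c(2L, 1L, 3L, 5L, 1L, 2L, 1L, 6L, 1L, 3L, 1L, 8L, 4L,
                 2L, 8L, 4L, 5L, 4L, 2L, 6L, 1L, 3L, 1L, 9L)
)

# With a left-hand side in the formula, xtabs sums that variable within
# each combination of the right-hand-side factors
tab <- xtabs(frequencia ~ campanha + especie, data = dados)
tab

# Row and column totals can be appended with addmargins()
addmargins(tab)
```

Each cell holds the same sum that aggregate reports, with campaigns as rows and species as columns.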

Data in dput() format.

dados <-
structure(list(campanha = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    especie = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L
    ), .Label = c("A", "B", "C"), class = "factor"), frequencia = c(2L, 
    1L, 3L, 5L, 1L, 2L, 1L, 6L, 1L, 3L, 1L, 8L, 4L, 2L, 8L, 4L, 
    5L, 4L, 2L, 6L, 1L, 3L, 1L, 9L)), .Names = c("campanha", 
"especie", "frequencia"), class = "data.frame", row.names = c(NA, 
-24L))
