Subtract rows from a group in a data frame by another group

Asked

Viewed 387 times

3

Assuming the following example:

set.seed(1234)
df=data.frame(grupo=rep(c("A1","A1.2","C","D"), 3), 
          ano=c(rep(2007,4),rep(2008,4),rep(2009,4)),
          valor1=sample.int(100,12), valor2=sample(20,12),
          stringsAsFactors = FALSE)

       grupo  ano valor1 valor2
1     A1 2007     12      6
2   A1.2 2007     62     18
3      C 2007     60     20
4      D 2007     61     15
5     A1 2008     83      5
6   A1.2 2008     97     16
7      C 2008      1      3
8      D 2008     22      4
9     A1 2009     99     13
10  A1.2 2009     47     12
11     C 2009     63      2
12     D 2009     49      1

I need to subtract A1 from A1.2 for "valor1" and "valor2" (the original data frame has many more columns).

The solution that I was able to reach is very laborious and in the end joins the lines of the two data frames created:

df2= df %>% 
  filter(grupo=="A1" | grupo=="A1.2") %>%
  group_by(ano) %>%
  arrange(ano,grupo) %>%
  mutate(d1 = valor1 - lead(valor1, n=1)) %>%
  mutate(d2 = valor2 - lead(valor2, n=1)) %>%
  filter(grupo=="A1") %>%
  mutate(grupo="Am") %>%
  select(-3,-4) %>%
  rename(valor1=d1,valor2=d2)

# A tibble: 3 x 4
# Groups:   ano [3]
  grupo   ano valor1 valor2
  <chr> <dbl>  <int>  <int>
1 Am     2007    -59    -15
2 Am     2008     75     -1
3 Am     2009    -18     -2

df= df %>% 
  filter(grupo!="A1" & grupo!="A1.2")

df=bind_rows(df,df2)

  grupo  ano valor1 valor2
1     C 2007     60     20
2     D 2007     61     15
3     C 2008      1      3
4     D 2008     22      4
5     C 2009     63      2
6     D 2009     49      1
7    Am 2007    -59    -15
8    Am 2008     75     -1
9    Am 2009    -18     -2

My questions are this: 1. Is there a simpler solution? 2. I can’t imagine how to do "mutate_all" for all columns at once, except "group" and "year".

  • 1

    You can start the script with set.seed() for the results to be reproducible?

1 answer

4


I believe the following code is simpler and is equivalent.

library(dplyr)

df2 <- df %>%
  filter(grupo %in% c("A1", "A1.2")) %>% # grepl("A1", grupo)
  mutate(grupo = "Am") %>%
  group_by(grupo, ano) %>%
  summarise(valor1 = last(valor1) - first(valor1),
            valor2 = last(valor2) - first(valor2)) 
df <- df %>%
  filter(!grupo %in% c("A1", "A1.2")) %>%
  bind_rows(df2)

df
#  grupo  ano valor1 valor2
#1     C 2007     60     20
#2     D 2007     61     15
#3     C 2008      1      3
#4     D 2008     22      4
#5     C 2009     63      2
#6     D 2009     49      1
#7    Am 2007     50     12
#8    Am 2008     14     11
#9    Am 2009    -52     -1

Dice.

I’ll rework the data with set.seed and with the argument stringsAsFactors = FALSE.

set.seed(1234)

df=data.frame(grupo=rep(c("A1","A1.2","C","D"), 3), 
              ano=c(rep(2007,4),rep(2008,4),rep(2009,4)),
              valor1=sample.int(100,12), valor2=sample(20,12),
              stringsAsFactors = FALSE)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.