Repeating the subtraction of groups in a data frame for all numerical variables

5

I have the following code:

df <- data.frame(grp = rep(letters[1:3], each = 2),
                 index = rep(1:2, times = 3),
                 value = seq(10, 60, length.out = 6),
                 value2 = seq(20, 70, length.out = 6),
                 value3 = seq(30, 80, length.out = 6))

library(tidyverse)
as_tibble(df) # for nicer printing (tbl_df() is deprecated)

# grp   index value value2 value3
# <fct> <int> <dbl>  <dbl>  <dbl>
# 1 a      1    10     20     30
# 2 a      2    20     30     40
# 3 b      1    30     40     50
# 4 b      2    40     50     60
# 5 c      1    50     60     70
# 6 c      2    60     70     80

# expected result:
# grp   index value value2 value3
# <fct> <int> <dbl>  <dbl>  <dbl>
# 1 a      1    10     20     30
# 2 a      2    20     30     40
# 3 b      1   -20    -20    -20
# 4 b      2   -20    -20    -20
# 5 c      1    50     60     70
# 6 c      2    60     70     80

# subtract one group from another
df$value[df$grp=="b"]  = df$value[df$grp=="b"]  - df$value[df$grp=="c"]
df$value2[df$grp=="b"] = df$value2[df$grp=="b"] - df$value2[df$grp=="c"]
df$value3[df$grp=="b"] = df$value3[df$grp=="b"] - df$value3[df$grp=="c"]

How can I subtract the group 'c' values from the group 'b' values for every value# column at once, without having to repeat

df$value[df$grp=="b"]  = df$value[df$grp=="b"]  - df$value[df$grp=="c"]

for each variable?

2 answers

6


The following does what you want in base R.

ib <- which(df$grp == "b")
ic <- which(df$grp == "c")
df[3:5] <- lapply(df[3:5], function(x){
  x[ib] <- x[ib] - x[ic]
  x
})

df
#  grp index value value2 value3
#1   a     1    10     20     30
#2   a     2    20     30     40
#3   b     1   -20    -20    -20
#4   b     2   -20    -20    -20
#5   c     1    50     60     70
#6   c     2    60     70     80

Now tidy up. The variables ib and ic, which were used to index the vectors being transformed, are no longer needed.

rm(ib, ic)
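If you would rather not hard-code the column positions 3:5, the same idea can be sketched by selecting the columns by name. This is only a sketch: the names value_cols, is_b and is_c are mine, and it assumes that groups "b" and "c" have the same number of rows, stored in the same index order.

```r
# Same example data as in the question
df <- data.frame(grp = rep(letters[1:3], each = 2),
                 index = rep(1:2, times = 3),
                 value = seq(10, 60, length.out = 6),
                 value2 = seq(20, 70, length.out = 6),
                 value3 = seq(30, 80, length.out = 6))

# Columns whose names start with "value" (assumed naming convention)
value_cols <- grep("^value", names(df), value = TRUE)

# Logical row selectors for the two groups
is_b <- df$grp == "b"
is_c <- df$grp == "c"

# Element-wise subtraction of two equally sized blocks of the data frame
df[is_b, value_cols] <- df[is_b, value_cols] - df[is_c, value_cols]

df
```

To cover every numeric column instead, something like names(df)[sapply(df, is.numeric)] could replace the grep() call, but in this data that would also pick up index, so it would have to be excluded.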

2

It is possible to do so with dplyr:

bind_rows(
  df %>% 
    filter(grp == "a"),

  df %>% 
    filter(grp != "a") %>% 
    group_by(index) %>% 
    # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
    mutate_at(vars(starts_with("value")), ~ . - lead(., order_by = grp, default = 0))
)

  grp index value value2 value3
1   a     1    10     20     30
2   a     2    20     30     40
3   b     1   -20    -20    -20
4   b     2   -20    -20    -20
5   c     1    50     60     70
6   c     2    60     70     80

The code is a little awkward because group a has to be left untouched. If in practice you always subtract the next group's value from the current group's value, you can drop the bind_rows and the filters.
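As a possible alternative that avoids splitting the data frame, here is a sketch that touches only the rows of group "b". It assumes dplyr 1.0.0 or later (for across()) and that every index has exactly one row in group "c". The name result is just for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(grp = rep(letters[1:3], each = 2),
                 index = rep(1:2, times = 3),
                 value = seq(10, 60, length.out = 6),
                 value2 = seq(20, 70, length.out = 6),
                 value3 = seq(30, 80, length.out = 6))

result <- df %>%
  group_by(index) %>%
  # Within each index, subtract the "c" value from the "b" row only;
  # rows of every other group are returned unchanged
  mutate(across(starts_with("value"),
                ~ if_else(grp == "b", .x - .x[grp == "c"], .x))) %>%
  ungroup()

result
```

Grouping by index is what pairs each "b" row with its matching "c" row, so this version does not depend on the row order of the data frame.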
