How to index subgroups in R

I need to index subgroups within a group and have R return a specific result, row by row, depending on the subgroup.

First I need to group the data by process, which I managed to do with group_by:

library(dplyr)  # provides %>% and group_by

dados <- data.frame(processos = c("123","123","123","abc","abc","xyz","xyz","xyz","xyz"),
                    situacao = c("a","b","c","c","b","c","a","a","b"),
                    resultado = c("0","0","0","0","0","0","0","0","0"))

# group the rows by process
dados <-
  dados %>%
  group_by(processos)

I need R to check the column situacao for the point where b appears, then mark the rows of the process group with 1 and the rows of the b subgroup with 2. I need it to do this process by process.

Note that the groups (processes) have different sizes. The expected result would look like this:

dados
         processos situacao resultado
    1       123        a         1
    2       123        b         2
    3       123        c         1
    4       abc        c         1
    5       abc        b         2
    6       xyz        c         1
    7       xyz        a         1
    8       xyz        a         1
    9       xyz        b         2

I’ve tried aggregate, mutate, if_else and if, all without success.

3 answers

by Marcus Nunes • **17,915** points · Answer 1 · 2021-05-20T18:40:40+00:00

Just use mutate combined with if_else. If the column situacao is equal to b, resultado gets 2; otherwise it gets 1. You don’t even need to group the data, because the process information is never used.

library(dplyr)

dados %>%
  mutate(resultado = if_else(situacao == "b", 2, 1))
#>   processos situacao resultado
#> 1       123        a         1
#> 2       123        b         2
#> 3       123        c         1
#> 4       abc        c         1
#> 5       abc        b         2
#> 6       xyz        c         1
#> 7       xyz        a         1
#> 8       xyz        a         1
#> 9       xyz        b         2

Created on 2021-05-20 by the reprex package (v2.0.0)

by Jorge Mendes • **1,623** points · Answer 2 · 2021-05-20T20:18:39+00:00

Use the cumsum trick: within each group, the running count goes up by 1 when a b appears, so after adding 1 every row from the b onward gets 2.

dados <- data.frame(processos = c("123","123","123","abc","abc","xyz","xyz","xyz","xyz"),
                    situacao = c("a","b","c","c","b","c","a","a","b"),
                    resultado = c("0","0","0","0","0","0","0","0","0"))
library(dplyr)

dados <- 
  dados %>% 
  group_by(processos) %>% 
  mutate(resultado = cumsum(situacao == "b") + 1)

dados
#> # A tibble: 9 x 3
#> # Groups:   processos [3]
#>   processos situacao resultado
#>   <chr>     <chr>        <dbl>
#> 1 123       a                1
#> 2 123       b                2
#> 3 123       c                2
#> 4 abc       c                1
#> 5 abc       b                2
#> 6 xyz       c                1
#> 7 xyz       a                1
#> 8 xyz       a                1
#> 9 xyz       b                2

Created on 2021-05-20 by the reprex package (v2.0.0)
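A caveat, offered as a sketch rather than as part of the original answer: cumsum keeps counting, so a group containing two b rows would reach 3. If the marker should stay at 2 after the first b, dplyr's cumany may be a better fit. The data frame exemplo and the column contagem below are made up for illustration.

```r
library(dplyr)

# Hypothetical data: process "p1" contains two "b" rows
exemplo <- data.frame(processos = c("p1", "p1", "p1", "p1", "p2", "p2"),
                      situacao  = c("a", "b", "c", "b", "b", "a"))

exemplo <- exemplo %>%
  group_by(processos) %>%
  mutate(
    # running count of "b" rows plus 1: reaches 3 on the second "b" of p1
    contagem  = cumsum(situacao == "b") + 1L,
    # cumany() is TRUE from the first "b" onward, so the marker is capped at 2
    resultado = if_else(cumany(situacao == "b"), 2L, 1L)
  ) %>%
  ungroup()

exemplo$contagem   # 1 2 2 3 2 2
exemplo$resultado  # 1 2 2 2 2 2
```

With at most one b per group, as in the question's data, both columns give the same values.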

by Rui Barradas • **15,422** points · Answer 3 · 2021-05-20T18:59:24+00:00

If the result of the processing is the same for all groups, then it does not depend on the group.
This solution uses the fact that a logical value is coerced to the integers 0/1 in arithmetic. It is then sufficient to add 1 to the result of the comparison with "b".

library(dplyr)

dados %>% mutate(resultado = (situacao == "b") + 1L)
#  processos situacao resultado
#1       123        a         1
#2       123        b         2
#3       123        c         1
#4       abc        c         1
#5       abc        b         2
#6       xyz        c         1
#7       xyz        a         1
#8       xyz        a         1
#9       xyz        b         2
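The coercion this answer relies on can be checked on its own. A minimal sketch in base R, where the vector names eh_b and marcador are chosen here purely for illustration:

```r
# In arithmetic, logical values are coerced to integers: FALSE -> 0, TRUE -> 1
situacao <- c("a", "b", "c")

eh_b <- situacao == "b"   # FALSE TRUE FALSE
marcador <- eh_b + 1L     # 1 2 1, stored as integer

marcador
```

Adding 1L instead of 1 keeps the result an integer vector rather than a double.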

Edit

A reply from the OP, which should really have been a comment on the existing answers, reads as follows:

I tried it this way, but it only gives the result 2 on the row where the "b" appears. What I really need is for all rows before the "b" to be 1 and all rows from the "b" onward to be 2. When the process changes, the indicator should go back to 1.

Thus, in the first process the result would be 1,2,2, and on row 4 the marker goes back to 1.

After this reply, it is clear that the result does depend on processos, and the right code is as follows. (In the meantime an equivalent answer has already been posted by user @Jorge Mendes.)

dados %>% 
  group_by(processos) %>% 
  mutate(resultado = cumsum(situacao == "b") + 1L)
## A tibble: 9 x 3
## Groups:   processos [3]
#  processos situacao resultado
#  <chr>     <chr>        <int>
#1 123       a                1
#2 123       b                2
#3 123       c                2
#4 abc       c                1
#5 abc       b                2
#6 xyz       c                1
#7 xyz       a                1
#8 xyz       a                1
#9 xyz       b                2
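For readers who would rather not load dplyr, the same grouped running count can probably be written in base R with ave(). This is a sketch under that assumption, not part of the original answers; it rebuilds dados from the question without the placeholder resultado column.

```r
dados <- data.frame(processos = c("123","123","123","abc","abc","xyz","xyz","xyz","xyz"),
                    situacao = c("a","b","c","c","b","c","a","a","b"))

# ave() applies cumsum within each level of processos and returns the
# values in the original row order; adding 1L shifts the marker to 1/2
dados$resultado <- ave(as.integer(dados$situacao == "b"),
                       dados$processos,
                       FUN = cumsum) + 1L

dados$resultado  # 1 2 2 1 2 1 1 1 2
```

Note that FUN has to be passed by name, because any unnamed arguments after the first are treated as grouping variables.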