Conditional column based on multiple rows with dplyr


4

I have this df:

structure(list(id = c("R054", "R054", "R054", "R054", "R054", 
"GT68U", "GT68U", "GT68U", "GT68U", "GT68U", "G001", "G001", 
"G001", "G001"), car1 = c("sim", "sim", "sim", "sim", "sim", 
"sim", "nao", "sim", "nao", "nao", "nao", "nao", "nao", "nao"
)), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"
))

I would like to create a new column based on the values of car1 across all rows of each id. If an id always has the same car1 value ("sim" or "nao"), the new column just repeats that value. If an id has both "sim" and "nao", the new column should display "nd". So it would look like this:

id      car1    output
R054    sim     sim
R054    sim     sim
R054    sim     sim
R054    sim     sim
R054    sim     sim
GT68U   sim     nd
GT68U   nao     nd
GT68U   sim     nd
GT68U   nao     nd
GT68U   nao     nd
G001    nao     nao
G001    nao     nao
G001    nao     nao
G001    nao     nao

I tried using mutate with case_when:

df %>%
  group_by(id)%>%
  mutate(output = case_when(car1 == "nao" & car1 == "nao" ~ "nao",
                            car1 == "sim" & car1 == "sim" ~ "sim",
                            car1 == "sim" & car1 == "nao" ~ "nd",
                        TRUE ~ 0))

but I get this error:

Error: must be a character vector, not a double vector
Run rlang::last_error() to see where the error occurred.

3 answers

7

I used group_by + mutate + case_when + all to check whether every occurrence of a given id is "sim" or every occurrence is "nao"; ids with mixed values fall through to the default branch and are filled with "nd".

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(output = case_when(
    all(car1 == 'sim')  ~ 'sim',
    all(car1 == 'nao') ~ 'nao',
    TRUE ~ 'nd'
  ))

Output:

   id    car1  output
   <chr> <chr> <chr> 
 1 R054  sim   sim   
 2 R054  sim   sim   
 3 R054  sim   sim   
 4 R054  sim   sim   
 5 R054  sim   sim   
 6 GT68U sim   nd    
 7 GT68U nao   nd    
 8 GT68U sim   nd    
 9 GT68U nao   nd    
10 GT68U nao   nd    
11 G001  nao   nao   
12 G001  nao   nao   
13 G001  nao   nao   
14 G001  nao   nao 
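
As a rough sketch of why the attempt in the question fails, the snippet below contrasts a row-wise test with a group-wise one. It uses a shortened stand-in for the original data (the six-row df here is an assumption for illustration, not the asker's full table). Two things seem to go wrong in the original: car1 == "sim" & car1 == "nao" is evaluated per row and can never be TRUE, and TRUE ~ 0 mixes a number with character results, which is most likely what triggers the type error.

```r
library(dplyr)

# Shortened stand-in for the data in the question (illustrative only).
df <- tibble(
  id   = c("R054", "R054", "GT68U", "GT68U", "G001", "G001"),
  car1 = c("sim",  "sim",  "sim",   "nao",   "nao",  "nao")
)

# Row-wise test: a single value cannot be both "sim" and "nao",
# so this condition is FALSE for every row.
rowwise_mixed <- df$car1 == "sim" & df$car1 == "nao"

# Group-wise test: all() collapses each group to one TRUE/FALSE,
# and every right-hand side is a character string.
result <- df %>%
  group_by(id) %>%
  mutate(output = case_when(
    all(car1 == "sim") ~ "sim",
    all(car1 == "nao") ~ "nao",
    TRUE               ~ "nd"
  )) %>%
  ungroup()

result
```

The ungroup() at the end is optional; it only avoids carrying the grouping into later steps.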

2

@lmonferrari has already shown how to use dplyr::case_when in your case; here is another option, using ifelse and unique:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(output = ifelse(length(unique(car1)) == 1, car1, "nd"))

If a group has only one distinct value of car1 (whichever it is), unique(car1) has length 1; in that case, output is filled with that value of car1, otherwise with "nd".
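
One detail worth spelling out: because the test passed to ifelse has length 1, the result also has length 1 (only the first element of car1 is used), and mutate then recycles it across the group. The sketch below illustrates this on a made-up four-row table, and also shows an equivalent spelling with n_distinct and first; that variant is just one possible alternative, not part of the answer above.

```r
library(dplyr)

# ifelse() returns a result shaped like its test, so a length-1 test
# gives a length-1 result even when `yes` is a longer vector.
x <- c("sim", "sim", "sim")
one_value <- ifelse(length(unique(x)) == 1, x, "nd")

# Made-up example data (illustrative only).
toy <- tibble(
  id   = c("a", "a", "b", "b"),
  car1 = c("sim", "sim", "sim", "nao")
)

# Equivalent logic with a plain if/else, evaluated once per group.
res <- toy %>%
  group_by(id) %>%
  mutate(output = if (n_distinct(car1) == 1) first(car1) else "nd") %>%
  ungroup()

res
```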

For the record, here is the data.table equivalent:

library(data.table)

setDT(df)

df[, output := ifelse(length(unique(car1)) == 1, car1, "nd"), id]
# or, using data.table::fifelse, which is much faster:
df[, output := fifelse(length(unique(car1)) == 1, unique(car1)[1], "nd"), id]

1

Although the question asks for a dplyr solution, here is a base R one-liner using the function ave.

df$output <- ave(df$car1, df$id, FUN = function(x) if(all(x == x[1])) x else "nd")
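
To make the behaviour of ave a little more concrete, here is a small sketch on made-up data where the ids are interleaved. ave applies the function to each group and writes the results back into the original row positions, recycling the single "nd" across a mixed group. The toy data frame is an assumption for illustration only.

```r
# Made-up data with interleaved ids (illustrative only).
toy <- data.frame(
  id   = c("a", "b", "a", "b"),
  car1 = c("sim", "sim", "sim", "nao"),
  stringsAsFactors = FALSE
)

# Per group: keep the values if they are all equal, otherwise return "nd".
toy$output <- ave(toy$car1, toy$id,
                  FUN = function(x) if (all(x == x[1])) x else "nd")

toy
```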
