Join scattered values on different lines in the same line

Asked

Viewed 50 times

4

Hello,

I have a database in which I used tidyr::spread which resulted in information spread across several lines. I would like to join the different lines in one. Example:

df <- data.frame(obs = c(1,1,1,2),
                 msgA = c("A", NA, NA, "A" ),
                 msgB = c(NA,"B",NA, NA),
                 msgC = c(NA,NA,"C", NA))

#O que eu tentei:

library(dplyr)
df %>%
  group_by(obs) %>%
  mutate(msg = paste(msgA, msgB, msgC, sep=", ")) %>%
  select(-c(2:4))

# Resultado que eu queria:
# 
# > df
# obs     msg
# 1     A, B, C
# 2       A

Thanks for your help!

1 answer

4


The has the function unite() that can be used for this.

df %>% 
  tidyr::unite(msg, msgA:msgC, sep = ", ")
  obs       msg
1   1 A, NA, NA
2   1 NA, B, NA
3   1 NA, NA, C
4   2 A, NA, NA

It turns out that the unite() is meant to Tidy, then the result she generates is not quite what you wanted.

To generate the expected result it is better to use the data.frame before the spread() and use group_by() combined with summarise().

library(tidyverse)
tidy <- df %>% 
  gather(tipo, msg, -obs) %>% 
  filter(!is.na(msg))

tidy %>% 
  group_by(obs) %>% 
  summarise(msg = paste(msg, collapse = ", "))
## # A tibble: 2 x 2
##     obs msg    
##   <dbl> <chr>  
## 1     1 A, B, C
## 2     2 A 

Browser other questions tagged

You are not signed in. Login or sign up in order to post.