A solution can be found with the package igraph
, for graph problems. It seems natural to treat the question problem as a problem of finding the connected components of a graph.
1. Load the package igraph
and create a graph from the base.
library(igraph)
g <- graph_from_data_frame(input)
plot(g, vertex.size = 30, vertex.color = 'lightgrey', edge.arrow.width = 0.5)
2. Now determine the components. The function all_simple_paths
is an option. But I will use the function subcomponent
passing each of the vertices of the first column of the base input
.
v_num <- unique(match(input[[1]], names(V(g))))
path_list <- mapply(subcomponent, list(g), v_num)
names(path_list) <- v_num
3. Now, knowing which ones are unique, the above code gives a component to each vector element input$id_a
, possibly with repetitions.
Note: here is called the function str_sort
of one more external package, stringr
. This serves to sort the names of vertices, which are of classes "character"
, in numerical order but not absolutely necessary for the final result. The order given by the base function sort
also serves.
path_list2 <- lapply(path_list, function(p){
p <- unlist(p, recursive = FALSE)
stringr::str_sort(unique(names(p)), numeric = TRUE)
})
path_list2
#$`1`
#[1] "12" "67764"
#
#$`2`
#[1] "15" "68663" "68667"
#
#$`3`
#[1] "15" "68663" "68667"
#
#$`4`
#[1] "14" "19" "34" "1001" "2112"
#
#$`5`
#[1] "20"
#
#$`6`
#[1] "14" "19" "34" "1001" "2112"
4. These vectors are the vectors of the graph above but there are in fact repetitions, one must stay with each of these vectors.
final <- lapply(seq_along(path_list2), function(i){
keep <- sapply(seq_along(path_list2)[-seq_len(i)], function(j){
length(intersect(path_list2[[i]], path_list2[[j]])) == 0
})
if(all(keep)) path_list2[[i]] else NULL
})
final <- final[lengths(final) > 0]
final
#[[1]]
#[1] "12" "67764"
#
#[[2]]
#[1] "15" "68663" "68667"
#
#[[3]]
#[1] "20"
#
#[[4]]
#[1] "14" "19" "34" "1001" "2112"
Now we only have one of each, which corresponds to the desired result.
5. With the format of the question.
sapply(final, paste, collapse = "//")
#[1] "12//67764" "15//68663//68667" "20"
#[4] "14//19//34//1001//2112"