Combination of different values

Given a data frame with two columns, origem (origin) and destino (destination), I want to do the following for each distinct value of the origin column:

  1. select the corresponding values in the destination column
  2. form all ordered pairs of distinct elements from the values selected in step 1
  3. store the result in a new data frame.

Example:

df <- data.frame(
    origem = c("A","A", "A", "E", "E", "D", "D", "D", "D"), 
    destino=c("B","A","C","C", "B", "A","A", "A","B"))

For origin A, the associated destination values are A, B and C. Forming all ordered pairs of distinct elements gives A-B, A-C, B-A, B-C, C-A, C-B. The goal is to obtain the following final table:

   origem x y
1       A B A
2       A B C
3       A A B
4       A A C
5       A C B
6       A C A
7       E C B
8       E B C
9       D A B
10      D B A

In this example there are 3 unique values in origem: A, E and D. Each of them has at least one corresponding value in destino (unique destinations for A: A, B and C; for E: C and B; for D: A and B).
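
As a sanity check on the expected number of rows, an origin with n distinct destinations yields n * (n - 1) ordered pairs of distinct elements. A minimal base R sketch of that count follows; the names n_dest and n_pairs are made up for illustration:

```r
df <- data.frame(
  origem  = c("A", "A", "A", "E", "E", "D", "D", "D", "D"),
  destino = c("B", "A", "C", "C", "B", "A", "A", "A", "B")
)

# Number of distinct destinations for each origin
n_dest <- tapply(df$destino, df$origem, function(d) length(unique(d)))

# Ordered pairs of distinct values per origin: n * (n - 1)
n_pairs <- n_dest * (n_dest - 1)

# Total number of rows expected in the final table
total_rows <- sum(n_pairs)
```

This gives 6 pairs for A and 2 each for E and D, which agrees with the 10 rows of the table above.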

I can obtain the desired combinations when I filter by a single value of origem, but I need to generalize this to all the values present in that column. Can someone help?

  • Why are there only two rows in the final result for origins D and E? With origin E we also have 3 letters and 6 pairs without repetition. The same goes for D.

  • Thanks for the question. I edited the post to add detail. The value E in origem has only 2 observations, (E, C) and (E, B); what I want is to combine the destino values (C, B) in pairs, i.e. C-B and B-C (the E value only serves to split the data frame and to determine which destinations are combined).

1 answer

A solution using data.table:

library(data.table)

setDT(df)[, unique(expand.grid(destino, destino)), by = origem][Var1 != Var2]
#>     origem Var1 Var2
#>  1:      A    A    B
#>  2:      A    C    B
#>  3:      A    B    A
#>  4:      A    C    A
#>  5:      A    B    C
#>  6:      A    A    C
#>  7:      E    B    C
#>  8:      E    C    B
#>  9:      D    B    A
#> 10:      D    A    B

expand.grid generates all combinations of the destination values, and data.table makes it easy to apply this per group (here, per origin, via by = origem). unique removes the duplicate combinations that arise when the same destination is repeated for an origin. Since you do not want combinations in which both letters are the same, those rows are filtered out of the final result with Var1 != Var2.
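
For readers who prefer to avoid the extra package, the same idea can be sketched in base R. This is only an illustrative variant, not the approach used above: the helper pairs_for and the column names x and y are assumptions chosen to match the table in the question, and the destinations are deduplicated before calling expand.grid rather than afterwards.

```r
df <- data.frame(
  origem  = c("A", "A", "A", "E", "E", "D", "D", "D", "D"),
  destino = c("B", "A", "C", "C", "B", "A", "A", "A", "B")
)

# Hypothetical helper: ordered pairs of distinct destinations for one origin
pairs_for <- function(dest) {
  vals <- unique(as.character(dest))      # drop repeated destinations first
  grid <- expand.grid(x = vals, y = vals, stringsAsFactors = FALSE)
  grid[grid$x != grid$y, , drop = FALSE]  # remove self-pairs such as A-A
}

# Single group: origin D has destinations A, A, A, B
pairs_d <- pairs_for(df$destino[df$origem == "D"])

# All groups: split the destinations by origin and build the pairs per group
pieces <- lapply(split(df$destino, df$origem), pairs_for)

# Attach the origin to each piece and stack the pieces into one data frame
result <- do.call(rbind, unname(Map(
  function(o, p) data.frame(origem = rep(o, nrow(p)), p, stringsAsFactors = FALSE),
  names(pieces), pieces
)))
rownames(result) <- NULL
```

Note that split orders the groups by sorted origin value (A, D, E), so the rows come out in a different order than in the data.table output, but the set of pairs is the same.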
