Automatically creating new variables through interaction between two pre-existing variables

Suppose I have the following data set, dados:

dados
#>    letras numeros  cores valor
#> 1       a       1 branco     2
#> 2       a       1  preto     1
#> 3       a       2 branco     9
#> 4       a       2  preto     4
#> 5       a       3 branco     8
#> 6       a       3  preto     4
#> 7       a       4 branco     3
#> 8       a       4  preto     6
#> 9       b       1 branco     3
#> 10      b       1  preto     1
#> 11      b       2 branco    10
#> 12      b       2  preto     5
#> 13      b       3 branco     7
#> 14      b       3  preto    10
#> 15      b       4 branco    10
#> 16      b       4  preto    10
#> 17      c       1 branco    10
#> 18      c       1  preto     2
#> 19      c       2 branco     8
#> 20      c       2  preto     7
#> 21      c       3 branco     5
#> 22      c       3  preto     5
#> 23      c       4 branco     5
#> 24      c       4  preto     3


dados <- 
structure(list(letras = c("a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "c", "c", "c", "c", 
"c", "c", "c", "c"), numeros = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 
4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 
4L), cores = c("branco", "preto", "branco", "preto", "branco", 
"preto", "branco", "preto", "branco", "preto", "branco", "preto", 
"branco", "preto", "branco", "preto", "branco", "preto", "branco", 
"preto", "branco", "preto", "branco", "preto"), valor = c(2L, 
1L, 9L, 4L, 8L, 4L, 3L, 6L, 3L, 1L, 10L, 5L, 7L, 10L, 10L, 10L, 
10L, 2L, 8L, 7L, 5L, 5L, 5L, 3L)), class = "data.frame", row.names = c(NA, 
-24L))

I would like to add new columns to it, built by combining the existing columns. For example, to join the columns letras and numeros into a single column, I can simply use the function unite:

library(tidyverse)

dados %>% 
  unite(letras_numeros, c("letras", "numeros"), remove = FALSE)
#>    letras_numeros letras numeros  cores valor
#> 1             a_1      a       1 branco     2
#> 2             a_1      a       1  preto     1
#> 3             a_2      a       2 branco     9
#> 4             a_2      a       2  preto     4
#> 5             a_3      a       3 branco     8
#> 6             a_3      a       3  preto     4
#> 7             a_4      a       4 branco     3
#> 8             a_4      a       4  preto     6
#> 9             b_1      b       1 branco     3
#> 10            b_1      b       1  preto     1
#> 11            b_2      b       2 branco    10
#> 12            b_2      b       2  preto     5
#> 13            b_3      b       3 branco     7
#> 14            b_3      b       3  preto    10
#> 15            b_4      b       4 branco    10
#> 16            b_4      b       4  preto    10
#> 17            c_1      c       1 branco    10
#> 18            c_1      c       1  preto     2
#> 19            c_2      c       2 branco     8
#> 20            c_2      c       2  preto     7
#> 21            c_3      c       3 branco     5
#> 22            c_3      c       3  preto     5
#> 23            c_4      c       4 branco     5
#> 24            c_4      c       4  preto     3

How can I automate this process so that every combination of two and of three columns is created? That is, how do I end up with a data frame that has the columns letras_numeros, letras_cores, numeros_cores, letras_numeros_cores, letras, numeros, cores and valor?

  • Hi @Marcus! I'm thinking of a solution written as a function. Would that work for you? :)

  • That will do, of course. I'm finishing my own solution as a function, without using the tidyverse, but maybe yours is simpler and more generalizable than mine.

1 answer


This base R solution is not yet written as a function, but it shows how to get the desired result. It uses combn to apply interaction to every pair of the columns "letras", "numeros" and "cores", and then applies the same function to all three columns at once. After that, all that remains is to cbind the results.

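# Look up the columns named in x and join their values row by row with "_"
fun <- function(x, envir = as.environment(dados)){
  X <- mget(x, envir = envir)
  as.character(interaction(X, sep = "_"))
}

variaveis <- names(dados)[-4]  # every column except valor
# One new column per pair of variables, named after the pair
fac1 <- combn(variaveis, 2, fun, simplify = FALSE)
names(fac1) <- sapply(combn(variaveis, 2, simplify = FALSE), paste, collapse = "_")
# The single combination of all three variables
fac2 <- fun(variaveis)

# Bind the new columns to the original data
res <- cbind.data.frame(fac1, fac2 = fac2, dados)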

But now the column corresponding to fac2 has the wrong name:

names(res)
#[1] "letras_numeros" "letras_cores"   "numeros_cores"  "fac2"           "letras"         "numeros"       
#[7] "cores"          "valor"

Fix that and you're done.

names(res)[names(res) == "fac2"] <- paste(variaveis, collapse = "_")
head(res)
#  letras_numeros letras_cores numeros_cores letras_numeros_cores letras numeros  cores valor
#1            a_1     a_branco      1_branco           a_1_branco      a       1 branco     2
#2            a_1      a_preto       1_preto            a_1_preto      a       1  preto     1
#3            a_2     a_branco      2_branco           a_2_branco      a       2 branco     9
#4            a_2      a_preto       2_preto            a_2_preto      a       2  preto     4
#5            a_3     a_branco      3_branco           a_3_branco      a       3 branco     8
#6            a_3      a_preto       3_preto            a_3_preto      a       3  preto     4
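
To make the two building blocks above easier to follow, here is a minimal sketch of what combn and interaction each contribute. The small data frame toy is a hypothetical stand-in for dados, used only for illustration:

```r
# Column names to combine (same values as `variaveis` in the answer)
variaveis <- c("letras", "numeros", "cores")

# combn() lists every pair of names; simplify = FALSE keeps them in a list
pares <- combn(variaveis, 2, simplify = FALSE)
nomes <- sapply(pares, paste, collapse = "_")
print(nomes)
#> [1] "letras_numeros" "letras_cores"   "numeros_cores"

# `toy` is a hypothetical two-row data frame, not part of the question
toy <- data.frame(letras = c("a", "b"), numeros = 1:2)

# interaction() crosses the values row by row; as.character() keeps the labels
rotulos <- as.character(interaction(toy$letras, toy$numeros, sep = "_"))
print(rotulos)
#> [1] "a_1" "b_2"
```

The pairs come out in the order in which the names appear in variaveis, which is why the new columns end up as letras_numeros, letras_cores and numeros_cores.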

This code is easy to generalize into a function. The problem of the wrongly named column is also solved automatically, because the correct name is assigned inside the lapply loop.

funCombinacoes <- function(data, vars, n){
  fun <- function(x, envir = as.environment(data)){
    X <- mget(x, envir = envir)
    as.character(interaction(X, sep = "_"))
  }
  res <- lapply(n, function(m){
    fac1 <- combn(vars, m, fun, simplify = FALSE)
    names(fac1) <- sapply(combn(vars, m, simplify = FALSE), paste, collapse = "_")
    fac1
  })
  res <- do.call(cbind, unlist(res, recursive = FALSE))
  cbind.data.frame(res, data)
}

res2 <- funCombinacoes(dados, variaveis, n = 2:3)
identical(res, res2)
#[1] TRUE
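
As a possible variation, the same result can probably be reached without mget and environments, by indexing the data frame directly and joining the columns with paste. This is only a sketch of an alternative, not the code from the answer above; the names add_combinations and pequeno are hypothetical:

```r
# Hypothetical helper: same idea as funCombinacoes, but the columns are
# selected with data[cols] and joined row by row with paste()
add_combinations <- function(data, vars, n) {
  novas <- list()
  for (m in n) {
    for (cols in combn(vars, m, simplify = FALSE)) {
      nome <- paste(cols, collapse = "_")
      # unname() stops column names from being read as paste() arguments
      novas[[nome]] <- do.call(paste, c(unname(as.list(data[cols])), sep = "_"))
    }
  }
  cbind.data.frame(novas, data)
}

# `pequeno` is a hypothetical two-row stand-in for dados
pequeno <- data.frame(
  letras  = c("a", "b"),
  numeros = 1:2,
  cores   = c("branco", "preto"),
  valor   = c(2L, 1L)
)
out <- add_combinations(pequeno, c("letras", "numeros", "cores"), n = 2:3)
print(names(out))
```

One difference to keep in mind: paste turns a missing value into the text "NA", whereas interaction returns NA for that row, so the two approaches only agree when the combined columns have no missing values.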

  • That turned out really well! I didn't know the combn function; I had written my own function to do that for this answer.

  • I had solved the problem last night, also using combn, but this solution is more elegant than mine.
