How to create dummy variables

I’m trying to turn every variable in my data frame into dummy variables:

>dados
  X1 X2 X3
1  1  3  1
2  3  2  1
3  3  2  1
4  2  3  2
5  2  3  3

I’m trying to create binary vectors for this, but I can’t get it right. Each variable has k = 3 categories, so it needs k-1 = 2 dummy variables.

What I tried was this:

library(mlr)
createDummyFeatures(dados,cols=NULL)

   1 2 3
1  1 0 0
2  0 0 1
3  0 0 1
4  0 1 0
5  0 1 0
6  0 0 1
7  0 1 0
8  0 1 0
9  0 0 1
10 0 0 1
11 1 0 0
12 1 0 0
13 1 0 0
14 0 1 0
15 0 0 1

The problem is that it returns 3 dummy variables per original variable (with k-1 dummy variables there should be two). Besides, all the original variables end up stacked in the same columns! How do I solve these problems? The result should look like this:

   a b    c d    e f 
1  1 0    0 0    1 0
2  0 0    0 1    1 0
3  0 0    0 1    1 0
4  0 1    0 0    0 1
5  0 1    0 0    0 0
  • But the function rep recognizes the digit 6 and returns a vector of 1s. It is not clear to me what you want to do differently.

  • You can create a function that takes only the digit you want to expand into a run of 1s, and call rep inside that function.

  • I edited the question to better explain my goal. I’m trying to create dummy variables. I looked for something on the site, but found nothing on this subject.

1 answer


The closest I got to the result you expect was with the dummyVars function from the caret package. The result is not identical to yours because your example has no 1 in column X2, so that category never becomes a factor level and gets no column in the final result.

First we need to construct the variables as a factor:

dados <- data.frame(X1 = as.factor(c(1, 3, 3, 2, 2)),
                    X2 = as.factor(c(3, 2, 2, 3, 3)),
                    X3 = as.factor(c(1, 1, 1, 2, 3)))
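
One detail worth checking at this step: as.factor only creates levels for the values that actually occur, which is why X2 ends up with two levels instead of three. A minimal base R sketch of the difference, assuming the full set of categories is 1, 2, 3 (that comes from the question's description, not from the data itself):

```r
# Values of X2 from the question; category 1 never occurs
x2 <- c(3, 2, 2, 3, 3)

# as.factor() keeps only the observed values as levels
observed <- as.factor(x2)
levels(observed)

# factor() with explicit levels also keeps the unobserved category 1
# (assumption: the categories are exactly 1, 2 and 3)
declared <- factor(x2, levels = 1:3)
levels(declared)
table(declared)
```

Whether the empty category should get a column of its own depends on what the dummies are for; declaring the levels just makes that choice explicit.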

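Then I changed the reference level of the variables to get closer to what you expect. The level is passed as a string, because a numeric ref is read as a position among the levels:

dados$X1 <- relevel(dados$X1, ref = "3")
# X2 has only the levels 2 and 3, so relevel(dados$X2, ref = 3) stops with an error
# and X2 keeps 2 as its reference level
dados$X3 <- relevel(dados$X3, ref = "3")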
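
A small sketch of how relevel reads its ref argument, using the X1 and X2 values from the question. The variable names x1, x2 and numeric_ref are only for illustration:

```r
x1 <- factor(c(1, 3, 3, 2, 2))  # levels "1", "2", "3"
x2 <- factor(c(3, 2, 2, 3, 3))  # levels "2", "3" only

# A numeric ref is a position among the levels; position 3 of x1
# happens to be the label "3"
x1_by_position <- relevel(x1, ref = 3)
levels(x1_by_position)

# x2 has no third position, so the numeric form stops with an error;
# tryCatch() captures the message instead of aborting the script
numeric_ref <- tryCatch(relevel(x2, ref = 3),
                        error = function(e) conditionMessage(e))
numeric_ref

# A character ref is matched against the labels and works for both
x2_by_label <- relevel(x2, ref = "3")
levels(x2_by_label)
```

Passing the label as a string avoids depending on how many levels each factor happens to have.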

Finally, I created the dummy variables with the caret package:

library(caret)
# fullRank = TRUE drops the first (reference) level of each factor, leaving k-1 columns
dummy <- dummyVars(~ ., data = dados, fullRank = TRUE)

The result is:

predict(dummy, dados)

  X1.1 X1.2 X2.3 X3.1 X3.2
1    1    0    1    1    0
2    0    0    0    1    0
3    0    0    0    1    0
4    0    1    1    0    1
5    0    1    1    0    0
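
For comparison, here is a base R sketch that reproduces the layout asked for in the question, including the all-zero column for the category missing from X2. It assumes every variable has exactly the categories 1, 2, 3 with 3 as the reference, and it swaps caret for model.matrix, so the column names (X11, X12, ...) differ from the dummyVars output above. to_factor is a hypothetical helper, not part of any package:

```r
dados <- data.frame(X1 = c(1, 3, 3, 2, 2),
                    X2 = c(3, 2, 2, 3, 3),
                    X3 = c(1, 1, 1, 2, 3))

# Hypothetical helper: declare all three categories, listing 3 first so it
# becomes the reference level (assumption: categories are exactly 1, 2, 3)
to_factor <- function(x) factor(x, levels = c(3, 1, 2))
dados[] <- lapply(dados, to_factor)

# Treatment contrasts (the R default for unordered factors) drop the first
# level of each factor; removing the intercept column leaves
# k - 1 = 2 dummies per variable
dummies <- model.matrix(~ ., data = dados)[, -1]
colnames(dummies)
dummies
```

Because the levels are declared up front, X2 still gets a column for category 1 even though no row has that value.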
  • Great! It worked.
