How to create a sequence of dummy variables with a loop in R

I would like to create dummy variables to identify each company in the database. For example, a new variable called "GLO" would be 1 if the empresa (company) variable takes the value GLO and 0 otherwise.

The data structure is like this:

head(tarifas)


    ano mes empresa origem destino tarifa assentos
1 2002   1     GLO   SBPA    SBBR 397,00       51
2 2002   1     GLO   SBSV    SBRF 272,00        5
3 2002   1     GLO   SBFL    SBGL 223,00      196
4 2002   1     GLO   SBGL    SBSP  96,00      615
5 2002   1     GLO   SBGL    SBRF 340,00      297
6 2002   1     GLO   SBSP    SBFL 145,00      189

I tried to use the dplyr package together with a for loop, but something is wrong. For example, to create identifiers for the companies GLO and AZU, I used the following code:

for (k in c("GLO", "AZU")) {
 tarifas2<- tarifas %>%
  mutate(paste0(k) = 0) %>%
  mutate(replace(paste0(k), empresa == paste0(",k,"),1))
}
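A minimal sketch of a working version of the loop above, written in base R so it has no package dependencies. The small tarifas data frame below is a hypothetical stand-in for the question's data (the company code TAM is made up). It addresses three problems in the original loop: mutate() argument names must be literal names, so paste0(k) = 0 is not valid; paste0(",k,") produces the literal string ",k," rather than the value of k; and tarifas2 is rebuilt from tarifas on every iteration, so only the last dummy would survive.

```r
# Hypothetical stand-in for the question's tarifas data
tarifas <- data.frame(
  empresa = c("GLO", "AZU", "GLO", "TAM"),
  tarifa  = c(397, 272, 223, 96)
)

# Copy once, outside the loop, so each iteration adds a column
# instead of starting over from the original data
tarifas2 <- tarifas
for (k in c("GLO", "AZU")) {
  # [[k]] accepts a string, so the new column is named after the company code;
  # the comparison gives TRUE/FALSE, which as.integer() turns into 1/0
  tarifas2[[k]] <- as.integer(tarifas2$empresa == k)
}
tarifas2
```

If you prefer to stay with dplyr, the equivalent line inside the loop is tarifas2 <- tarifas2 %>% mutate(!!k := as.integer(empresa == k)), where !!k := lets the string stored in k be used as the column name.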
  • Try model.matrix(~ 0 + empresa, df1). But note that you almost certainly do not need to create the dummies explicitly; R's modeling functions do this automatically.
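To illustrate the comment's point, here is a small sketch with hypothetical fares for two companies: lm() receives the plain character column empresa and builds the dummy itself, with no manual 0/1 columns.

```r
# Hypothetical fares; empresa is an ordinary character column
df <- data.frame(
  empresa = c("GLO", "AZU", "GLO", "AZU"),
  tarifa  = c(397, 272, 223, 96)
)

# lm() treats empresa as a factor and creates the dummy column internally;
# AZU (first level alphabetically) is the reference category
fit <- lm(tarifa ~ empresa, data = df)
coef(fit)
```

Here the intercept is the mean AZU fare (184) and the empresaGLO coefficient is the difference between the mean GLO fare and the mean AZU fare (126).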

2 answers

The onehot package does this automatically:

library(onehot)
empresas <- data.frame(
  empresas = sample(c("GLO", "AZU"), 10, replace = TRUE)
  )

empresas
##    empresas
## 1       AZU
## 2       AZU
## 3       GLO
## 4       GLO
## 5       AZU
## 6       AZU
## 7       GLO
## 8       AZU
## 9       AZU
## 10      AZU

dummy <- predict(onehot(empresas), empresas)
dummy
##       empresas=AZU empresas=GLO
##  [1,]            1            0
##  [2,]            1            0
##  [3,]            0            1
##  [4,]            0            1
##  [5,]            1            0
##  [6,]            1            0
##  [7,]            0            1
##  [8,]            1            0
##  [9,]            1            0
## [10,]            1            0

If you don't want column names like empresas=XXX, use the str_replace() function from the stringr package to replace the string empresas= with an empty string in the column names:

library(stringr)
colnames(dummy) <- str_replace(colnames(dummy),
                               "empresas=",
                               "")
dummy
##       AZU GLO
##  [1,]   1   0
##  [2,]   1   0
##  [3,]   0   1
##  [4,]   0   1
##  [5,]   1   0
##  [6,]   1   0
##  [7,]   0   1
##  [8,]   1   0
##  [9,]   1   0
## [10,]   1   0
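The same renaming can also be done without stringr. This is a sketch using base R's sub(), which, like str_replace(), replaces only the first match. The two-row matrix is a hypothetical stand-in with onehot-style column names.

```r
# Hypothetical matrix with column names in the style onehot produces
dummy <- matrix(c(1, 0, 0, 1), nrow = 2,
                dimnames = list(NULL, c("empresas=AZU", "empresas=GLO")))

# fixed = TRUE matches "empresas=" literally instead of as a regular expression
colnames(dummy) <- sub("empresas=", "", colnames(dummy), fixed = TRUE)
colnames(dummy)
```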

A base R solution, using Marcus Nunes's example but with set.seed() and a different data frame name.

set.seed(1234)
df1 <- data.frame(
  empresas = sample(c("GLO", "AZU"), 10, replace = TRUE)
)


model.matrix(~ 0 + empresas, df1)
#  empresasAZU empresasGLO
#1           1           0
#2           1           0
#3           1           0
#4           1           0
#5           0           1
#6           1           0
#7           0           1
#8           0           1
#9           0           1
#10          1           0
#attr(,"assign")
#[1] 1 1
#attr(,"contrasts")
#attr(,"contrasts")$empresas
#[1] "contr.treatment"
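The 0 + in the formula is what gives one column per company. A small sketch with a hypothetical three-row data frame compares the two forms: with the default intercept, the reference level (AZU) is absorbed into the intercept column, while ~ 0 + drops the intercept so every level gets its own dummy.

```r
# Hypothetical three-row example
df1 <- data.frame(empresas = c("GLO", "AZU", "GLO"))

# Default formula: intercept column plus one dummy per non-reference level
with_int <- model.matrix(~ empresas, df1)
colnames(with_int)

# ~ 0 + removes the intercept, so each level gets its own 0/1 column
no_int <- model.matrix(~ 0 + empresas, df1)
colnames(no_int)
```

The first call yields the columns (Intercept) and empresasGLO; the second yields empresasAZU and empresasGLO.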

Or, to append the result to the original data frame:

cbind(df1, model.matrix(~ 0 + empresas, df1))
#  empresas empresasAZU empresasGLO
#1      AZU           1           0
#2      AZU           1           0
#3      AZU           1           0
#4      AZU           1           0
#5      GLO           0           1
#6      AZU           1           0
#7      GLO           0           1
#8      GLO           0           1
#9      GLO           0           1
#10     AZU           1           0
