I have a data frame with more than 300 columns that are categorical but are encoded as numerical. Each of these columns has its own "type", that is, it has its own encoding table. My problem is to create a new data frame with the decoded variables.
I have loaded the following data frames:
- the main data frame, called "dados", which has 347 columns I want to decode.
- an auxiliary data frame called "dados_vars", with the name (variable.name) and "type" (data.type) of every variable in the main df
- an auxiliary data frame called "codes", with the "type" (data.type), the possible codes (value) for that "type" and the meaning (content) of each code
I'm trying to use lapply to make this easier. What I've managed to do so far is:
library(dplyr)
# take one of the variables of the main df
variavel <- "var1"
# look up this variable's type in the "dados_vars" df
tipo.variavel <- as.character(dados_vars[dados_vars$variable.name == variavel, "data.type"])
# filter "codes" down to the codes this variable can take
codigos <- codes %>% filter(data.type == tipo.variavel) %>% select(value, content)
# create a new data frame with this variable decoded
novos.dados <- mutate(dados, var1 = factor(var1, levels = codigos$value, labels = codigos$content))
Now, how do I apply this procedure to all columns of the main df?
If the goal is just to turn your code into something that works for all variables, you can put it inside a for (i in 1:347) loop and define var1 <- paste0("var", i). You can also replace the for with an lapply and check whether performance improves. – Molx
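A minimal sketch of the loop suggested in this comment, in base R. The three data frames here are small invented stand-ins for "dados", "dados_vars" and "codes" (two columns instead of 347, made-up codes and labels), and the columns are assumed to be named var1, var2, and so on:

```r
# Toy stand-ins for the data frames in the question (all values are invented).
dados <- data.frame(var1 = c(1, 2, 1), var2 = c(10, 20, 20))
dados_vars <- data.frame(variable.name = c("var1", "var2"),
                         data.type = c("A", "B"),
                         stringsAsFactors = FALSE)
codes <- data.frame(data.type = c("A", "A", "B", "B"),
                    value = c(1, 2, 10, 20),
                    content = c("yes", "no", "low", "high"),
                    stringsAsFactors = FALSE)

novos.dados <- dados
for (i in seq_len(ncol(dados))) {
  # build the column name for this iteration (assumes the varX naming)
  variavel <- paste0("var", i)
  # look up the type of this variable
  tipo.variavel <- dados_vars$data.type[dados_vars$variable.name == variavel]
  # keep only the codes that belong to that type
  codigos <- codes[codes$data.type == tipo.variavel, c("value", "content")]
  # replace the numeric column with its decoded factor
  novos.dados[[variavel]] <- factor(dados[[variavel]],
                                    levels = codigos$value,
                                    labels = codigos$content)
}
```

Using [[variavel]] instead of mutate() avoids having to pass a column name stored in a string to dplyr.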
To simplify the explanation of the problem, I gave the variables names like "varX" but, in fact, each variable has a different name.
– Roger Salvini
In that case, another possibility is to replace the variable name with colnames(dados)[i]. If I understood correctly, that would solve it. – Molx
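As a rough illustration of this idea combined with lapply, the sketch below iterates over colnames(dados) directly, so the columns can have any names. The data frames and the helper decodifica() are hypothetical, invented only for this example, and it assumes every column of "dados" has an entry in "dados_vars":

```r
# Toy stand-ins with arbitrary column names (all values are invented).
dados <- data.frame(sexo = c(1, 2, 1), regiao = c(10, 20, 20))
dados_vars <- data.frame(variable.name = c("sexo", "regiao"),
                         data.type = c("A", "B"),
                         stringsAsFactors = FALSE)
codes <- data.frame(data.type = c("A", "A", "B", "B"),
                    value = c(1, 2, 10, 20),
                    content = c("male", "female", "north", "south"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: decode one column, given its name.
decodifica <- function(variavel) {
  # type of this variable, found by matching its name in dados_vars
  tipo.variavel <- dados_vars$data.type[match(variavel, dados_vars$variable.name)]
  # codes and labels that belong to that type
  codigos <- codes[codes$data.type == tipo.variavel, ]
  factor(dados[[variavel]], levels = codigos$value, labels = codigos$content)
}

colunas <- colnames(dados)
# lapply returns one factor per column; name the list and rebuild a data frame
novos.dados <- as.data.frame(setNames(lapply(colunas, decodifica), colunas))
```

Values that have no matching code for their type come out as NA, which can be a quick way to spot gaps in the encoding table.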
Yes, this is along the lines of the solution given by @Rcoster, which in principle solves my problem.
– Roger Salvini