Good morning!
Guys, I need some help.
I wrote the routine below to update a data.table (dt). It ran very slowly, so I removed the innermost loop and assigned the Resul variable directly to the data.table, but it didn't work correctly.
How can I assign the Resul variable directly to the data.table?
TOT_1 = "TOTAL CLASSE 1"
TOT_2 = "TOTAL CLASSE 2"
TOT_3 = "TOTAL CLASSE 3"
==== very slow routine ======
setkey(dt,TIPO,VARIAVEL_1,VARIAVEL_2,VARIAVEL_3,CLASSE)
Filtro_3_A = Filtro_3 %>% filter(CONDICAO != "LIXO")
for ( i in 1 : Com_Periodos )
{
Periodo_Pesquisar = Meses_Pesquisa[i]
Ano_Pesquisar = as.integer(substr(Periodo_Pesquisar,05,08))
Mes_Pesquisar = as.integer(substr(Periodo_Pesquisar,10,11))
Coluna = (Ano_Pesquisar - 2007 ) * 12 + Mes_Pesquisar + 5
Filtro_3_B = Filtro_3_A %>% filter(!!rlang::sym(Periodo_Pesquisar) > 0 )
print(paste0(" .............................TIPO ... periodo ... ",Periodo_Pesquisar))
Resul = Filtro_3_B %>%
group_by (VAR_1 , VAR_2 , VAR_3) %>%
summarize( Distintos = n_distinct(CODIGO),
Total_Dias = sum(!!rlang::sym(Periodo_Pesquisar) ),
QUANTIDADE = n() )
Resul = as.data.frame(Resul)
Num_Linhas = nrow(Resul)
for ( k in 1 : Num_Linhas )
{
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & dt$VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_1 , Coluna ] = Resul[k,4]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & dt$VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_2 , Coluna ] = Resul[k,5]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & dt$VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_3 , Coluna ] = Resul[k,6]
}
}
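For reference, the column-index arithmetic inside the loop can be isolated and checked on its own. The sketch below assumes, hypothetically, that period labels look like "MES_2015_03" (year in characters 5 to 8, month in characters 10 to 11, as the substr() calls imply); the helper name periodo_para_coluna is made up for illustration.

```r
# Hypothetical helper reproducing the Coluna formula from the loop.
# Assumes labels such as "MES_2015_03": year in chars 5-8, month in chars 10-11.
periodo_para_coluna <- function(periodo) {
  ano <- as.integer(substr(periodo, 5, 8))
  mes <- as.integer(substr(periodo, 10, 11))
  # One column per month starting at January 2007, after 5 leading columns
  (ano - 2007) * 12 + mes + 5
}

periodo_para_coluna("MES_2007_01")  # 6
periodo_para_coluna("MES_2015_03")  # 104
```

Because substr() is vectorised, the helper also accepts a whole vector of labels at once.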
==== faster routine, without the inner loop =====
TOT_1 = "TOTAL CLASSE 1"
TOT_2 = "TOTAL CLASSE 2"
TOT_3 = "TOTAL CLASSE 3"
setkey(dt,TIPO,VARIAVEL_1,VARIAVEL_2,VARIAVEL_3,CLASSE)
Filtro_3_A = Filtro_3 %>% filter(CONDICAO != "LIXO")
for ( i in 1 : Com_Periodos )
{
Periodo_Pesquisar = Meses_Pesquisa[i]
Ano_Pesquisar = as.integer(substr(Periodo_Pesquisar,05,08))
Mes_Pesquisar = as.integer(substr(Periodo_Pesquisar,10,11))
Coluna = (Ano_Pesquisar - 2007 ) * 12 + Mes_Pesquisar + 5
Filtro_3_B = Filtro_3_A %>% filter(!!rlang::sym(Periodo_Pesquisar) > 0 )
print(paste0(" .............................TIPO ... periodo ... ",Periodo_Pesquisar))
Resul = Filtro_3_B %>%
group_by (VAR_1 , VAR_2 , VAR_3) %>%
summarize( Distintos = n_distinct(CODIGO),
Total_Dias = sum(!!rlang::sym(Periodo_Pesquisar) ),
QUANTIDADE = n() )
Resul = as.data.frame(Resul)
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & dt$VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_1 , Coluna ] = Resul[,4]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & dt$VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_2 , Coluna ] = Resul[,5]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1 == Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & dt$VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_3 , Coluna ] = Resul[,6]
}
After removing the inner loop, I can no longer match each row of the data.frame (Resul) to the corresponding rows of the data.table.
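A likely cause, offered as an explanation rather than a confirmed diagnosis: == between two vectors compares them position by position (recycling the shorter one), so dt$VARIAVEL_1 == Resul[,1] never looks rows up by key. A data.table update join performs that lookup. The sketch below uses small made-up tables in place of dt and one period's Resul; the column M1, the value Coluna = 6 and the helper names res, alvo and chaves are hypothetical.

```r
library(data.table)

# Made-up stand-ins for dt and for one period's Resul
dt <- data.table(
  TIPO = "TESTE",
  VARIAVEL_1 = rep(c("a", "b"), each = 3),
  VARIAVEL_2 = "x",
  VARIAVEL_3 = "y",
  CLASSE = rep(c("TOTAL CLASSE 1", "TOTAL CLASSE 2", "TOTAL CLASSE 3"), 2),
  M1 = 0  # hypothetical period column (column 6)
)
Resul <- data.frame(VAR_1 = c("b", "a"), VAR_2 = "x", VAR_3 = "y",
                    Distintos = c(2, 1), Total_Dias = c(20, 10),
                    QUANTIDADE = c(200, 100), stringsAsFactors = FALSE)

# Position-by-position comparison: Resul[, 1] is recycled to length 6,
# so any TRUE here is an accident of row order, not a key match
dt$VARIAVEL_1 == Resul[, 1]
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE

# Update join: match rows on all key columns and assign by reference
Coluna <- 6
alvo <- names(dt)[Coluna]
res <- as.data.table(Resul)
res[, TIPO := "TESTE"]
# names = columns of dt, values = columns of res
chaves <- c(TIPO = "TIPO", VARIAVEL_1 = "VAR_1", VARIAVEL_2 = "VAR_2",
            VARIAVEL_3 = "VAR_3", CLASSE = "CLASSE")

res[, CLASSE := "TOTAL CLASSE 1"]
dt[res, on = chaves, (alvo) := i.Distintos]
res[, CLASSE := "TOTAL CLASSE 2"]
dt[res, on = chaves, (alvo) := i.Total_Dias]
res[, CLASSE := "TOTAL CLASSE 3"]
dt[res, on = chaves, (alvo) := i.QUANTIDADE]

dt$M1
# [1]   1  10 100   2  20 200
```

In the real loop, the three join lines would take the place of the three dt[...] = Resul[,...] assignments, with Coluna computed as before; the TIPO and CLASSE columns added to res stand in for the dt$TIPO == "TESTE" and dt$CLASSE == TOT_n conditions.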
Welcome to Stack Overflow! Unfortunately, nobody trying to answer this question can reproduce it. Please take a look at this link to see how to ask a reproducible question in R, so that people who wish to help you can do so in the best possible way.
– Marcus Nunes
Without a reproducible example it's hard to say exactly how to improve your routine, but here are two suggestions: 1) Write everything using data.table's features and syntax. dplyr is fast, but there's no point in mixing it with DT; things like group_by() can be done more efficiently with data.table syntax. The same goes for things like dt[dt$TIPO... 2) Use set() in for loops; it's the fastest way. Check out the documentation.
– Carlos Eduardo Lagosta
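A minimal sketch of both suggestions, assuming Filtro_3 can be converted to a data.table and that period columns are named like the hypothetical MES_2007_01 used here (all data are made up). The first part rewrites the filter/group_by/summarize chain with data.table's [i, j, by] syntax, where uniqueN() and .N play the roles of n_distinct() and n(); the second shows set() assigning by reference inside a loop.

```r
library(data.table)

# Made-up stand-in for Filtro_3 with one period column
Filtro_3 <- data.table(
  CONDICAO = c("OK", "OK", "LIXO", "OK"),
  VAR_1 = c("a", "a", "a", "b"),
  VAR_2 = "x",
  VAR_3 = "y",
  CODIGO = c(1, 1, 2, 3),
  MES_2007_01 = c(5, 3, 9, 4)
)
Periodo_Pesquisar <- "MES_2007_01"

# 1) filter + group_by + summarize in data.table syntax;
#    get() fetches the column whose name is stored in Periodo_Pesquisar
Resul <- Filtro_3[CONDICAO != "LIXO" & get(Periodo_Pesquisar) > 0,
                  .(Distintos  = uniqueN(CODIGO),
                    Total_Dias = sum(get(Periodo_Pesquisar)),
                    QUANTIDADE = .N),
                  by = .(VAR_1, VAR_2, VAR_3)]

# 2) set() assigns by reference with little overhead per call,
#    which is why it suits loops; here it fills column b row by row
tab <- data.table(a = 1:3, b = 0)
for (i in seq_len(nrow(tab))) {
  set(tab, i = i, j = "b", value = tab$a[i] * 10)
}
tab$b
# [1] 10 20 30
```

For the update itself, a keyed join as in the data.table documentation usually avoids the per-row loop entirely; set() is most useful when a loop cannot be avoided.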