Question about using data.table

Good morning!

Guys, I need some help.

I wrote the routine below to update a data.table (dt). It ran very slowly, so I removed the innermost loop and assigned the Resul values directly to the data.table, but that did not work correctly.

How can I assign the Resul values directly to the data.table?

TOT_1 = "TOTAL CLASSE 1"
TOT_2 = "TOTAL CLASSE 2"
TOT_3 = "TOTAL CLASSE 3"

==== very slow routine ======

setkey(dt,TIPO,VARIAVEL_1,VARIAVEL_2,VARIAVEL_3,CLASSE)

Filtro_3_A = Filtro_3 %>% filter(CONDICAO != "LIXO")

for ( i in 1 : Com_Periodos )
{

Periodo_Pesquisar = Meses_Pesquisa[i]
Ano_Pesquisar = as.integer(substr(Periodo_Pesquisar,05,08))
Mes_Pesquisar = as.integer(substr(Periodo_Pesquisar,10,11))

Coluna = (Ano_Pesquisar - 2007 ) * 12 + Mes_Pesquisar + 5

Filtro_3_B = Filtro_3_A %>% filter(!!rlang::sym(Periodo_Pesquisar) > 0 )

print(paste0(" .............................TIPO ... periodo ... ",Periodo_Pesquisar))

Resul = Filtro_3_B %>%
group_by (VAR_1 , VAR_2 , VAR_3) %>%
summarize( Distintos = n_distinct(CODIGO),
Total_Dias = sum(!!rlang::sym(Periodo_Pesquisar) ),
QUANTIDADE = n() )

Resul = as.data.frame(Resul)

Num_Linhas = nrow(Resul)

for ( k in 1 : Num_Linhas )
{

dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_1 , Coluna ] = Resul[k,4]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_2 , Coluna ] = Resul[k,5]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[k,1] & dt$VARIAVEL_2 == Resul[k,2] & VARIAVEL_3 == Resul[k,3] & dt$CLASSE == TOT_3 , Coluna ] = Resul[k,6]

}
}

==== Faster routine without the inner loop =====

TOT_1 = "TOTAL CLASSE 1"
TOT_2 = "TOTAL CLASSE 2"
TOT_3 = "TOTAL CLASSE 3"

setkey(dt,TIPO,VARIAVEL_1,VARIAVEL_2,VARIAVEL_3,CLASSE)

Filtro_3_A = Filtro_3 %>% filter(CONDICAO != "LIXO")

for ( i in 1 : Com_Periodos )
{

Periodo_Pesquisar = Meses_Pesquisa[i]
Ano_Pesquisar = as.integer(substr(Periodo_Pesquisar,05,08))
Mes_Pesquisar = as.integer(substr(Periodo_Pesquisar,10,11))

Coluna = (Ano_Pesquisar - 2007 ) * 12 + Mes_Pesquisar + 5

Filtro_3_B = Filtro_3_A %>% filter(!!rlang::sym(Periodo_Pesquisar) > 0 )

print(paste0(" .............................TIPO ... periodo ... ",Periodo_Pesquisar))

Resul = Filtro_3_B %>%
group_by (VAR_1 , VAR_2 , VAR_3) %>%
summarize( Distintos = n_distinct(CODIGO),
Total_Dias = sum(!!rlang::sym(Periodo_Pesquisar) ),
QUANTIDADE = n() )

Resul = as.data.frame(Resul)

dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_1 , Coluna ] = Resul[,4]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_2 , Coluna ] = Resul[,5]
dt[dt$TIPO == "TESTE" & dt$VARIAVEL_1== Resul[,1] & dt$VARIAVEL_2 == Resul[,2] & VARIAVEL_3 == Resul[,3] & dt$CLASSE == TOT_3 , Coluna ] = Resul[,6]

}

After removing the inner loop, I can no longer correctly match the values in the data.frame to the corresponding rows of the data.table.
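A likely reason the loop-free version misbehaves: `==` between two vectors compares them position by position (recycling the shorter one), so it does not look up each row of Resul in dt. The sketch below uses made-up vectors as stand-ins for dt$VARIAVEL_1 and Resul[,1]; the names dt_var and resul_var are hypothetical.

```r
# Toy vectors standing in for dt$VARIAVEL_1 and Resul[, 1] (made-up values)
dt_var    <- c("A", "B", "C", "A", "B", "C")
resul_var <- c("B", "A", "C")

# `==` pairs position 1 with 1, 2 with 2, ... and recycles resul_var,
# so a value only "matches" when it happens to sit at the same position
elementwise <- dt_var == resul_var

# A lookup asks instead: in which row of Resul does each dt value occur?
lookup <- match(dt_var, resul_var)

print(elementwise)
print(lookup)
```

Only the positions that happen to line up come back TRUE in the first result, while match() returns the Resul row for every dt row, which is what a join does.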

  • Welcome to Stack Overflow! Unfortunately, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link to see how to ask a reproducible question in R, so that people who wish to help you can do so in the best possible way.

  • It's not possible to say exactly how to improve your routine without a reproducible example, but here are two suggestions: 1) Write everything using data.table's own features and syntax. dplyr is fast, but there's no point in using it together with data.table; things like group_by() can be done more efficiently with data.table syntax. The same goes for things like dt[dt$TIPO.... 2) Use set() inside loops; it's the fastest way. Check out the documentation.
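A minimal sketch of both suggestions, assuming small made-up tables in place of the real ones. The tables dt and Filtro below, the DIAS column, and the target column name M1 are hypothetical; in the question the target column is addressed by the number in Coluna, which set() also accepts for j. Only one grouping variable is used to keep the example short.

```r
library(data.table)

# Made-up stand-ins for the tables in the question
dt <- data.table(
  TIPO       = "TESTE",
  VARIAVEL_1 = c("A", "A", "B", "B"),
  CLASSE     = c("TOTAL CLASSE 1", "TOTAL CLASSE 2",
                 "TOTAL CLASSE 1", "TOTAL CLASSE 2"),
  M1         = NA_real_
)
Filtro <- data.table(
  VAR_1  = c("A", "A", "B"),
  CODIGO = c(1, 2, 2),
  DIAS   = c(10, 5, 7)
)
col <- "M1"  # hypothetical target column for the current period

# 1) Grouping in data.table syntax instead of group_by() + summarize()
Resul <- Filtro[DIAS > 0,
                .(Distintos  = as.numeric(uniqueN(CODIGO)),
                  Total_Dias = sum(DIAS)),
                by = VAR_1]

# Update join: rows are matched on the key columns, not by position,
# and the column is assigned by reference with :=
dt[Resul[, .(TIPO = "TESTE", VARIAVEL_1 = VAR_1,
             CLASSE = "TOTAL CLASSE 1", valor = Distintos)],
   on = .(TIPO, VARIAVEL_1, CLASSE),
   (col) := i.valor]

# 2) set(): find the target rows once, look up their values with match(),
# then assign by reference
rows <- dt[, which(TIPO == "TESTE" & CLASSE == "TOTAL CLASSE 2")]
idx  <- match(dt$VARIAVEL_1[rows], Resul$VAR_1)
set(dt, i = rows, j = col, value = Resul$Total_Dias[idx])

print(dt)
```

Either form replaces the inner loop over the rows of Resul. Without the real data it is hard to say which one would be faster here.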

No answers
