Code improvement

Question

Asked 8 years ago

Good afternoon. I have the following data structure:

dados <- structure(list(CIDADE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("A", "B"), class = "factor"), MES = c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L, 12L), CLIENTES = c(29L, 26L, 25L, 29L, 
30L, 27L, 25L, 25L, 25L, 25L, 27L, 27L, 27L, 26L, 27L, 28L, 28L, 
25L, 26L, 26L, 25L, 30L, 30L, 29L), REMANEJADOS = c(3L, 2L, 1L, 
1L, 4L, 4L, 2L, 3L, 1L, 1L, 3L, 4L, 5L, 3L, 4L, 2L, 5L, 5L, 5L, 
4L, 5L, 2L, 4L, 3L), REMANEJAMENTOS = c(4L, 3L, 4L, 3L, 5L, 1L, 
4L, 4L, 2L, 5L, 5L, 1L, 5L, 4L, 2L, 5L, 5L, 5L, 2L, 1L, 5L, 2L, 
3L, 3L), TRANSFERENCIAS = c(2L, 4L, 2L, 1L, 3L, 3L, 5L, 4L, 2L, 
3L, 4L, 5L, 1L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 1L, 1L, 2L, 4L), 
EVASOES = c(1L, 2L, 2L, 5L, 1L, 2L, 1L, 1L, 2L, 2L, 5L, 2L, 
5L, 4L, 3L, 2L, 5L, 2L, 4L, 1L, 2L, 4L, 3L, 5L), HORAS = c(80, 
80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 
80, 80, 80, 80, 80, 80, 80, 80), TURMAS = c(2L, 1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 3L)), .Names = c("CIDADE", "MES", "CLIENTES", 
"REMANEJADOS", "REMANEJAMENTOS", "TRANSFERENCIAS", "EVASOES", 
"HORAS", "TURMAS"), row.names = c(NA, -24L), class = "data.frame")

I computed the cumulative sums of the variables I need:

library(dplyr)

dados <- dados %>% 
  group_by(CIDADE) %>% 
  mutate(CliAcu = cumsum(CLIENTES),
         RmdAcu = cumsum(REMANEJADOS),
         RmtAcu = cumsum(REMANEJAMENTOS),
         TrsAcu = cumsum(TRANSFERENCIAS),
         EvsAcu = cumsum(EVASOES),
         TurAcu = cumsum(TURMAS))

Now I need to calculate a formula that takes into account these accumulated variables (CliAcu, RmdAcu, RmtAcu, TrsAcu, EvsAcu, TurAcu) and the variable HORAS according to the following relationship:

Formula = (CliAcu + RmdAcu - RmtAcu[previous_month] - TrsAcu[previous_month] - EvsAcu[previous_month])*HORAS/TurAcu
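
To make the formula concrete, here is a minimal base-R sketch for city A only, using the first three months of the data above. The helper name prev is hypothetical (it is not from any package): it shifts a vector one position down and puts 0 in the first position, which matches the rule that nothing is subtracted in the first observed month.

```r
# First three months of city A, copied from the data above
CLIENTES       <- c(29, 26, 25)
REMANEJADOS    <- c(3, 2, 1)
REMANEJAMENTOS <- c(4, 3, 4)
TRANSFERENCIAS <- c(2, 4, 2)
EVASOES        <- c(1, 2, 2)
TURMAS         <- c(2, 1, 1)
HORAS          <- c(80, 80, 80)

# Accumulated versions of each variable
CliAcu <- cumsum(CLIENTES)
RmdAcu <- cumsum(REMANEJADOS)
RmtAcu <- cumsum(REMANEJAMENTOS)
TrsAcu <- cumsum(TRANSFERENCIAS)
EvsAcu <- cumsum(EVASOES)
TurAcu <- cumsum(TURMAS)

# Hypothetical helper: value of the previous month, 0 for the first month
prev <- function(x) c(0, head(x, -1))

Valor <- (CliAcu + RmdAcu - prev(RmtAcu) - prev(TrsAcu) - prev(EvsAcu)) *
  HORAS / TurAcu

print(Valor)
# month 1: (29 + 3 - 0 - 0 - 0) * 80 / 2 = 1280
# month 2: (55 + 5 - 4 - 2 - 1) * 80 / 3 = 1413.333...
# month 3: (80 + 6 - 7 - 6 - 3) * 80 / 4 = 1400
```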

I have the following code that does what I need:

if(dim(dados)[1] != 0){
  dados$Valor[1] <- (dados$CliAcu[1]+dados$RmdAcu[1]-0-0-0)*
                         dados$HORAS[1]/dados$TurAcu[1]
  if(nrow(dados) != 1){
    for(i in 2:nrow(dados)){
      if(dados$MES[i] == 1){
        dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-0-0-0)*
                               dados$HORAS[i]/dados$TurAcu[i]
      } else{
        if(dados$MES[i] <= dados$MES[i-1]){
          dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-0-0-0)*
                                 dados$HORAS[i]/dados$TurAcu[i]
        } else{
          dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-
                               dados$EvsAcu[i-1]-dados$RmtAcu[i-1]-
                               dados$TrsAcu[i-1])*
                                 dados$HORAS[i]/dados$TurAcu[i]
        }
      }
    }
  }
}

Note that I subtract 0 three times in three of the calculations because, in the first month of the observations, the subtractions do not apply (this first month can be any month of the year).

Does anyone have suggestions for improving this last piece of code, for example using the dplyr package?

Do you need to calculate the formula value per city?

– Guilherme Marthe

2017/07/24 at 17:45

1 answer

by Guilherme Marthe • **550** points · Answer 1 · 2017-07-24T17:54:54+00:00

To work with the previous or next value of an observation, given the ordering of another variable, dplyr offers arrange to sort the rows and lag and lead to access the previous or next value of a column in a data.frame/tibble. Both can also reach more than one position back or forward through their n argument; see the documentation for details. There is also a nice vignette devoted entirely to window functions.
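
As a small illustration of these functions, here is a sketch on a made-up numeric vector (the vector x and the result names are only for illustration). Note that attaching dplyr masks stats::lag, so lag below is dplyr's version.

```r
library(dplyr)

x <- c(10, 20, 30, 40)   # made-up values

prev_na   <- lag(x)               # NA 10 20 30
prev_zero <- lag(x, default = 0)  #  0 10 20 30
prev_two  <- lag(x, n = 2)        # NA NA 10 20
next_na   <- lead(x)              # 20 30 40 NA
```

The default argument is what lets the first month fall back to 0 instead of NA.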

In your case, I think the following code solves your problem:

dados %>%
    group_by(CIDADE) %>% 
    arrange(MES) %>% 
    transmute(
        MES,
        calculo = (CliAcu + RmdAcu - lag(RmtAcu, default = 0) - lag(TrsAcu, default = 0) - lag(EvsAcu, default = 0))*HORAS/TurAcu
        )

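This assumes you want the value per city. A run of this computation printed the following (the NA in the first month of each city is what lag returns when default is not set; with default = 0 those rows get a number instead):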

# A tibble: 24 x 3
# Groups:   CIDADE [2]
    CIDADE   MES   calculo
    <fctr> <int>     <dbl>
1      A     1        NA
2      B     1        NA
3      A     2  760.0000
4      B     2  976.0000
5      A     3  810.0000
6      B     3 1146.6667
7      A     4  968.8889
8      B     4  952.0000
9      A     5  966.6667
10     B     5 1054.5455
# ... with 14 more rows
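
Because lag is evaluated within each group, the first row of every city gets the default value rather than the last value of the previous city. A minimal sketch with made-up toy data (not the question's data; the column x stands in for an accumulated column such as RmtAcu, and city B deliberately starts in month 5 to show that the first month need not be January):

```r
library(dplyr)

# Made-up toy data, for illustration only
toy <- tibble(
  CIDADE = c("A", "A", "A", "B", "B", "B"),
  MES    = c(1L, 2L, 3L, 5L, 6L, 7L),
  x      = c(10, 20, 30, 1, 2, 3)
)

res <- toy %>%
  group_by(CIDADE) %>%
  arrange(MES, .by_group = TRUE) %>%        # sort by month inside each city
  mutate(x_prev = lag(x, default = 0)) %>%  # previous month's value, 0 for the first
  ungroup()

res$x_prev
# 0 10 20 0 1 2
```

Here arrange(MES, .by_group = TRUE) keeps each city's rows together; a plain arrange(MES) interleaves the cities in the printed result, but the grouped lag still only looks at rows of the same city.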