Get the previous value within each group in R

I have a data set whose unique key is built from two repeating variables, CNPJ and data (the date). I need to compute sums and divisions using the values from the previous date for each CNPJ. If a CNPJ has no previous date, the result should be empty or 0. The data looks like this:

CNPJ    data    chave           X10000007   X11000006   X11100009
77777   201602  77777 201602    9900189     5246,6      799,25
77777   201603  77777 201603    9334241     2016,82     926,92
77777   201604  77777 201604    9178311     1101,31     91,01
77777   201605  77777 201605    8701743     1539,79     609,09
77777   201606  77777 201606    7954683     892,69      236,8
77777   201607  77777 201607    7412048     834,67      167,13
77777   201608  77777 201608    6811430     615,38      44,04
77777   201609  77777 201609    6602538     894,48      362,94
77777   201610  77777 201610    8216856     747,54      270,3
77777   201611  77777 201611    8495641     1031,63     571,09
77777   199512  77777 199512    1000        0           0
3333    199601  3333 199601     1000        0           0
3333    199602  3333 199602     105370,5    7534,5      2617,45
3333    199603  3333 199603     365025,6    138229,5    6312,22
3333    199604  3333 199604     361016,5    71882,47    8471,27
3333    199605  3333 199605     451561,7    80290,95    8607,74
3333    199606  3333 199606     495800,3    91955,03    7413,48

I believe I need something like a helper index that restarts whenever a new CNPJ begins. How can I do this in R?

1 answer

This can be done with dplyr, using group_by and lag. For example:

> base
# A tibble: 17 × 4
    CNPJ   data chave      x
   <dbl>  <dbl> <dbl>  <dbl>
1  77777 201602 77777 201602
2  77777 201603 77777 201603
3  77777 201604 77777 201604
4  77777 201605 77777 201605
5  77777 201606 77777 201606
6  77777 201607 77777 201607
7  77777 201608 77777 201608
8  77777 201609 77777 201609
9  77777 201610 77777 201610
10 77777 201611 77777 201611
11 77777 199512 77777 199512
12  3333 199601  3333 199601
13  3333 199602  3333 199602
14  3333 199603  3333 199603
15  3333 199604  3333 199604
16  3333 199605  3333 199605
17  3333 199606  3333 199606

Given the base above, run:

library(dplyr)
base %>%
  group_by(CNPJ) %>%
  mutate(x_lag = lag(x, order_by = data))

Source: local data frame [17 x 5]
Groups: CNPJ [2]

    CNPJ   data chave      x  x_lag
   <dbl>  <dbl> <dbl>  <dbl>  <dbl>
1  77777 201602 77777 201602 199512
2  77777 201603 77777 201603 201602
3  77777 201604 77777 201604 201603
4  77777 201605 77777 201605 201604
5  77777 201606 77777 201606 201605
6  77777 201607 77777 201607 201606
7  77777 201608 77777 201608 201607
8  77777 201609 77777 201609 201608
9  77777 201610 77777 201610 201609
10 77777 201611 77777 201611 201610
11 77777 199512 77777 199512     NA
12  3333 199601  3333 199601     NA
13  3333 199602  3333 199602 199601
14  3333 199603  3333 199603 199602
15  3333 199604  3333 199604 199603
16  3333 199605  3333 199605 199604
17  3333 199606  3333 199606 199605
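
The question also asks for 0 when there is no previous date, and for sums and divisions with the previous value. A minimal sketch of that, assuming a small made-up sample (the values of x and the column names x_sum and x_ratio are hypothetical, not from the original data):

```r
library(dplyr)

# Made-up sample; only the column names CNPJ, data and x follow the thread.
base <- tibble(
  CNPJ = c(77777, 77777, 77777, 3333, 3333),
  data = c(201602, 201603, 199512, 199601, 199602),
  x    = c(10, 30, 5, 100, 150)
)

result <- base %>%
  group_by(CNPJ) %>%
  mutate(
    # default = 0 fills the first date of each CNPJ instead of leaving NA
    x_lag   = lag(x, order_by = data, default = 0),
    # sum of the current and the previous value (previous counts as 0 if missing)
    x_sum   = x + x_lag,
    # ratio to the previous value; stays NA when there is no previous date,
    # and becomes Inf if the previous value is a genuine 0
    x_ratio = x / lag(x, order_by = data)
  ) %>%
  ungroup()

result
```

Keeping the ratio on the NA-based lag rather than on the 0-filled one avoids dividing by a placeholder zero; whether 0 or NA is the better marker for a missing previous date depends on the later calculations.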

  • Daniel, great answer as always. What I can't do now is access the x_lag variable created in this example. It neither became a standalone vector nor was added to base.

  • @T.Veiga You need to assign the result, for example: base <- base %>% mutate... After that, the column will be in the base object.

  • Of course, Daniel. I had never used any of these functions, or %>%, before.
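
The assignment discussed in the comments can be sketched as follows. This is only an illustration on a tiny made-up table (the values are hypothetical); the point is that the pipeline returns a new data frame, which has to be stored before the new column can be used:

```r
library(dplyr)

# Tiny made-up table; column names follow the thread.
base <- tibble(
  CNPJ = c(1, 1, 2),
  data = c(201601, 201602, 201601),
  x    = c(10, 20, 30)
)

# Without "base <-", the result is only printed and then discarded.
base <- base %>%
  group_by(CNPJ) %>%
  mutate(x_lag = lag(x, order_by = data)) %>%
  ungroup()

# The new column is now part of base and can be used like any other.
base$x_lag
```

Calling ungroup() at the end is optional, but it prevents later operations on base from being silently applied per CNPJ.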
