Take previous values of a variable if the current value is 0 with a condition using dplyr in R

Asked

Viewed 35 times

1

I have a database as follows:

CNPJ    data    dataquebra  alto
2222    201603  201711      s
2222    201511  0           s
2222    201702  201711      s
2222    201704  201711      s
3333    201601  201711      s
3333    201509  0           s
3333    201512  0           s
3333    201607  201711      s
3333    201706  201711      s
1111    201701  201711      s
4444    201503  201605      s
4444    201503  201605      s
9999    201605  201712      s
9999    201511  0           s
9999    201704  201712      s
9999    201603  201712      s

I need to filter the values that the variable alto is as s. I wish that all the values of dataquebra for each CNPJ are equal. But note that some of these values are equal to zero.

Thus, replace the occurrences of dataquebra equal to 0 for a value other than 0 for that variable.

I thought I’d use the dplyr and I started the following way:

library(dplyr)
dados<-dados %>%
  group_by(CNPJ) %>%
  filter(alto=="s") %>%
  mutate(x_lag = lag(dataquebra, order_by = data))

dados<-dados %>%
  group_by(CNPJ) %>%
  filter(alto=="s") %>%
  mutate(x_lead = lead(dataquebra, order_by = data))

dados$dataquebra<-ifelse(dados$dataquebra==0 & !is.na(dados$x_lag), 
dados$x_lag, dados$dataquebra)
dados$dataquebra<-ifelse(dados$dataquebra==0 & !is.na(dados$x_lead), 
dados$x_lead, dados$dataquebra)

But for some reason it didn’t work.

1 answer

2


Your intuition was correct. Yes you can use the dplyr:

dados %>%
  filter(alto=="s") %>%
  group_by(CNPJ) %>%
  mutate(dataquebra2 = max(dataquebra))
# A tibble: 16 x 5
# Groups:   CNPJ [5]
    CNPJ   data dataquebra alto  dataquebra2
   <int>  <int>      <int> <fct>       <dbl>
 1  2222 201603     201711 s          201711
 2  2222 201511          0 s          201711
 3  2222 201702     201711 s          201711
 4  2222 201704     201711 s          201711
 5  3333 201601     201711 s          201711
 6  3333 201509          0 s          201711
 7  3333 201512          0 s          201711
 8  3333 201607     201711 s          201711
 9  3333 201706     201711 s          201711
10  1111 201701     201711 s          201711
11  4444 201503     201605 s          201605
12  4444 201503     201605 s          201605
13  9999 201605     201712 s          201712
14  9999 201511          0 s          201712
15  9999 201704     201712 s          201712
16  9999 201603     201712 s          201712

In order not to lose information and become more didactic, I created another column called dataquebra2.

See that I used the mutate like you, but the function I chose to replace the values 0 was the function max. After all, if all occurrences of dataquebra are positive (and I believe they are), whenever this value is different from 0, it is he who will be chosen as the maximum.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.