1
I have a data.frame with columns: cnpj, name, year, rop, des, Cax, pcld.
I need to identify the cnpj that possess the value "1" in at least one of the columns rope, des, Cax and pcld for all years. After that, identify those who had discontinuity in the years.
So, as a first step, I thought I would transpose the date.frame, to look like this:
cnpj | nome | conta | 2014 | 2015 | 2016 | 2017
3243 | teste| rop | 1 | 0 | 0 | 1
3243 | teste| des | 0 | 1 | 0 | 0
3243 | teste| cax | 0 | 0 | 1 | 0
3243 | teste| pcld | 0 | 0 | 0 | 0
33333| loja | rop | 0 | 1 | NA | NA
33333| loja | des | 0 | 0 | NA | NA
33333| loja | cax | 0 | 0 | NA | NA
33333| loja | pcld | 1 | 0 | NA | NA
The cnpj "3243" would be identified as 'right'. It has at least one "1" value in every year. And cnpj "33333", which does not have data for all years following 2014, was discontinued, will have "NA" that year.
I have tried to transpose in several ways, but always gives an error. The last attempt:
library(magrittr)
dados %>% dplyr::select(CNPJ, nome, ano, rop, des, cax, pcld) %>%
tidyr::pivot_longer(
cols = -c('cnpj', 'nome'),
names_to = 'Conta'
)
I believe that when transposing the data.frame, the CNPJ that were discontinued will be with value "NA" for the year without information (discontinued), so I would make a filter.
Data example:
structure(list(
id = c(1111, 1111, 1111, 1111, 22222,22222, 22222, 22222, 33333, 33333,3243,3243,3243,3243),
name = c("empresa", "empresa", "empresa", "empresa","firma", "firma", "firma", "firma", "loja", "loja","teste","teste","teste","teste"),
year = c(2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017, 2014,2015,2014, 2015, 2016, 2017),
rop = c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1),
des = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
cax = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0),
pcld = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0)),
row.names = c(NA,14L), class = "data.frame")
Some better way to do it?
Unclear how it identifies a CNPJ that has been discontinued.
– Carlos Eduardo Lagosta
I believe that when transposing the data.frame, these CNPJ get "NA" for the year without information (discontinued), so I would make a filter.
– RxT
What is the criterion for determining that it was discontinued? 0 in all columns of that year? Whatever, edit the question to detail it better (rather than describe more detail in the comments). I suggest putting an example of how you want the final result, using as a base the example data you posted.
– Carlos Eduardo Lagosta
@Carloseduardolalike made
– RxT