Identify the ID that has value at least one column for all rows

Asked

Viewed 53 times

1

I have a data.frame with columns: cnpj, name, year, rop, des, Cax, pcld.

I need to identify the cnpj that possess the value "1" in at least one of the columns rope, des, Cax and pcld for all years. After that, identify those who had discontinuity in the years.

So, as a first step, I thought I would transpose the date.frame, to look like this:

 cnpj | nome | conta | 2014 | 2015 | 2016 | 2017
 
 3243 | teste| rop   |   1  |   0  |  0    | 1
 3243 | teste| des   |   0  |   1  |  0    | 0
 3243 | teste| cax   |   0  |   0  |  1    | 0
 3243 | teste| pcld  |   0  |   0  |  0    | 0
 33333| loja | rop   |   0  |   1  |  NA   | NA 
 33333| loja | des   |   0  |   0  |  NA   | NA 
 33333| loja | cax   |   0  |   0  |  NA   | NA 
 33333| loja | pcld  |   1  |   0  |  NA   | NA 

The cnpj "3243" would be identified as 'right'. It has at least one "1" value in every year. And cnpj "33333", which does not have data for all years following 2014, was discontinued, will have "NA" that year.

I have tried to transpose in several ways, but always gives an error. The last attempt:

library(magrittr)

dados   %>% dplyr::select(CNPJ, nome, ano, rop, des, cax, pcld) %>% 
            tidyr::pivot_longer(
                                 cols = -c('cnpj', 'nome'),
                                 names_to = 'Conta'

                                ) 
 

I believe that when transposing the data.frame, the CNPJ that were discontinued will be with value "NA" for the year without information (discontinued), so I would make a filter.

Data example:

structure(list(
    id = c(1111, 1111, 1111, 1111, 22222,22222, 22222, 22222, 33333, 33333,3243,3243,3243,3243), 
    name = c("empresa", "empresa", "empresa", "empresa","firma", "firma", "firma", "firma", "loja", "loja","teste","teste","teste","teste"), 
    year = c(2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017, 2014,2015,2014, 2015, 2016, 2017),
    rop = c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1), 
    des = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), 
    cax = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), 
    pcld = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0)), 
    row.names = c(NA,14L), class = "data.frame")

Some better way to do it?

  • Unclear how it identifies a CNPJ that has been discontinued.

  • I believe that when transposing the data.frame, these CNPJ get "NA" for the year without information (discontinued), so I would make a filter.

  • What is the criterion for determining that it was discontinued? 0 in all columns of that year? Whatever, edit the question to detail it better (rather than describe more detail in the comments). I suggest putting an example of how you want the final result, using as a base the example data you posted.

  • @Carloseduardolalike made

1 answer

1


It has to pivot twice. The pivot_longer for the variables of the accounts after the pivot_wider to distribute the years.

library(tidyverse)

df <- structure(list(
  id = c(1111, 1111, 1111, 1111, 22222,22222, 22222, 22222, 33333, 33333,3243,3243,3243,3243), 
  name = c("empresa", "empresa", "empresa", "empresa","firma", "firma", "firma", "firma", "loja", "loja","teste","teste","teste","teste"), 
  year = c(2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017, 2014,2015,2014, 2015, 2016, 2017),
  rop = c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1), 
  des = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), 
  cax = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), 
  pcld = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0)), 
  row.names = c(NA,14L), class = "data.frame")

df   %>% 
  tidyr::pivot_longer(
    cols = -c(id, name, year),
    names_to = 'Conta'
  ) %>% 
  tidyr::pivot_wider(
    id_cols = c(id, name, Conta),
    names_from = year,
    values_fro = value
  )
#> # A tibble: 16 x 7
#>       id name    Conta `2014` `2015` `2016` `2017`
#>    <dbl> <chr>   <chr>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1  1111 empresa rop        1      0      0      0
#>  2  1111 empresa des        1      0      0      0
#>  3  1111 empresa cax        1      1      0      0
#>  4  1111 empresa pcld       1      0      1      0
#>  5 22222 firma   rop        1      0      1      0
#>  6 22222 firma   des        0      0      0      0
#>  7 22222 firma   cax        0      0      0      1
#>  8 22222 firma   pcld       1      0      0      0
#>  9 33333 loja    rop        0      1     NA     NA
#> 10 33333 loja    des        0      0     NA     NA
#> 11 33333 loja    cax        0      0     NA     NA
#> 12 33333 loja    pcld       1      0     NA     NA
#> 13  3243 teste   rop        1      0      0      1
#> 14  3243 teste   des        0      1      0      0
#> 15  3243 teste   cax        0      0      1      0
#> 16  3243 teste   pcld       0      0      0      0

Created on 2021-03-12 by the reprex package (v1.0.0)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.