0
I have the following database: cnpj_root, affiliate (1 = matrix, 2 = affiliate), Uf, active_em_2017, active_em_2018, active_em_2019, active_em_2020 (NA= inactive, 1 = active).
What I need: identify, by Uf and year, the quantity in companies whose headquarters is in another state.
Former simple:
cnpj | Matriz_filial | Uf | 2017 | 2018 | 2019 | 2020
123 1 SP 1 1 1 1
123 2 BA 0 0 1 1
123 2 BA 0 0 1 1
123 2 SP 1 1 0 0
456 1 SP 0 1 1 1
456 2 BA 0 0 1 0
BA = 2 in 2019 and 1 in 2020 (the matrix is in SP) SP = 0 (head office and subsidiary in the same state)
Expected result according to the sample basis:
BA = 1 in 2017, 1 in 2018, 1 in 2019, 0 in 2020 (has only subsidiaries in this state, the matrix is in another)
SP = 1 in all years (has only subsidiaries in that state, the head office is in another)
GO = 1 in all years (has only subsidiaries in that state, the parent company is in another)
All other = 0 in all years (headquarters and subsidiaries in the same state)
Then I need to plot a graph: y = Quantity, x = Year, group = Uf.
I tried to create a flag column to identify, but it’s not working.
empresa %>%
tidyr::pivot_longer(
cols = -c(cnpj_raiz, identificador_matriz_filial, uf),
names_to = 'Ano'
) %>%
dplyr::select(cnpj_raiz, identificador_matriz_filial, uf, everything()) %>%
dplyr::group_by(cnpj_raiz, identificador_matriz_filial, uf) %>%
dplyr::mutate(
inclui_matriz_na_uf = all(1:2 %in% c(identificador_matriz_filial)),
inclui_matriz_na_uf = ifelse(inclui_matriz_na_uf == T, 'Sim', 'Não')
) %>%
dplyr::mutate(Ano = stringr::str_extract(Ano, "[0-9].*")) %>%
dplyr::filter(inclui_matriz_na_uf == 'Não')
Example of the basis:
structure(list(cnpj_raiz = c("6565656", "6565656", "77777","63547", "63547", "63547", "63547", "63547", "63547", "202102", "789654", "632145", "96896", "5555469", "5555469",
"5555469", "6665554", "123321", "2703001", "270307171", "2220258", "878787", "878787", "234209", "717400", "737410", "98598","99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225", "99992225",
"99992225", "99992225", "2252221", "96665123", "917800", "8920071", "9461800", "100553", "5571114", "7038010", "494001"), identificador_matriz_filial = c(1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), uf = c("PB", "PB", "RS", "PE", "PE", "PE", "PE", "PE", "PE", "SE", "BA", "AL", "BA", "PR", "PR", "PR", "RJ", "SP", "CE", "RJ", "SP", "BA", "SP","SP", "SP", "MT", "PE", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "GO", "MA", "GO", "PE", "PE", "MG", "SP", "PI", "SP",
"RJ"), ativa_em_2017 = c(1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, 1L, NA, 1L, 1, NA, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, NA, 1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, 1L), ativa_em_2018 = c(1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, NA, 1, NA, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA), ativa_em_2019 = c(1L, 1L, NA,
1L, NA, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, NA, 1, NA, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA), ativa_em_2020 = c(1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, NA, NA, 1, NA, NA, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, 1L, 1L, 1L, NA, 1L, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA)), row.names = c(NA, 50L), class = "data.frame")
Thank you! That’s almost it. The problem is this: all subsidiaries of the same company must count as 1 in the same state/year. Example: in GO despite having 8 subsidiaries, counts only as 1 company.
– RxT
Oops, I solved it! I added one
unique()
between thegroup_by()
and thesummarize()
. Thanks!– RxT