How to identify if one observation is in the same group as another?

Asked

Viewed 40 times

0

I have a date.frame with the columns: cnpj_root, id_m_f, municipio and Uf

I need to check the amount of cnpj_root that exists in each municipality and check if the matrix (1) is in the same municipality of some branch (2).

What I’ve done so far:

library(tidyverse)

empresas %>% 
             group_by(cnpj_raiz, id_m_f, municipio) %>% 
             count() 

Upshot:

 cnpj_raiz id_m_f municipio                    n

 1111111   1        PITIMBU                    1
 1111111   2        ALHANDRA                   3
 22222222  1        CARUARU                    1
 22222222  2        BREJO DA MADRE DE DEUS     4
 33333333  1        SERRINHA                   1
 44444444  1        ARAPIRACA                  1
 55555555  1        FEIRA DE SANTANA           1
 66666666  1        GUARAPUAVA                 1
 66666666  2        GUARAPUAVA                 2
 66666666  2        MOGI DAS CRUZES            3

For example, the cnpj_raiz==66666666, possesses id_m_f 1 and 2 in the city of Guarapuava. How to make this identification?

I thought of a result similar to:

 cnpj_raiz  municipio                   n    inclui_matriz

 1111111     PITIMBU                    1                F
 1111111     ALHANDRA                   3                F  
 22222222    CARUARU                    1                F
 22222222    BREJO DA MADRE DE DEUS     4                F
 33333333    SERRINHA                   1                F
 44444444    ARAPIRACA                  1                F
 55555555    FEIRA DE SANTANA           1                F
 66666666    GUARAPUAVA                 3                T
 66666666    MOGI DAS CRUZES            3                F

1 answer

1


There are several ways to arrive at the expected result. Here I present how to do this with the package job dplyr.

Quantity of cnpj per municipality:

library(dplyr)

df <- empresas %>% 
  dplyr::mutate(cnpj_raiz = as.factor(cnpj_raiz),
                municipio = as.factor(municipio)) %>% 
  dplyr::group_by(municipio, cnpj_raiz) %>% 
  dplyr::count(municipio) %>% 
  dplyr::ungroup() %>% 
  dplyr::group_by(municipio) %>% 
  dplyr::summarise(n = n()) %>%
  dplyr::rename("qtd_cnpj" = n)

Output df .

> df
# A tibble: 9 x 2
  municipio              qtd_cnpj
  <fct>                     <int>
1 ALHANDRA                      1
2 ARAPIRACA                     1
3 BREJO DA MADRE DE DEUS        1
4 CARUARU                       1
5 FEIRA DE SANTANA              1
6 GUARAPUAVA                    2
7 MOGI DAS CRUZES               1
8 PITIMBU                       1
9 SERRINHA                      1

Companies (cnpj) with head office and subsidiary in the same municipality.

mf <- empresas %>% 
  dplyr::group_by(municipio) %>% 
  dplyr::filter(n() > 1) %>% 
  dplyr::group_by(cnpj_raiz, municipio)  %>% 
  dplyr::count(cnpj_raiz) %>% 
  dplyr::ungroup() %>% 
  dplyr::mutate(check = ifelse(n > 1, "sim", NA))

Output mf.

> mf
# A tibble: 2 x 4
  cnpj_raiz municipio      n check
      <dbl> <fct>      <int> <chr>
1         6 GUARAPUAVA     2 sim  
2         7 GUARAPUAVA     2 sim 

Here are "your" data. Note that I increased the amount of municipalities in GUARAPUAVA to ratify the code.

cnpj_raiz <- c(1, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 7)
id_m_f <- c(1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2)
municipio <- c("PITIMBU", "ALHANDRA", "CARUARU", "BREJO DA MADRE DE DEUS",
               "SERRINHA", "ARAPIRACA", "FEIRA DE SANTANA", "GUARAPUAVA",
               "GUARAPUAVA", "MOGI DAS CRUZES", "GUARAPUAVA", "GUARAPUAVA")
n <- c(1, 3, 1, 4, 1, 1, 1, 1, 2, 3, 1, 3)
empresas <- data.frame(cnpj_raiz = cnpj_raiz, id_m_f = id_m_f,
                       municipio = municipio, n_outro = n) 
# e interessante mudar o nome da variavel `n`
  • Thanks for the reply! I believe I am not able to explain well. Missed to check if the cnpj_raiz has matrix ( id_m_f==1) in the same municipality with a subsidiary (id_m_f==2). Remembering that each cnpj_raiz has, necessarily, a head office and can have 0, 1 or several branches. For your example, if there are two branches in the same municipality, it can return check==sim, right? It does not check whether it is affiliate or matrix summed in n(). Once again, thank you for your attention!

  • 1

    In id_m_f I understood that 1 represents matrix (M). and 2the subsidiary (F). As you want to see if M and F are in the same city, the first step I did was to check the city duplicity, independent of cnpj or M and F. So, as we do not have two M p/ a same cnpj, so if there is the same CNPJ p/ a municipality, it is understood that are M and F, considering what you want to analyze. The check was an extra step and can be deleted.

  • 1

    I get it, it’s true. Thank you!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.