Compare the information of two data.frames (tables) to create groups and a third column. in R

Asked

Viewed 69 times

1

I am trying to create differentiated groups of jobs from the CBO database (BRAZILIAN OCCUPATION CLASSIFICATION) They provide a CBO2002.csv with two columns the column "CODE" (with the code of each profession) and the column "PROFESSION" (with the name of each occupation) in order.

ex: from line 1 to 28 are the military professionals. from line 29 to 44 are the politicians and so on

I have a table (PROF) with two columns, the first with a sequence numbering of 1:3108, and the second column with the professions. The idea is to create professional groups and group identify these customers in these groups.

for example: table professionals (PROF) that fit the occupations of lines 1 to 28 of the table (CBO2002) enter the MILITARY group creating a third column in the table (PROF) with the name of "GROUPS" with these professions identified as "MILITARY"

    PRO2F <- PROF %>% 
dplyr::mutate(GRUPOS = dplyr::case_when 

( PROFISSAO == "OFICIAL GENERAL DA AERONAUTICA" ~ "MILITAR",
  PROFISSAO == "OFICIAL GENERAL DO EXERCITO" ~ "MILITAR",
))

I managed this way without using the table (CBO2002) filter, but I had to write each profession and are more than 3000.

My question is how I can use the table ranges (CBO2002) as a filter to identify and group the professions in my table (PROF) between the line. So you don’t have to write 3000 professions.

What I tried and didn’t work so far:

PROF2 <- PROF %>% 
    dplyr::mutate(GRUPOS = dplyr::case_when 
    
    ( PROFISSAO == CBO2002[c(1:28),"PROFISSAO"] ~ "MILITAR" ))  

or

militar<-as.vector(CBO2002[c(1:28),"PROFISSAO"])

PROF2 <- PROF %>% 
    dplyr::mutate(GRUPOS = dplyr::case_when 
    
    ( PROFISSAO == militar ~ "MILITAR" ))  

I’m new and thank you for your attention.

CBO2002.csv

PROF.csv

1 answer

1

You can use the between by comparing the data frame indexes in the case:

library(dplyr)

rows <- rownames(CBO2002)
CBO2002 <- CBO2002 %>%
  mutate(grupo = case_when(
    between(rows,1,28) ~ 'Militar',
    between(rows,29,44) ~ 'Político',
    TRUE ~ 'outros'
  ))

Then rename the column and Join the variable name

CBO2002 <- rename(CBO2002, profissao = TITULO)
inner_join(PROF, CBO2002, by='profissao')
  • Thank you very much for your reply.

  • Thank you so much for the answer. I need to create the third column "GROUP" in the table (PROF). The idea is to use the table interval names (CBO2002) as a filter for creating groups,(professions 1:28 = Military; professions 29:44 = Politicos,...) compare with the table professions (PROF) and apply the result in the "GROUP" column created in the table (PROF) in the same row as the profession in the column "PROFESSIONAL"

  • @Lucasmoterani, good morning! I updated the answer, after a look. Hug!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.