Merging two .csv worksheets in R

I'm working on a project using data from the Transparency Portal, and I need to join two databases, prof1.csv and prof2.csv. The final result of the merge, which I named prof.csv, has duplicated rows because of column 18 (gross salary) and column 19 (net salary). I would like the result to match Prof.csv; that is, I don't want duplicated rows even when the salaries differ, and I want the salary values for the different months to stay on the same row. Below is a small part of the code I'm using.

# Read both spreadsheets (read.csv2: semicolon-separated, comma decimal)
url1 <- url("https://raw.githack.com/fsbmat/salarioDocente/master/prof1.csv")
url2 <- url("https://raw.githack.com/fsbmat/salarioDocente/master/prof2.csv")
prof1 <- read.csv2(url1, header = TRUE, encoding = "ASCII")
prof2 <- read.csv2(url2, header = TRUE, encoding = "ASCII")

# Full outer join (keep unmatched rows from both files) on the identifying columns
Prof <- merge(prof1, prof2,
              by = c("ID_SERVIDOR_PORTAL", "NOME", "CPF",
                     "DATA_INICIO_AFASTAMENTO", "DATA_TERMINO_AFASTAMENTO",
                     "JORNADA_DE_TRABALHO", "DATA_INGRESSO_ORGAO", "UF_EXERCICIO",
                     "Nivel", "LOTACAO", "REG_JURIDICO", "VINCULO",
                     "CARGO", "Org_Exercicio", "Tempo"),
              all.x = TRUE, all.y = TRUE)
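Before deciding which columns to take out of by, one way to find the culprits is to pair each person's rows from the two files by ID alone and count, for every candidate key column, how many people have different values in each file. A minimal sketch on hypothetical toy data (p1, p2 and all values are invented; only the column names come from the question), assuming ID_SERVIDOR_PORTAL identifies one row per person in each file:

```r
# Hypothetical toy versions of prof1 and prof2 (all values invented)
p1 <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                 Nivel = c("N1", "N2"),
                 CARGO = c("C1", "C2"),
                 stringsAsFactors = FALSE)
p2 <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                 Nivel = c("N1", "N3"),
                 CARGO = c("C9", "C2"),
                 stringsAsFactors = FALSE)

# Candidate key columns to check (everything in `by` except the ID)
keys <- c("Nivel", "CARGO")

# Pair each person's rows using only the ID; shared columns get .x / .y
both <- merge(p1, p2, by = "ID_SERVIDOR_PORTAL")

# For each candidate key, count people whose value differs between files
n_diff <- sapply(keys, function(k) {
  sum(both[[paste0(k, ".x")]] != both[[paste0(k, ".y")]], na.rm = TRUE)
})
n_diff
```

Any column with a nonzero count will split those people across several rows if it is kept in by.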

1 answer

The merge duplicates rows because, for the same person, the position (CARGO) and level (Nivel) differ between the two databases.

For example, in prof1, Fulano de Tal 1 has CARGO P3G, while in prof2 the same person's CARGO is MS.
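The doubling can be reproduced with a one-person toy example. The data are hypothetical; SAL_MES1 and SAL_MES2 are invented stand-ins for the salary columns of each month:

```r
# Hypothetical toy data: same person, different CARGO in each file
p1 <- data.frame(ID_SERVIDOR_PORTAL = 1, NOME = "Fulano de Tal 1",
                 CARGO = "P3G", SAL_MES1 = 100, stringsAsFactors = FALSE)
p2 <- data.frame(ID_SERVIDOR_PORTAL = 1, NOME = "Fulano de Tal 1",
                 CARGO = "MS", SAL_MES2 = 110, stringsAsFactors = FALSE)

# CARGO is part of `by` but its values differ, so the rows never match;
# the full outer join (all = TRUE) then keeps each row separately,
# filling the other file's salary column with NA
m <- merge(p1, p2, by = c("ID_SERVIDOR_PORTAL", "NOME", "CARGO"), all = TRUE)
m
```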

Therefore, you should remove these two variables (Nivel and CARGO) from the by argument of merge().

library(dplyr)

Prof <- merge(prof1, prof2,
              by = c("ID_SERVIDOR_PORTAL", "NOME", "CPF", "DATA_INICIO_AFASTAMENTO",
                     "DATA_TERMINO_AFASTAMENTO", "JORNADA_DE_TRABALHO", "DATA_INGRESSO_ORGAO",
                     "UF_EXERCICIO", "LOTACAO", "REG_JURIDICO", "VINCULO", "Org_Exercicio", "Tempo"),
              all = TRUE) %>%
  # drop only the prof1 copies of the two columns left out of `by`
  select(-Nivel.x, -CARGO.x) %>%
  # keep the prof2 copies under their original names
  rename(Nivel = Nivel.y, CARGO = CARGO.y)

I included the select and rename calls so that only the Nivel and CARGO variables from prof2 are kept.
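For readers not using dplyr, the same select/rename step can be written in base R. A minimal sketch on hypothetical toy data (values and the SAL_MES columns are invented), joining on the ID only for brevity:

```r
# Hypothetical toy data (values invented); Nivel and CARGO differ per file
p1 <- data.frame(ID_SERVIDOR_PORTAL = 1, Nivel = "N1", CARGO = "P3G",
                 SAL_MES1 = 100, stringsAsFactors = FALSE)
p2 <- data.frame(ID_SERVIDOR_PORTAL = 1, Nivel = "N2", CARGO = "MS",
                 SAL_MES2 = 110, stringsAsFactors = FALSE)

# Join without Nivel and CARGO; merge() suffixes them .x (p1) and .y (p2)
m <- merge(p1, p2, by = "ID_SERVIDOR_PORTAL", all = TRUE)

# Base-R version of select(-Nivel.x, -CARGO.x): drop the p1 copies
m <- m[, setdiff(names(m), c("Nivel.x", "CARGO.x")), drop = FALSE]

# Base-R version of rename(): strip the .y suffix from the p2 copies
names(m)[names(m) == "Nivel.y"] <- "Nivel"
names(m)[names(m) == "CARGO.y"] <- "CARGO"
m
```

Naming the dropped columns explicitly, rather than matching a pattern, avoids accidentally removing other prof1 columns that also received a .x suffix.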

  • Hi Rafael, I hadn't noticed this. Is there a way to keep the Nivel and CARGO columns with the values from prof2.csv using the merge function itself? In this case, Nivel.y and CARGO.y are the ones I'm interested in!

  • Variables ending in .x come from prof1, and those ending in .y come from prof2.

  • Yes, I understand that, but the spreadsheets I work with are huge, so I want to know the simplest way to always keep the values from prof2. With additional code I can do this, for example: Prof$Nivel <- Prof$Nivel.y; Prof$Cargo <- Prof$CARGO.y; Prof <- Prof %>% select(1:13, 16:21, 24:25). But if it were possible to get this directly in the first piece of code, that would be better!

  • I added two lines of code; see whether it does what you want.
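On keeping prof2's values directly in the first step: one option, sketched here as an assumption rather than as the answer's method, is to drop Nivel and CARGO from prof1 before merging, so merge() never creates .x/.y copies of them. The toy data are hypothetical, and the join uses only the ID for brevity; with the real files, by would be the longer key list from the answer. People who appear only in prof1 end up with NA in Nivel and CARGO.

```r
# Hypothetical toy data (values invented)
p1 <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2), Nivel = c("N1", "N1"),
                 CARGO = c("P3G", "MS"), SAL_MES1 = c(100, 200),
                 stringsAsFactors = FALSE)
p2 <- data.frame(ID_SERVIDOR_PORTAL = c(1, 3), Nivel = c("N2", "N3"),
                 CARGO = c("MS", "MS"), SAL_MES2 = c(110, 310),
                 stringsAsFactors = FALSE)

# Columns whose values should always come from prof2
from_p2 <- c("Nivel", "CARGO")

# Drop them from p1 first, so there is no name clash for merge() to suffix
m <- merge(p1[, setdiff(names(p1), from_p2), drop = FALSE], p2,
           by = "ID_SERVIDOR_PORTAL", all = TRUE)
m
```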
