Merge two worksheets in .csv format in R

Question


Asked 6 years, 5 months ago



I'm working on a project with data from the Transparency Portal and need to join two datasets, prof1.csv and prof2.csv. The merged result, which I named prof.csv, has duplicated rows because of columns 18 (gross salary) and 19 (net salary). I would like the result to look like Prof.csv. That is, I don't want duplicated rows even when the salaries differ, and I want the salary values for the different months kept on the same row. Below is a small part of the code I'm using.

url1 <- url("https://raw.githack.com/fsbmat/salarioDocente/master/prof1.csv")
url2 <- url("https://raw.githack.com/fsbmat/salarioDocente/master/prof2.csv")
prof1 <- read.csv2(url1, header = TRUE, encoding = "ASCII")
prof2 <- read.csv2(url2, header = TRUE, encoding = "ASCII")
Prof <- merge(prof1, prof2,
              by = c("ID_SERVIDOR_PORTAL", "NOME", "CPF",
                     "DATA_INICIO_AFASTAMENTO", "DATA_TERMINO_AFASTAMENTO",
                     "JORNADA_DE_TRABALHO", "DATA_INGRESSO_ORGAO", "UF_EXERCICIO",
                     "Nivel", "LOTACAO", "REG_JURIDICO", "VINCULO",
                     "CARGO", "Org_Exercicio", "Tempo"),
              all.x = TRUE, all.y = TRUE)
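A minimal reproduction of the symptom, using hypothetical toy data (the SALARIO_BRUTO column and all values are made up): when a column listed in `by` (here CARGO) has different values for the same person in the two files, `merge(..., all = TRUE)` treats the two records as different keys and keeps both.

```r
# Hypothetical toy data: the same two people appear in both files, but
# person 1's CARGO changed from P3G to MS between the two months
prof1_toy <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 2),
  CARGO = c("P3G", "MS"),
  SALARIO_BRUTO = c(5000, 7000)  # made-up salary column
)
prof2_toy <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 2),
  CARGO = c("MS", "MS"),
  SALARIO_BRUTO = c(5200, 7100)
)

# CARGO is part of the key, so person 1's two records cannot be matched
res <- merge(prof1_toy, prof2_toy,
             by = c("ID_SERVIDOR_PORTAL", "CARGO"), all = TRUE)
print(res)

# Person 1 now occupies two rows, each with one of the salaries set to NA
print(table(res$ID_SERVIDOR_PORTAL))
```

Person 2, whose CARGO is the same in both files, stays on a single row with both salaries.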

1 answer



by Rafael Cunha • **4,954** points · Answer 1 · 2019-02-26T12:03:19+00:00

The merge duplicates rows because, for the same person, the position (CARGO) and the level (Nivel) differ between the two datasets.

For example, in prof1, Fulano de Tal 1 has CARGO P3G, while in prof2 his CARGO is MS.
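One way to list the people affected, sketched on hypothetical toy data (the Nivel codes "A", "B" and "C" are invented), is to merge on the ID alone and compare the two versions of each column side by side. `suffixes` is a standard argument of `merge()` that controls the names given to columns present in both inputs.

```r
# Hypothetical toy data; the Nivel codes are invented.
# stringsAsFactors = FALSE so the != comparisons below work on any R version
# (comparing factors with different level sets raises an error)
prof1_toy <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                        Nivel = c("A", "B"),
                        CARGO = c("P3G", "MS"),
                        stringsAsFactors = FALSE)
prof2_toy <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                        Nivel = c("C", "B"),
                        CARGO = c("MS", "MS"),
                        stringsAsFactors = FALSE)

# Join on the person's ID only, so both versions of each column sit side by side
cmp <- merge(prof1_toy, prof2_toy, by = "ID_SERVIDOR_PORTAL",
             suffixes = c("_prof1", "_prof2"))

# Flag people whose CARGO or Nivel differs between the two files
# (assumes no NA values in these columns; an NA comparison would yield NA)
differs <- cmp$CARGO_prof1 != cmp$CARGO_prof2 | cmp$Nivel_prof1 != cmp$Nivel_prof2
changed <- cmp[differs, ]
print(changed)
```

Every ID that shows up in `changed` is one that would be split into two rows if CARGO or Nivel stayed in `by`.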

Therefore, remove these two variables (Nivel and CARGO) from the `by` argument:

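library(dplyr)

# Merge without Nivel and CARGO in `by`, so a change of position or level
# no longer splits one person into two rows
Prof <- merge(prof1, prof2,
              by = c("ID_SERVIDOR_PORTAL", "NOME", "CPF", "DATA_INICIO_AFASTAMENTO",
                     "DATA_TERMINO_AFASTAMENTO", "JORNADA_DE_TRABALHO", "DATA_INGRESSO_ORGAO",
                     "UF_EXERCICIO", "LOTACAO", "REG_JURIDICO", "VINCULO", "Org_Exercicio", "Tempo"),
              all = TRUE) %>%
  select(-Nivel.x, -CARGO.x) %>%             # drop prof1's Nivel and CARGO
  rename(Nivel = Nivel.y, CARGO = CARGO.y)   # keep prof2's values under the original names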

I added the `select` and `rename` calls so that only the Nivel and CARGO values from prof2 are kept.
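If dplyr is not available, the same fix can be sketched in base R. The toy data is hypothetical (SALARIO_BRUTO stands in for the real salary columns), and `suffixes = c("_prof1", "_prof2")` is an optional choice not made in the answer's code: it labels which file each duplicated column came from, so both months' salaries stay on the same row under recognizable names.

```r
# Hypothetical toy data: person 1 changed Nivel and CARGO between months;
# SALARIO_BRUTO is a stand-in for the salary columns
prof1_toy <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                        Nivel = c("A", "B"), CARGO = c("P3G", "MS"),
                        SALARIO_BRUTO = c(5000, 7000),
                        stringsAsFactors = FALSE)
prof2_toy <- data.frame(ID_SERVIDOR_PORTAL = c(1, 2),
                        Nivel = c("C", "B"), CARGO = c("MS", "MS"),
                        SALARIO_BRUTO = c(5200, 7100),
                        stringsAsFactors = FALSE)

# 1. Merge without Nivel and CARGO in `by`; non-key columns present in both
#    inputs get the suffixes, so each month's salary keeps its own column
prof <- merge(prof1_toy, prof2_toy, by = "ID_SERVIDOR_PORTAL",
              all = TRUE, suffixes = c("_prof1", "_prof2"))

# 2. Drop prof1's copies of Nivel and CARGO
prof <- prof[, setdiff(names(prof), c("Nivel_prof1", "CARGO_prof1")), drop = FALSE]

# 3. Give prof2's copies back their original names
names(prof)[names(prof) == "Nivel_prof2"] <- "Nivel"
names(prof)[names(prof) == "CARGO_prof2"] <- "CARGO"

print(prof)
```

With the real files, `by` would be the answer's list of 13 columns rather than the single ID used here. If any of those remaining columns (for example LOTACAO) also changes for someone between the two months, that person will still be split into two rows, so the same check on differing values applies to them.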