0
I would like to aggregate the lines of dataframe
TBCG2
, when the DATA_INGRESSO_ORGAO
is different (see column ID_SERVIDOR_PORTAL
numbers 977, 1089, 1365, 1666, 2597, 2779
and 3036
). I want to keep the oldest date, as code below. However, in the case of ID 2789
, have CARGOs
different for different dates, in this case, I want to keep the two lines, modifying the ID
of one of them adding an x to the ID
. I mean, I want to keep one ID_SERVIDOR_PORTAL=2789
and another ID_SERVIDOR_PORTAL=2789x
. This dataframe is only a part of my database. How should I proceed?
url=url("https://raw.githack.com/fsbmat/salarioDocente/master/Teste/TBCG2.csv")
TBCG2 <- read.csv2(url, header = TRUE,encoding = "ASCII")
TBCG2$DATA_INGRESSO_ORGAO <- as.Date(as.character(TBCG2$DATA_INGRESSO_ORGAO), format = "%d/%m/%Y")
library(sqldf)
TBCG2 <- sqldf('select ID_SERVIDOR_PORTAL,NOME,CPF,CARGO,
min(DATA_INGRESSO_ORGAO) as DATA_INGRESSO_ORGAO,
sum(BRU_Jan2013 ) as BRU_Jan2013,
sum(BRU_Fev2013 ) as BRU_Fev2013,
sum(BRU_Mar2013 ) as BRU_Mar2013
from TBCG2
group by ID_SERVIDOR_PORTAL,NOME,CPF')