Aggregating data with different dates and considering other columns in R

Question

Asked 6 years, 4 months ago

Viewed 53 times

I would like to aggregate the rows of the data frame TBCG2 that share an ID_SERVIDOR_PORTAL but have different values of DATA_INGRESSO_ORGAO (see IDs 977, 1089, 1365, 1666, 2597, 2779 and 3036), keeping the oldest date, as in the code below. However, ID 2789 has different CARGOs on different dates. In that case I want to keep both rows and change the ID of one of them by appending an x, so that one row has ID_SERVIDOR_PORTAL=2789 and the other has ID_SERVIDOR_PORTAL=2789x. This data frame is only part of my database. How should I proceed?

url=url("https://raw.githack.com/fsbmat/salarioDocente/master/Teste/TBCG2.csv")
TBCG2 <- read.csv2(url, header = TRUE,encoding = "ASCII")
TBCG2$DATA_INGRESSO_ORGAO <- as.Date(as.character(TBCG2$DATA_INGRESSO_ORGAO), format = "%d/%m/%Y")
library(sqldf)
TBCG2 <- sqldf('select ID_SERVIDOR_PORTAL,NOME,CPF,CARGO,
                min(DATA_INGRESSO_ORGAO) as DATA_INGRESSO_ORGAO,
                sum(BRU_Jan2013 )   as  BRU_Jan2013,        
                sum(BRU_Fev2013 )   as  BRU_Fev2013,         
                sum(BRU_Mar2013 )   as  BRU_Mar2013
                from TBCG2 
                group by ID_SERVIDOR_PORTAL,NOME,CPF')

1 answer

by fsbmat • **1,291** points · Answer 1 · 2019-03-06T00:06:34+00:00

I seem to have found a solution. It may not be the fastest because of the loop, but it works. Instead of appending an x, it appends 001 to the ID of one of the rows, which keeps the column numeric. Here is the code:

url=url("https://raw.githack.com/fsbmat/salarioDocente/master/Teste/TBCG2.csv")
TBCG2 <- read.csv2(url, header = TRUE,encoding = "ASCII")
TBCG2$DATA_INGRESSO_ORGAO <- as.Date(as.character(TBCG2$DATA_INGRESSO_ORGAO), format = "%d/%m/%Y")
# Rows that repeat an ID already seen earlier in the data
df <- TBCG2[duplicated(TBCG2$ID_SERVIDOR_PORTAL), ]
ID <- df$ID_SERVIDOR_PORTAL
for (i in seq_along(ID)) {
  rows <- which(TBCG2$ID_SERVIDOR_PORTAL == ID[i])
  a <- min(rows)  # first row with this ID
  b <- max(rows)  # last row with this ID
  # Same CARGO on both rows: keep the ID. Different CARGO: append "001" to the
  # ID of the first row (e.g. 2789 becomes 2789001) so the two rows stay separate.
  TBCG2$ID_SERVIDOR_PORTAL[a] <- ifelse(TBCG2$CARGO[a] == TBCG2$CARGO[b], ID[i], as.numeric(paste0(ID[i], "001")))
}
library(sqldf)
TBCG2 <- sqldf('select ID_SERVIDOR_PORTAL,NOME,CPF,CARGO,
                min(DATA_INGRESSO_ORGAO) as DATA_INGRESSO_ORGAO,
                sum(BRU_Jan2013 )   as  BRU_Jan2013,        
                sum(BRU_Fev2013 )   as  BRU_Fev2013,         
                sum(BRU_Mar2013 )   as  BRU_Mar2013
                from TBCG2 
                group by ID_SERVIDOR_PORTAL,NOME,CPF')
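
For comparison, here is a minimal loop-free sketch in base R that produces the 2789x style of ID asked for in the question. It is not the code above rewritten, only one possible alternative. The toy data frame and its values are invented for illustration, ID_NEW and cargo_rank are hypothetical names, and only one BRU column is included to keep it short.

```r
# Toy data standing in for TBCG2 (all values are made up for illustration)
toy <- data.frame(
  ID_SERVIDOR_PORTAL = c(977, 977, 2789, 2789, 3000),
  CARGO = c("PROFESSOR", "PROFESSOR", "PROFESSOR", "TECNICO", "PROFESSOR"),
  DATA_INGRESSO_ORGAO = as.Date(c("2010-03-01", "2005-08-15", "2012-01-10",
                                  "2008-06-20", "2011-02-02")),
  BRU_Jan2013 = c(100, 50, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Within each ID, number the distinct CARGOs in order of appearance:
# 1 for the first CARGO seen, 2 for the second, and so on
cargo_rank <- ave(seq_len(nrow(toy)), toy$ID_SERVIDOR_PORTAL,
                  FUN = function(i) match(toy$CARGO[i], unique(toy$CARGO[i])))

# Hypothetical new ID column: append one "x" per extra CARGO,
# so the ID becomes character instead of numeric
toy$ID_NEW <- paste0(toy$ID_SERVIDOR_PORTAL, strrep("x", cargo_rank - 1))

# Sort so the oldest date comes first within each new ID,
# then keep that first row per ID
toy <- toy[order(toy$ID_NEW, toy$DATA_INGRESSO_ORGAO), ]
result <- toy[!duplicated(toy$ID_NEW), c("ID_NEW", "CARGO", "DATA_INGRESSO_ORGAO")]

# Sum the salary column per new ID and attach it to the kept rows
sums <- rowsum(toy$BRU_Jan2013, toy$ID_NEW)
result$BRU_Jan2013 <- unname(sums[result$ID_NEW, 1])

print(result)
```

Because ID_NEW is different for each CARGO within an ID, grouping by ID_NEW alone is enough here. On the real data the same idea would need the remaining BRU columns summed as well, and NOME and CPF carried along with the kept row.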