I'm evaluating the transparency portal database, which can be obtained at this link. The problem is that I only want to work with one part of it: my assessment covers teacher data only. I could clean the data in Excel, but I would like to learn how to do it in R. To read the data I am using the following code:
library(readr)
library(dplyr)    # provides %>% and select()
library(stringr)  # provides str_to_upper()
df <- read_delim("~/GitHub/Servidores/Setembro/20160930_Cadastro.csv",
                 ";", escape_double = FALSE, locale = locale(encoding = "ASCII"),
                 trim_ws = TRUE)
# The only columns that matter in the remuneration spreadsheet are
# the 3rd (employee ID) and the 6th (gross pay).
# Rename the ID and gross base pay columns, then merge them into
# the data frame to add each employee's salary.
salarios <-
  read_delim("~/GitHub/Servidores/Setembro/20160930_Remuneracao.csv", ";",
             escape_double = FALSE, locale = locale(encoding = "ASCII"),
             trim_ws = TRUE) %>% select(3, 6)
head(salarios)
names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")
names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by = "ID_SERVIDOR_PORTAL")
df$x <- 1
Having done this, I would like to know how to select only the rows of the database that relate to teachers, so that I can restrict my analysis to them.
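To make the goal concrete, here is a minimal sketch of the kind of row selection I am after. It assumes the registration data has a text column describing each employee's position. The column name DESCRICAO_CARGO, the toy data frame df_exemplo, and the position labels in it are hypothetical placeholders for illustration, not values taken from the actual files.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the merged data frame. DESCRICAO_CARGO and its values
# are assumptions made for this example only.
df_exemplo <- data.frame(
  ID_SERVIDOR_PORTAL = c(1, 2, 3, 4),
  DESCRICAO_CARGO = c("PROFESSOR DO MAGISTERIO SUPERIOR",
                      "ANALISTA",
                      "PROFESSOR SUBSTITUTO",
                      NA),
  SALARIO = c(9000, 7000, 6500, 5000),
  stringsAsFactors = FALSE
)

# Keep only the rows whose position description contains "PROFESSOR".
# Rows with a missing description are dropped, because filter() discards NA.
professores <- df_exemplo %>%
  filter(str_detect(DESCRICAO_CARGO, fixed("PROFESSOR")))

print(professores)
```

With the real data, the same filter() call would presumably be applied to df after the merge, using whichever column actually holds the position.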
I visited the link and could not find the files 20160930_Cadastro.csv or 20160930_Remuneracao.csv. I only found a file called 201609_GastosDiretos.csv. Also, if your database has only two columns, one called ID_SERVIDOR_PORTAL and another called SALARIO, where would the information about the employee's position be? Is it in ID_SERVIDOR_PORTAL itself? – Marcus Nunes
Hi @Marcusnunes, I don't know how, but I got the link wrong! It has been fixed. The registration database has 42 columns of interest, and I added two more that come from the remuneration database. Thank you, and sorry for the mistake!
– fsbmat