Select a part of the database in R

Question

Asked 8 years, 9 months ago

I am analyzing the database from the transparency portal, which can be obtained at this link. I would like to select only one part of it, because my analysis concerns teacher data only. I could clean the data in Excel, but I would like to learn how to do it in R. To read the data I am using the following code:

library(readr)
library(dplyr)    # provides %>% and select()
library(stringr)  # provides str_to_upper()

df <- read_delim("~/GitHub/Servidores/Setembro/20160930_Cadastro.csv",
                 ";", escape_double = FALSE,
                 locale = locale(encoding = "ASCII"),
                 trim_ws = TRUE)

# The only columns that matter are the 3rd (employee ID)
# and the 6th (gross pay) in the remuneration spreadsheet

# Rename the ID and gross base pay columns, then merge
# into the data frame to add each employee's salary

salarios <- read_delim("~/GitHub/Servidores/Setembro/20160930_Remuneracao.csv",
                       ";", escape_double = FALSE,
                       locale = locale(encoding = "ASCII"),
                       trim_ws = TRUE) %>%
  select(3, 6)
head(salarios)

names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")

names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by="ID_SERVIDOR_PORTAL")
df$x <- 1

Having done this, I would like to know how to select only the part of the database related to teachers, so that I can study just those records.

I visited the link and could not find the files 20160930_Cadastro.csv or 20160930_Remuneracao.csv. I only found a file called 201609_GastosDiretos.csv. Also, if your data frame has only two columns, one called ID_SERVIDOR_PORTAL and the other SALARIO, where is the information about each employee's position? Is it in ID_SERVIDOR_PORTAL itself?

– Marcus Nunes

2016/10/28 at 19:15

Hi @Marcusnunes, I don't know how, but I got the link wrong! It has been fixed now. The registration database has 42 columns of interest, and I added two more that are in the remuneration database. Thank you, and sorry for the mistake!

– fsbmat

2016/10/28 at 19:22

1 answer

by Marcus Nunes • **17,915** points · Answer 1 · 2016-10-28T20:05:53+00:00

I could not read the data with your original commands, so I changed them to work on my computer. If you can read these files with your original commands, ignore this part of my code.

setwd("~/GitHub/Servidores/Setembro/")

library(readr)
library(dplyr)    # provides %>% and select()
library(stringr)

# Base R alternative for reading the same file (this object is not used below)
cadastro <- read.table(file="20160930_Cadastro.csv", header=TRUE, sep="\t")

df <- read_delim("20160930_Cadastro.csv", "\t", escape_double = FALSE,
                 locale = locale(encoding = "Latin1"), trim_ws = TRUE)

# The only columns that matter are the 3rd (employee ID)
# and the 6th (gross pay) in the remuneration spreadsheet

# Rename the ID and gross base pay columns, then merge
# into the data frame to add each employee's salary

salarios <- read_delim("20160930_Remuneracao.csv", "\t", escape_double = FALSE,
                       locale = locale(encoding = "Latin1"), trim_ws = TRUE) %>%
  select(3, 6)

names(salarios) <- c("ID_SERVIDOR_PORTAL", "SALARIO")

names(df) <- str_to_upper(names(df))
df <- merge(df, salarios, by="ID_SERVIDOR_PORTAL")
df$x <- 1

# select the positions in the data frame df whose
# job description contains the string 'PROFESSOR' somewhere
# (this may need refining depending
# on the goal of this work)

professores <- grep("PROFESSOR", df$DESCRICAO_CARGO)

# new data frame with only the teachers' rows
# (or rather, the employees whose job description
# contains 'PROFESSOR' somewhere)

df.professores <- df[professores, ]
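
To see what the grep() step does without downloading the portal files, here is a minimal sketch on a toy data frame. The rows and job descriptions are invented for illustration and are not taken from the real data; only the column names follow the code above. It also shows grepl(), which returns a logical vector instead of positions, and ignore.case = TRUE, which may help if some descriptions turn out not to be in upper case (an assumption, I have not checked this in the real file).

```r
# Toy stand-in for the merged data frame; all rows are invented for illustration.
df <- data.frame(
  ID_SERVIDOR_PORTAL = 1:5,
  DESCRICAO_CARGO = c("PROFESSOR DO MAGISTERIO SUPERIOR",
                      "ASSISTENTE EM ADMINISTRACAO",
                      "Professor Ensino Basico",
                      NA,
                      "TECNICO DE LABORATORIO"),
  SALARIO = c(9000, 3500, 7000, 4000, 3800),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the rows whose description matches.
idx <- grep("PROFESSOR", df$DESCRICAO_CARGO)

# grepl() returns one TRUE/FALSE per row; a missing description counts as FALSE.
is_prof <- grepl("PROFESSOR", df$DESCRICAO_CARGO)

# Both forms point at the same rows.
same_rows <- identical(which(is_prof), idx)

# ignore.case = TRUE also matches descriptions written in mixed case.
df.professores <- df[grepl("professor", df$DESCRICAO_CARGO, ignore.case = TRUE), ]
print(df.professores)
```

The case-sensitive pattern picks up only the first toy row, while the case-insensitive one also picks up the third. Whether a broader or narrower pattern is appropriate depends on which job titles should count as teachers in the analysis.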