I’m importing data from a public database with the following code:
library(rvest)
library(dplyr)  # needed for select(), mutate() and full_join()
for (i in 1:8) {
  url_BF <- sprintf('http://bolsafamilia.datasus.gov.br/w3c/consol_uf_cobertura_bfa.asp?gru=5&vigencia=%02d&vigatual=N&uf=%02s&regional=00&regiaosaude=00&cob=1&brsm=1', 2*i+18, "SC")
  html_BF <- url_BF %>%
    httr::GET() %>%
    httr::content('text', encoding = 'latin1') %>%
    xml2::read_html() %>%
    rvest::html_nodes('table')
  # The third table on the page is the table of interest
  data_BF <- html_BF[3] %>%
    html_table(header = TRUE, fill = TRUE)
  data_BF <- as.data.frame(data_BF)
  # Rows to drop depend on the period (i is always >= 1 inside this loop)
  if (i < 5) {
    linhas <- c(294, 295)
  } else {
    linhas <- c(296, 297)
  }
  data_BF <- data_BF[-linhas, ]
  data_BF <- data_BF %>% select(c(1, 2, 3, 4))
  data_BF <- data_BF %>% mutate(ANO = 2009 + i)
  # Accumulate the yearly tables into a single data frame
  if (i == 1) {
    data_BF_ano <- data_BF
  } else {
    data_BF_ano <- data_BF_ano %>% full_join(data_BF)
  }
}
After extracting the table of interest and converting it to a data.frame, R interprets numerical values such as "1.000" as decimals. In this database, however, the point is a thousands separator.
How can I make R treat the point as a thousands separator, or remove the point without altering the data structure?
Maybe html_table(etc, dec = ",") and then gsub("\\.", "", coluna_problema), followed by conversion with as.numeric. – Rui Barradas
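A small base-R sketch of why this suggestion helps, assuming `html_table()` converts columns through `type.convert()` as its documentation describes. The sample values are made up for illustration and do not come from the database:

```r
# Made-up sample values in the Brazilian format (point = thousands separator)
raw <- c("1.000", "2.500", "987")

# Default decimal mark ".": the values are read as 1, 2.5 and 987
as_default <- type.convert(raw, as.is = TRUE)

# With dec = ",", "1.000" is not a valid number, so the column stays text
as_text <- type.convert(raw, as.is = TRUE, dec = ",")

# Strip the thousands separators, then convert to numeric
fixed <- as.numeric(gsub("\\.", "", as_text))
print(fixed)
```

Keeping the column as text first matters: once "1.000" has been parsed as the number 1, the thousands information is lost and `gsub()` can no longer recover it.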
I changed the following line in the code: data_BF <- html_BF[3] %>% html_table(header = TRUE, fill = TRUE, dec = ","). Then, after data_BF <- as.data.frame(data_BF), I added the following two lines:
data_BF$Famílias.para.Acompanhamento <- as.numeric(gsub("\\.","",data_BF$Famílias.para.Acompanhamento))
data_BF$Famílias.Acompanhadas <- as.numeric(gsub("\\.","",data_BF$Famílias.Acompanhadas))
It worked. Thanks @Noisy – Daniel Travessini
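If more columns need the same treatment, the two assignments could be folded into one step. This is only a sketch: `strip_thousands` is a hypothetical helper name, and the data frame below is made up, reusing the two column names from the comment above.

```r
# Hypothetical helper: drop "." thousands separators, then convert to numeric
strip_thousands <- function(x) as.numeric(gsub(".", "", x, fixed = TRUE))

# Made-up rows standing in for the scraped table (columns still text)
data_BF <- data.frame(
  "Famílias.para.Acompanhamento" = c("1.000", "12.345"),
  "Famílias.Acompanhadas" = c("950", "10.200"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Apply the helper to every affected column at once
cols <- c("Famílias.para.Acompanhamento", "Famílias.Acompanhadas")
data_BF[cols] <- lapply(data_BF[cols], strip_thousands)
str(data_BF)
```

Using `fixed = TRUE` avoids having to escape the point as a regular expression; the result is the same as with `"\\."`.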