Web scraping the Social Security scoreboard (Placar da Previdência)

2

I need to extract the information from this site into an Excel file: which members voted in favor, which voted against, which abstained, and so on. It's a web scraping exercise, but since I understand little HTML, I'm having a hard time making sense of the nodes. I've tried read_html, readHTMLTable and readLines, but none of them worked as desired.

Does anyone have any suggestions?

http://infograficos.estadao.com.br/especiais/placar/votacao/economia/?id=GLwN7vXR3W

2 answers

3

To import the data from the Social Security Scoreboard, an infographic on the Estadão site, and export it to Excel, use the code below.

If you have not installed the 'XML', 'xlsx' and 'stringr' packages, run the first line.

install.packages(c('XML', 'xlsx', 'stringr'))


library(XML)
library(stringr)
library(xlsx)

url <- 'http://infograficos.estadao.com.br/especiais/placar/votacao/economia/?id=GLwN7vXR3W'
# Download and parse the page
paginavoto <- htmlParse(url)

# Each section's <h3> header holds a count and the vote label
tipo <- xpathSApply(paginavoto, "//section//h3", fun = xmlValue)
# Empty data frame that accumulates one row per deputy
deputados <- data.frame(nome = character(), 
                    partido = character(), 
                    voto = character())

for(i in seq_along(tipo)){
  # Skip sections whose header reports zero deputies
  if(as.numeric(str_extract(tipo[i], '\\d+')) != 0){

    # XPath expressions for the names and parties inside the i-th section
    pDep <- paste0("//section[",i ,"]//span[@class='p-name']")
    pPart <- paste0("//section[",i ,"]//span[@class='p-org']")
    deputado <- data.frame(nome = xpathSApply(paginavoto, pDep, fun = xmlValue),
                   partido = xpathSApply(paginavoto, pPart, fun = xmlValue),
                   voto = trimws(str_extract(tipo[i], '\\D+')))
    deputados <- rbind(deputados, deputado)
  }
}

# Export the result to an Excel file
write.xlsx(deputados, "deputados.xlsx")
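
To see what the loop does without hitting the live site, here is a minimal sketch on a tiny inline page. The markup, the deputy names and the "count then label" header format are all assumptions made up for illustration; the real page may be structured differently.

```r
library(XML)
library(stringr)

# Hypothetical markup, invented for this sketch (not copied from the real page)
html_exemplo <- '<html><body>
<section><h3>2 A favor</h3>
  <span class="p-name">Deputado A</span><span class="p-name">Deputado B</span>
</section>
<section><h3>0 Indecisos</h3></section>
<section><h3>1 Contra</h3>
  <span class="p-name">Deputado C</span>
</section>
</body></html>'

doc <- htmlParse(html_exemplo, asText = TRUE)

# One header per section
cabecalhos <- xpathSApply(doc, "//section//h3", xmlValue)

# First run of digits is the count; first run of non-digits is the label
contagem <- as.numeric(str_extract(cabecalhos, "\\d+"))
rotulo <- trimws(str_extract(cabecalhos, "\\D+"))

# Positional predicate: names found inside the third <section> only
nomes_secao_3 <- xpathSApply(doc, "//section[3]//span[@class='p-name']", xmlValue)

print(data.frame(rotulo, contagem))
print(nomes_secao_3)
```

The second section has a count of zero, which is the case the `!= 0` check in the loop skips.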

3


Using the rvest and stringr packages, the problem can be solved as follows:

library(rvest)
library(stringr)
url <- 'http://infograficos.estadao.com.br/especiais/placar/votacao/economia/?id=GLwN7vXR3W'

resp <- read_html(url)

Since we will extract text from the page several times, it is worth writing a helper function:

pega_texto <- function (css) {
  resp %>% html_nodes(css) %>% html_text()
}

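# Vote label of each group: header text from the first capital letter onward
posicoes <- pega_texto('h3') %>% str_extract('[A-Z].+')

# Number of deputies in each group
quantidades <- pega_texto('h3') %>% str_extract('[0-9]+') %>% as.numeric()

# Repeat each label once per deputy in its group
posicao <- mapply(rep, x =  posicoes, each = quantidades) %>% 
  unlist()

partido <- pega_texto('.p-org')
# The page title also has class p-name, so drop it from the names
nome <- pega_texto('.p-name') %>% 
  .[. != "Placar da Previdência (intenção do voto)"]
regiao <- pega_texto('.p-region')

dados <- data.frame(partido, nome, regiao, posicao)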

head(dados)

  partido           nome regiao posicao
1      PP Adail Carneiro     CE A favor
2    PMDB  Alberto Filho     MA A favor
3     PPS   Alex Manente     SP A favor
4    PMDB Altineu Côrtes     RJ A favor
5      PP    André Abdon     AP A favor
6     PSD André de Paula     PE A favor

openxlsx::write.xlsx(dados, "arquivo.xlsx")
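
For anyone who wants to try the approach offline, the sketch below runs the same steps on a small inline page. The markup and the deputy names are hypothetical, and it assumes each header reads "count, then label". It also uses `rep(x, times = ...)` as a shorter alternative to the `mapply()` call above, which is a substitution of mine rather than part of the original code.

```r
library(rvest)
library(stringr)

# Hypothetical markup, invented for this sketch (the real page may differ)
html_exemplo <- '<html><body>
<section><h3>2 A favor</h3>
  <div><span class="p-org">PP</span><span class="p-name">Deputado A</span><span class="p-region">CE</span></div>
  <div><span class="p-org">PMDB</span><span class="p-name">Deputado B</span><span class="p-region">MA</span></div>
</section>
<section><h3>1 Contra</h3>
  <div><span class="p-org">PSD</span><span class="p-name">Deputado C</span><span class="p-region">PE</span></div>
</section>
</body></html>'

pagina <- read_html(html_exemplo)

# Helper: text of every node matching a CSS selector
textos <- function(css) {
  pagina %>% html_nodes(css) %>% html_text()
}

posicoes <- textos("h3") %>% str_extract("[A-Z].+")
quantidades <- textos("h3") %>% str_extract("[0-9]+") %>% as.numeric()

# rep() with a vector of times repeats each label by its own count
posicao <- rep(posicoes, times = quantidades)

dados_exemplo <- data.frame(
  partido = textos(".p-org"),
  nome = textos(".p-name"),
  regiao = textos(".p-region"),
  posicao = posicao,
  stringsAsFactors = FALSE
)
print(dados_exemplo)
```

This only lines up when the deputies appear in the page in the same order as the headers, which is the same assumption the original code makes.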

EDITED

I had forgotten to comment on exporting to Excel. I recommend the openxlsx package because it uses C++ to write Excel files. The xlsx package depends on Java, and Java incompatibility problems (32-bit vs. 64-bit) are common with it.
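
A quick way to check that the export worked is to write to a temporary file and read it back. This is only a sketch; the data frame below is made up for illustration and stands in for `dados`.

```r
library(openxlsx)

# Hypothetical rows standing in for the scraped data
dados_exemplo <- data.frame(
  partido = c("PP", "PMDB"),
  nome = c("Deputado A", "Deputado B"),
  regiao = c("CE", "MA"),
  posicao = c("A favor", "Contra"),
  stringsAsFactors = FALSE
)

# Write to a temporary .xlsx file, then read the first sheet back
arquivo <- tempfile(fileext = ".xlsx")
write.xlsx(dados_exemplo, arquivo)
lido <- read.xlsx(arquivo)

print(lido)
```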
