Web Scraping in R

Question


I have tried several approaches, but I can't scrape the following table:

http://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/TxRef1.asp

So far I have tried the following code:

library(rvest)

URL <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/TxRef1.asp'

# Read the page, select the table by its id attribute and parse it
bfm.tx <- URL %>%
  xml2::read_html() %>%
  html_nodes(xpath = '//*[@id="tb_principal1"]') %>%
  html_table()

# Take the first parsed table
tx.df <- bfm.tx[[1]]

However, no data is returned. I also tried importing the table into Excel, but the button on the page calls a function inside the site.
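One quick check, sketched below under stated assumptions, is to count how many nodes a selector matches before indexing the result with `[[1]]`. The example parses a small made-up HTML string (not the real BM&F page) in which the table carries the class `tabConteudo` but no `id`; whether the live page is structured this way is an assumption. An XPath that targets a missing `id` returns an empty node set, while the class selector finds the table.

```r
library(rvest)

# Made-up page: the table has a class but no id attribute
# (this only assumes the real page looks similar).
doc <- read_html('<html><body>
  <table class="tabConteudo">
    <tr><td>a</td><td>b</td></tr>
    <tr><td>1</td><td>2</td></tr>
  </table></body></html>')

# XPath on an id that does not exist: matches nothing,
# so indexing html_table() of it with [[1]] has nothing to return
by_id <- html_nodes(doc, xpath = '//*[@id="tb_principal1"]')
length(by_id)

# CSS class selector: matches the one table
by_class <- html_nodes(doc, ".tabConteudo")
length(by_class)

# Parse the matched table into a data frame
tab <- html_table(by_class[[1]])
```

Checking `length()` first makes it clear whether the problem is the selector or the parsing step.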


by lmonferrari • **3,550** points · Answer 1 · 2020-08-21T13:50:14+00:00

Try selecting the table by its CSS class with html_node('.tabConteudo'):

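library(rvest)
library(stringr)

url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/TxRef1.asp'
pagina <- read_html(url)
pagina  # print the parsed document to inspect it

# Select the table by its CSS class, then extract the text of every cell
a <- pagina %>%
  html_node('.tabConteudo') %>%
  html_nodes('td') %>%
  html_text()

# Drop the first four cells, which hold the column headers
a <- a[5:length(a)]

# Split each cell on '"'; simplify = TRUE returns a character matrix
b <- str_split(a, '"', simplify = TRUE)

# Lay the values out three per row, one row per line of the table
b <- matrix(b, ncol = 3, byrow = TRUE)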

There is probably a simpler way to do this, but in short: I filtered the cells with html_nodes, removed the headers with a[5:length(a)], converted the result into a matrix with str_split(a, '"', simplify = TRUE), and then set the matrix dimensions with matrix(b, ncol = 3, byrow = TRUE). From there you can convert it into a data.frame and work with it more easily.
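The last step, turning the cells into a data.frame, could look roughly like this base R sketch. The cell texts are made up and only stand in for the output of `html_text()`; it keeps the answer's assumptions of four header cells and three columns. The helper `parse_br()` and the column names `col1` to `col3` are hypothetical, and the helper assumes Brazilian number formatting ("." as thousands separator, "," as decimal mark), which you should verify against the real table.

```r
# Made-up cell texts standing in for html_text() output:
# four header cells, then the table body read row by row (3 columns)
a <- c("header 1", "header 2", "header 3", "header 4",
       "1",  "2,15", "2,20",
       "30", "2,10", "2,50")

# Hypothetical helper, assuming "." as thousands separator
# and "," as decimal mark
parse_br <- function(x) {
  x <- trimws(x)
  x <- gsub(".", "", x, fixed = TRUE)
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

cells <- a[-(1:4)]                  # drop the header cells
stopifnot(length(cells) %% 3 == 0)  # every row must have 3 cells
m <- matrix(cells, ncol = 3, byrow = TRUE)

# Placeholder column names; rename them to match the real headers
tx.df <- data.frame(
  col1 = parse_br(m[, 1]),
  col2 = parse_br(m[, 2]),
  col3 = parse_br(m[, 3])
)
str(tx.df)
```

Reshaping with `matrix(..., byrow = TRUE)` on the plain character vector does the same job as the `str_split` step when the cells contain no `"` characters, and converting the columns to numeric makes the rates usable in calculations.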