Web Scraping in R

I need to download the table from this page: http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp

I am trying to use the rvest package, but to no avail.

library('rvest')
url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp'

site <- read_html(url)
info_Ajuste_HTML <- html_nodes(site,'table')

info_Ajuste <- html_text(info_Ajuste_HTML)

lista_tabela <- site 
lista_tabela <- html_nodes(site, xpath = "//td") 
lista_tabela <- html_table(site, fill = TRUE)

dados <- lista_tabela[[1]]

1 answer

I believe the following answers the question. The catch is that the table extraction is not automated at all: you need to know in advance how many columns the table has.

library(tidyverse)
library('rvest')

url <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp'

HTML <- read_html(url) 

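# Pull the text of every table cell, turn decimal commas into points,
# then reshape the flat vector row by row into a 3-column data frame.
dados <- HTML %>%
  html_nodes(xpath = "//table//td") %>%
  html_text() %>%
  str_replace(",", ".") %>%
  as.numeric() %>%
  matrix(ncol = 3, byrow = TRUE) %>%  # 3 = known number of columns
  as.data.frame()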

str(dados)
#'data.frame':  294 obs. of  3 variables:
# $ V1: num  1 6 7 11 14 21 22 25 32 33 ...
# $ V2: num  4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.9 4.84 4.83 ...
# $ V3: num  0 4.66 5 3.8 4.49 4.66 4.77 4.47 4.53 4.59 ...
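
If you would rather not hard-code ncol = 3, one option is to count the cells in the first data row and use that number instead. This is only a sketch. The inline HTML below is a made-up stand-in for the real page (the header names Dias, Taxa252 and Taxa360 and the values are invented so the example runs offline), and it assumes every data row has the same number of <td> cells.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page; read_html() also
# accepts a literal HTML string, so no network access is needed here.
page <- read_html(paste0(
  "<table>",
  "<tr><th>Dias</th><th>Taxa252</th><th>Taxa360</th></tr>",
  "<tr><td>1</td><td>4,90</td><td>0,00</td></tr>",
  "<tr><td>6</td><td>4,90</td><td>4,66</td></tr>",
  "</table>"
))

# Take the first row that actually contains <td> cells and count them,
# rather than assuming the table has 3 columns.
first_row <- html_node(page, xpath = "//table//tr[td]")
n_cols <- length(html_nodes(first_row, xpath = "./td"))

# Same flat cell vector as in the answer above.
cells <- html_text(html_nodes(page, xpath = "//table//td"))

# Swap the decimal comma for a point, convert, and reshape by row.
dados <- as.data.frame(
  matrix(as.numeric(sub(",", ".", cells, fixed = TRUE)),
         ncol = n_cols, byrow = TRUE)
)

str(dados)
```

With the real page you would replace the inline string with read_html(url). If some rows span or omit cells, the counts will not line up and the matrix step will misalign the values, so it is worth checking that length(cells) is a multiple of n_cols before reshaping.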