There is actually no error in your code. To see this, just replace your URL with one that contains a well-formatted table. I will use this site as an example. Behold:
library(tidyverse)
library(rvest)
library(bizdays)
library(dplyr)
library(lubridate)
japan <- "http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1"
html <- read_html(japan)
dados_japan <- html %>%
html_nodes(xpath = "//table//td") %>%
html_text() %>%
str_replace(",", ".") %>%
as.numeric() %>%
matrix(ncol = 14, byrow = TRUE) %>%
as.data.frame() %>% setNames(c('Year','Jan','Feb','Mar','Apr',
'May','Jun','Jul','Aug','Sep',
'Oct','Nov','Dec','Annual'))
head(dados_japan)
Output:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Annual
1 NA NA 1938 -5.2 -4.9 -0.6 4.7 9.5 11.6 17.9 22.2 16.5 10.7 3.3
2 -4.7 6.8 1939 -7.5 -6.6 -1.4 4.0 7.5 13.0 17.4 20.0 17.4 9.7 3.0
3 -2.5 6.2 1940 -6.0 -5.7 -0.5 3.5 8.5 11.0 16.6 19.7 15.6 10.4 3.7
4 -1.0 6.3 1941 -6.5 -5.8 -2.6 3.6 8.1 11.4 12.7 16.5 16.0 10.0 4.0
5 -2.9 5.4 1942 -7.8 -8.2 -0.8 3.5 7.1 12.0 17.4 18.4 15.7 10.5 2.5
6 -2.9 5.6 1943 -4.1 -6.1 -1.1 3.5 6.9 12.9 19.3 21.5 17.5 11.7 1.2
If the code is right, why does the same procedure fail on the Bovespa website? I believe this is due to the formatting of the page's source code. If you open the source of the Japan Meteorological Agency page used in the example (Ctrl+U in Google Chrome, or right-click -> View page source), press Ctrl+F and search for the term "table", you will see that the table of interest is wrapped in a table tag. When you perform the same procedure on the Ibovespa page, there is no equivalent tag. Behind the scenes, read_html parses only the document at the URL you give it; if the table tag is not in that document, the node search returns empty and the code fails.
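You can confirm this failure mode yourself with a minimal sketch (here `your_url` is a placeholder for the original Ibovespa page you were scraping, not the frame URL):

```r
library(rvest)

# When the table lives inside a frame, the top-level document contains no
# <table> element, so the node search returns an empty set ({xml_nodeset (0)})
# and everything downstream of it fails.
html <- read_html(your_url)                 # your_url: placeholder for the original page
html_nodes(html, xpath = "//table//td")     # empty node set
```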
What to do?
Well, in that case you will need to locate the table manually and only then call read_html with the correct URL and xpath arguments. Just follow these steps:
- Right-click on the table
- Choose "View frame source"
- Press Ctrl+F and search for "table"; the table will be there
- Copy the URL (without the "view-source:" prefix) and replace it in your code
- Go back to the original page
- Get the table's xpath (right-click -> Inspect, select the element, right-click -> Copy -> Copy XPath)
- Pass it to the xpath argument of html_nodes
The code will look like this:
bovespa <- 'http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-taxas-referenciais-bmf-ptBR.asp'
html <- read_html(bovespa)
dados_bov <- html %>%
  html_nodes(xpath = '//*[@id="tb_principal1"]') %>% # here you already have the table; the rest of the code just tidies the unstructured text
  html_text() %>%
  str_replace_all(",", ".") %>%  # str_replace_all, not str_replace, so every decimal comma is converted
  str_replace_all('(\r|\n){1,}', ';')
dados_bov <- strsplit(dados_bov, ';')[[1]][5:892] %>% # turns the string into a vector and drops the table header
  matrix(ncol = 3, byrow = TRUE) %>%
  as.data.frame() %>%
  setNames(c("Dias_Corridos", "DI_pre_252", "DI_pre_360"))
head(dados_bov)
Output:
  Dias_Corridos DI_pre_252 DI_pre_360
1             3       4.15       0.00
2             7       4.01       4.10
3            10       4.01       3.43
4            12       4.00       3.81
5            13       4.00       3.95
6            14       4.00       4.08
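Note that after setNames the three columns are still character. A short follow-up sketch (assuming the decimal commas have already been replaced with dots; otherwise as.numeric would produce NA) converts them to numbers:

```r
# The parsed columns are character; convert them to numeric for analysis.
# Assumes decimal commas were already replaced with dots upstream.
dados_bov[] <- lapply(dados_bov, as.numeric)
str(dados_bov)
```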
I'm guessing the B3 website does not allow you to download data directly via web scraping. I tried several approaches (including the "rvest" package) and it always returns empty. I recommend exploring the "BatchGetSymbols" package to download B3-related data, or copying the table into Excel and reading it into R from there.
– Guilherme Marques
Thanks for the tips!
– Alexandre Sanches
@Alexandresanches consider editing the question title to make it more general. Your problem is not limited to the stock-exchange site; it occurs on several URLs
– Lucas