rvest is returning {xml_nodeset (0)} when I read a page. How do I resolve this?

I would like to build a tool to scrape the website of the Chamber of Deputies of Rio de Janeiro, but I cannot even get the web page to read properly.

Does anyone know why the function read_html is returning the value {xml_nodeset (0)}?

Here is the code:

library(rvest)

scrap <- read_html("http://www.camara.rj.gov.br/controle_atividade_parlamentar.php?m1=materias_leg&m2=10a_Leg&m3=prolei&url=http://mail.camara.rj.gov.br/APL/Legislativos/scpro1720.nsf/Internet/LeiInt?OpenForm")

scrap %>%
  html_nodes("h1")

# Output: {xml_nodeset (0)}
  • This is not an error message. It is the return value of the function html_nodes, and it means that no "h1" element was found.

1 answer


The question does not make clear what you are looking for on the page. I assume it is the bills.

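Normally, tables in HTML are marked up with the table tag. As far as I know, read_html only downloads and parses the page you give it. The search is done by html_nodes, and when nothing in that page matches the selector it returns an empty node set, {xml_nodeset (0)}, rather than raising an error.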

The bills are in an iframe, which is an HTML page displayed in a window inside another HTML page. The content of the iframe is not part of the outer page, so you first need to get the link from this iframe and then scrape the data from it.
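To see why the search comes back empty, here is a minimal offline sketch. The HTML string and the example.com address are made up for illustration and are not taken from the real site. It only assumes that the outer page has no h1 of its own and that it holds an iframe element.

```r
library(rvest)

# Hypothetical stand-in for the outer page: no <h1>, only an iframe.
outer <- read_html('<html><body>
  <p>Menu</p>
  <iframe src="http://example.com/inner.html"></iframe>
</body></html>')

# Nothing in the outer document matches "h1", so the node set is empty.
headings <- html_nodes(outer, "h1")
length(headings)

# The iframe element itself belongs to the outer document,
# so its src attribute (the link to the inner page) can be read.
iframe_src <- html_attr(html_nodes(outer, "iframe"), "src")
iframe_src
```

Whatever the inner page contains only becomes searchable after that link is read with read_html in a separate step.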

This code will fetch the iframe for you:

library(tidyverse)
library(rvest)

html_url <- "http://www.camara.rj.gov.br/controle_atividade_parlamentar.php?m1=materias_leg&m2=10a_Leg&m3=prolei&url=http://mail.camara.rj.gov.br/APL/Legislativos/scpro1720.nsf/Internet/LeiInt?OpenForm"

# Read the outer page and collect the src attribute of every iframe in it
html_iframe <- read_html(x = html_url) %>%
  html_nodes(css = "iframe") %>%
  html_attr("src")

# Read the page that the first iframe points to
html <- read_html(html_iframe[1])

Note that the object html_iframe contains the link, and calling read_html on it now returns an object with the data.
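As a possible next step, here is a hedged sketch of what you could do once you have the iframe link. I have not checked how the real iframe page is structured, so the URL, the relative src and the table below are all hypothetical. The sketch assumes the src might be a relative path, which xml2::url_absolute can resolve against the outer page URL, and that the data sits in an HTML table, which html_table can convert.

```r
library(rvest)

# Hypothetical outer-page URL and relative iframe src, for illustration only.
page_url <- "http://example.com/dir/page.php"
iframe_src <- "frame.html"

# Resolve the src against the outer page URL before reading it.
iframe_url <- xml2::url_absolute(iframe_src, page_url)

# Hypothetical stand-in for the iframe document (no network access needed).
inner <- read_html('<html><body><table>
  <tr><th>number</th><th>title</th></tr>
  <tr><td>1</td><td>First bill</td></tr>
  <tr><td>2</td><td>Second bill</td></tr>
</table></body></html>')

# html_table returns one table per matched node; take the first one.
bills <- html_table(html_nodes(inner, "table"))[[1]]
bills
```

If the real page does not use a table, the same pattern applies with a different selector in html_nodes and html_text instead of html_table.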
