Web Scraping with R, Java or HTML?

0

  • 3

    I recommend that you: 1. Choose between R and Java based on your experience and what you intend to do with the information; they are very different languages. 2. Try it on your own first, and if you get stuck, post your attempt here so someone can help you understand what went wrong.

  • 1

    Check out https://github.com/dfalbel/ons, which has web scraping code for various parts of the website.

  • If you choose Java (since the question is broad), use jsoup.

1 answer

2


Take a look at the rvest package.

The page you want was built using bad practices, which makes the work a little harder. By inspecting the page source, you can find that the actual content is at http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena_arquivos/sheet001.htm
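For anyone wondering how to track down an address like that: a workbook exported as a web page often has a small wrapper page that pulls each sheet in through frames. Here is a minimal sketch of listing those frame addresses with rvest. The wrapper markup is made up and inlined so the code runs offline, and the base address is assumed to be the directory of the URL above; the real page may be structured differently.

```r
library(rvest)

# Hypothetical wrapper page, inlined so the sketch runs without network access.
# The markup of the real page is an assumption and may differ.
wrapper <- read_html('
<html><body>
  <iframe src="ena_arquivos/sheet001.htm"></iframe>
  <iframe src="ena_arquivos/tabstrip.htm"></iframe>
</body></html>')

# Collect the src attribute of every embedded frame.
frame_src <- wrapper %>%
  html_nodes("frame, iframe") %>%
  html_attr("src")

# Resolve the relative paths against the assumed location of the wrapper page.
base_url <- "http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/"
frame_urls <- xml2::url_absolute(frame_src, base_url)

print(frame_urls)
```

Whichever address in that list points to the sheet with the table is the one to pass to read_html.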

Then the following code captures the content of the page:

library(rvest)
# Read the page, select the first <table> node, and parse it into a data frame
tb <- read_html("http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena_arquivos/sheet001.htm") %>%
  html_node("table") %>%
  html_table(fill = TRUE)

Then use subsetting to keep only what matters, and give the columns proper names.

# Keep rows 6 to 9 and columns 2 to 4, where the data of interest sits
tb <- tb[6:9, 2:4]
colnames(tb) <- c("Região", "M/W Médios", "% MLT")
