Web scraping with R, Java or HTML?

Question

Asked 8 years, 7 months ago

Viewed 76 times

0

How do I extract the data table from the following page: http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena.htm ?

3

I recommend that you: 1. Choose between R and Java based on your experience and on what you intend to do with the data, since they are very different languages. 2. Try it on your own first, and if you get stuck, post your attempt here so someone can help you understand what went wrong.

– Molx

2017/01/03 at 19:18
1

Check out this repository: https://github.com/dfalbel/ons It has web scraping code for various parts of the website.

– Daniel Falbel

2017/01/03 at 19:48
If you choose Java (since the question is broad), use jsoup.

– Renan Gomes

2017/01/03 at 20:01

1 answer

by Luiz Rodrigo • **156** points · Answer 1 · 2017-01-03T22:39:43+00:00

Take a look at the rvest package.

The page you want was built using bad practices, which makes the job a bit harder. If you inspect the page source, you will find that the actual content lives at http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena_arquivos/sheet001.htm
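One way to locate that inner address programmatically, rather than by reading the source by hand, is to list the frames a page embeds. The sketch below is only an illustration: it assumes the wrapper page uses `<frame>` or `<iframe>` elements, and the `wrapper` HTML string is a made-up stand-in, not the real page.

```r
library(rvest)

# Hypothetical stand-in for the wrapper page (an assumption, not the real markup).
wrapper <- read_html('<html><frameset>
  <frame src="ena_arquivos/sheet001.htm" name="sheet">
</frameset></html>')

# Collect the src attribute of every frame or iframe element.
srcs <- wrapper %>%
  html_nodes("frame, iframe") %>%
  html_attr("src")

# Resolve relative paths against the address of the wrapper page.
base_url <- "http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena.htm"
frame_urls <- xml2::url_absolute(srcs, base_url)
print(frame_urls)
```

For the real page, you would pass the wrapper URL to `read_html()` instead of the inline string.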

The following code then reads the table from that page:

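library(rvest)

# Read the sheet, take the first <table>, and convert it to a data frame.
# fill = TRUE pads rows that have fewer cells than the others.
url <- "http://www.ons.org.br/resultados_operacao/boletim_semanal/2016_12_16/ena_arquivos/sheet001.htm"
tb <- read_html(url) %>%
  html_node("table") %>%
  html_table(fill = TRUE)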

Then use subsetting to keep only the rows and columns that matter, and give the columns proper names.

# Keep rows 6 to 9 and columns 2 to 4, which hold the data of interest.
tb <- tb[6:9, 2:4]
colnames(tb) <- c("Região", "M/W Médios", "% MLT")
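If the site is unreachable, the same read, subset and rename steps can be tried offline. The sketch below uses a small made-up table (the HTML string, the row and column positions, and the column names are all illustrative assumptions) and also converts a text column to numbers, which you will probably need because header-like rows force the columns to be read as text.

```r
library(rvest)

# Made-up table: a title row, a header-like row, then two data rows.
page <- read_html('<table>
  <tr><td>Report title</td><td>-</td><td>-</td></tr>
  <tr><td>Region</td><td>Value</td><td>Percent</td></tr>
  <tr><td>North</td><td>10</td><td>50</td></tr>
  <tr><td>South</td><td>20</td><td>80</td></tr>
</table>')

raw <- page %>%
  html_node("table") %>%
  html_table()

# Keep only the data rows and give the columns readable names.
demo <- raw[3:4, 1:3]
colnames(demo) <- c("region", "value", "percent")

# The columns were read as text because of the rows above the data.
demo$value <- as.numeric(demo$value)
demo$percent <- as.numeric(demo$percent)
print(demo)
```

On the real table, inspect the full result first (for example with `View(tb)`) to confirm which rows and columns hold the data before hard-coding the indices.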