Web scraping a site with selection boxes


I have a rather general question (I know this site is better suited to specific ones), and I would appreciate some tips on where to start.

Is it possible to scrape a site that has option boxes, like this one?

https://filia-consulta.tse.jus.br/#/main/download

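I have basic knowledge of rvest, and I know how to extract simple data like this:

library(rvest)  # read_html(), html_table(); rvest also re-exports the %>% pipe

# Page of the TSE listing the registered political parties
partidos <- "http://www.tse.jus.br/partidos/partidos-politicos/registrados-no-tse"

# Download the page, parse every <table> into a data frame, keep the first one
partidos <- partidos %>%
  read_html() %>%
  html_table() %>%
  .[[1]]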

However, I have no idea how to scrape the data from a page like the one linked above. The problem is that I cannot find the link(s) where the data offered for download after clicking "query" is stored. Does anyone have tips, or can anyone suggest material for me to research?

1 answer



A good starting point for understanding how any website works is the set of developer tools built into browsers. In this case, I used the Network tab of Chrome's Developer Tools to check where the file was being downloaded from:

[Screenshot: the Network tab of the Developer Tools showing the request for the downloaded file]

I noticed that when selecting PT for Espírito Santo, for example, the browser requested this URL:

http://agencia.tse.jus.br/estatistica/sead/eleitorado/filiados/uf/filiados_pt_es.zip

You may notice a pattern there: the file name combines the party acronym (pt) and the state abbreviation (es). But I would argue that, more important than solving your specific case, is learning to use the developer tools built into the browsers themselves. For that, I recommend the materials below:

https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor
https://developers.google.com/web/tools/chrome-devtools
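Going back to the URL pattern seen in the Network tab, the download can be scripted directly from R without touching the option boxes. This is a minimal sketch, assuming every file follows the `filiados_<party>_<state>.zip` naming observed for PT/ES, with lowercase codes (not verified for other parties or states); `filiados_url()` and `download_filiados()` are hypothetical helper names:

```r
# Hypothetical helper: builds the download URL for a party/state pair,
# ASSUMING all files follow the pattern observed for PT/ES:
# .../filiados/uf/filiados_<party>_<state>.zip (lowercase codes)
filiados_url <- function(party, state) {
  sprintf(
    "http://agencia.tse.jus.br/estatistica/sead/eleitorado/filiados/uf/filiados_%s_%s.zip",
    tolower(party), tolower(state)
  )
}

# Hypothetical helper: downloads the zip for ONE party/state pair and
# extracts it into dest_dir; returns (invisibly) the extracted file paths.
download_filiados <- function(party, state, dest_dir = tempdir()) {
  url <- filiados_url(party, state)
  zip_path <- file.path(dest_dir, basename(url))
  # mode = "wb" keeps the binary zip file intact on Windows
  utils::download.file(url, zip_path, mode = "wb")
  utils::unzip(zip_path, exdir = dest_dir)
}
```

For instance, `filiados_url("PT", "ES")` rebuilds the URL shown above, and because `sprintf()` is vectorized, `filiados_url("pt", c("es", "sp"))` gives one URL per state. It is worth confirming in the Network tab that a given combination actually exists before downloading in bulk.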

In the future, when you need to deal with websites like this, you can use tools like Selenium to automate browser interactions:

These packages are not enough to access every kind of web content. A clear example is pages whose content is generated by JavaScript, which is the case for many modern websites. To work with these sites, it is necessary to "simulate" a browser accessing the web page. One of the best tools for this is Selenium. We will not cover Selenium in this course, but if you want to go deeper, see here.
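For pages whose data really is rendered by JavaScript (not required here, since the zip files can be requested directly), here is a minimal sketch of the "simulated browser" approach in R. It assumes the RSelenium package is installed and that `rsDriver()` can start a local Firefox; `render_page()` is a hypothetical helper, and the fixed `Sys.sleep()` is a crude stand-in for a proper wait condition:

```r
# Hypothetical helper: opens `url` in a real Firefox driven by RSelenium,
# waits for JavaScript to render, and returns the resulting HTML as an
# rvest document. ASSUMES RSelenium is installed and Firefox is available.
render_page <- function(url, wait_seconds = 5) {
  # chromever = NULL: do not start a Chrome driver, only the Firefox one
  driver <- RSelenium::rsDriver(browser = "firefox", chromever = NULL,
                                verbose = FALSE)
  client <- driver$client
  # Always close the browser and stop the Selenium server, even on error
  on.exit({
    client$close()
    driver$server$stop()
  }, add = TRUE)

  client$navigate(url)
  Sys.sleep(wait_seconds)  # crude wait for the page's JavaScript to finish
  rvest::read_html(client$getPageSource()[[1]])
}
```

Selecting values in the option boxes would mean adding `client$findElement(using = "css selector", value = ...)` and `$clickElement()` calls before reading the page source; the selectors have to be looked up with the Developer Tools, since none are known here.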


Thank you very much, felubra. I managed to find the links now. Great tips, too; I will look into them in time. For now, I will be able to download the data from the TSE site. Thanks a lot!
