How can I scrape the SICONV (Free Access) website with R?

I’m trying to use R to extract information from the SICONV site, which deals with government agreements (convênios):

When I request the page from R with the rvest and httr packages, I am redirected to the login screen at https://idp.convenios.gov.br/idp/.

I also tried the JavaScript-rendering approach described in this post (https://www.r-bloggers.com/web-scraping-javascript-rendered-sites/), but without success.

I still don’t have much experience with web scraping in R, but I can handle a few things. My goal with SICONV is to search the available programs by year. First, though, I need to get from within R the same output I get when navigating in the browser. The site offers data for download, but I want some information that is not in the downloaded data.

I think the site uses cookies or something similar to verify the client and only then grants access to the form. I need to at least get that form. Any ideas would be appreciated.
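
To check from R whether a request really ended up on the login screen, one option is to compare the host of the final URL with the login host. This is a minimal sketch, assuming the httr package is installed. `url_host`, `is_login_redirect` and `fetch_and_check` are hypothetical helper names, and `start_url` stands for whichever SICONV page is being requested:

```r
# Extract the host part of a URL with base R (no packages needed).
url_host <- function(url) {
  sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+).*$", "\\1", url)
}

# TRUE when the final URL belongs to the login host from the question.
is_login_redirect <- function(final_url, login_host = "idp.convenios.gov.br") {
  identical(tolower(url_host(final_url)), login_host)
}

# Request a page and report where the redirects ended.
# Requires httr; httr follows redirects by default and stores the
# final address in the `url` element of the response.
fetch_and_check <- function(start_url) {
  resp <- httr::GET(start_url)
  list(
    status      = httr::status_code(resp),
    final_url   = resp$url,
    needs_login = is_login_redirect(resp$url)
  )
}
```

If `needs_login` comes back `TRUE`, the request was sent without a valid session, which is consistent with the cookie hypothesis above.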

  • 2

    Pablo, some websites carry the search and pagination parameters in the URL itself, which is the simplest case for web scraping. Others send the search information in the request body, using a method called POST. Take a look at https://www.rdocumentation.org/packages/httr/versions/1.3.1/topics/POST

  • Got it, Daniel, thank you. That’s already a light at the end of the tunnel. Now the challenge is to understand and apply this POST method and see whether it works.

  • What data do you need?

  • Well, it’s a fairly mixed set of data. The idea is an automated system that indicates whether my state (RO) is eligible to receive each of the programs available in SICONV. When I access the site, I type 2018, a list of programs appears, and within each one I want to retrieve whether RO is eligible or not. In the future I will fetch more data.

  • Inspecting the page source, access to the form goes through this link: (javascript:window.Location=getPath()+'/Forwardaction.do?modulo=Principal&path=/Mostraprincipalconsultarprogram.do?Usr=guest&Pwd=guest'). I just don’t know what it means.

  • 1

    Open the browser’s network tab and submit the form. You will see exactly what the browser does. From what I’ve seen here, it is a POST that sends the form along with the session cookie.

  • I was able to render the form page as it appears in my browser, as well as in link . Now the next step is to submit it with the data I need and extract the information, in this case the programs for 2018.
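
The flow described in the comments (enter as guest, then POST the form with the same session cookie) could be sketched with httr roughly as follows. This is only an illustration under assumptions: the base path returned by `getPath()` is not known here and is passed in as `base_path`, the entry path is copied as quoted in the comment above, and `form_url` and `fields` are placeholders for the action URL and field names seen in the browser's network tab. `guest_entry_url` and `search_programs` are hypothetical helper names.

```r
# Path copied from the link quoted in the comments; check the exact
# spelling and letter case against the page source before relying on it.
GUEST_PATH <- paste0(
  "/Forwardaction.do?modulo=Principal",
  "&path=/Mostraprincipalconsultarprogram.do?Usr=guest&Pwd=guest"
)

# Join the site's base path (what getPath() returns in the browser,
# not known here) with the guest entry path.
guest_entry_url <- function(base_path, entry_path = GUEST_PATH) {
  paste0(sub("/+$", "", base_path), entry_path)
}

# Open a guest session, then POST the search form on the same handle so
# the session cookie from the first request is sent with the second.
# `form_url` and `fields` are placeholders: take the real action URL and
# field names from the browser's network tab.
search_programs <- function(base_path, form_url, fields) {
  entry <- guest_entry_url(base_path)
  h <- httr::handle(entry)
  httr::stop_for_status(httr::GET(entry, handle = h))
  resp <- httr::POST(form_url, body = fields, encode = "form", handle = h)
  httr::stop_for_status(resp)
  httr::content(resp, as = "text")
}
```

A call might look like `search_programs(base, form_url, list(ano = "2018"))`, where `ano` is only a guess at the field name. The returned HTML text can then be parsed with `rvest::read_html()` to pull out the list of programs.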

No answers
