I'm trying to use R to extract information from SICONV, the site that handles agreements (convênios):
- https://www.convenios.gov.br/siconv/ForwardAction.do?modulo=programa&path=/ConsultarPrograma/ConsultarPrograma.do&Usr=guest&Pwd=guest
- http://portal.convenios.gov.br/acesso-livre
It turns out that when the site is accessed from R with the rvest and httr packages, it redirects to the login screen at https://idp.convenios.gov.br/idp/.
I also tried the approach for JavaScript-rendered pages described in this post (https://www.r-bloggers.com/web-scraping-javascript-rendered-sites/), but that did not work either.
I still don't have much experience with web scraping in R, but I can handle a few things. My goal with SICONV is to search the available programs by year. First, though, I need R to return the same page I get when browsing the site. The site mentioned above offers data for download, but I want some information that is not in the downloadable data.
I think the site uses cookies or something similar to validate the client before granting access to the form. I need to at least get that form. Any ideas would be appreciated.
Pablo, some websites carry the search and pagination information in the URL, which is the simplest case for web scraping. Others send the search information in the body of the request, using the POST method. Take a look at https://www.rdocumentation.org/packages/httr/versions/1.3.1/topics/POST
– Daniel Ikenaga
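To make the distinction in the comment above concrete, here is a minimal sketch with httr. The endpoint `https://example.org/search` and the field name `ano` are placeholders chosen for illustration, not the real SICONV address or form fields; `search_by_post` is a hypothetical helper.

```r
library(httr)

# Placeholder endpoint and field name (assumptions, not taken from SICONV).
base_url <- "https://example.org/search"
params   <- list(ano = "2018")

# GET style: the search terms are part of the URL itself.
get_url <- modify_url(base_url, query = params)

# POST style: the URL stays the same and the search terms travel
# in the request body, encoded like an HTML form submission.
search_by_post <- function(url, fields) {
  POST(url, body = fields, encode = "form")
}

# Example call (performs a real HTTP request, so it is left commented out):
# resp <- search_by_post(base_url, params)
# status_code(resp)
```

If the site only answers to the second style, building URLs by hand will never reach the results, which would be consistent with the redirect described in the question.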
Got it, Daniel, thank you. That's already a light at the end of the tunnel. Now the challenge is to understand and apply this POST method and see whether it works.
– Pablo Dias Vieira
What data do you need?
– Daniel Ikenaga
Well, it's fairly mixed data. The idea is to have an automated system that indicates whether my state (RO) is eligible for each of the programs available in SICONV. When I access the site, I type 2018 and a list of programs appears; for each of them I want to retrieve whether RO is eligible or not. Later on I will fetch more data.
– Pablo Dias Vieira
Inspecting the code, access to the form goes through this link: (javascript:window.Location=getPath()+'/Forwardaction.do?modulo=Principal&path=/Mostraprincipalconsultarprogram.do?Usr=guest&Pwd=guest'). I just don't know what it means.
– Pablo Dias Vieira
Open the browser's network tab and submit the form. You will see exactly what the browser does. From what I've seen here, it is a POST that sends the form together with the session cookie.
– Daniel Falbel
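The flow described in the comment above (get a session cookie first, then POST the form) could be sketched as follows. This is only an outline under assumptions: `fetch_programs` is a hypothetical helper, and the form URL and field names must be copied from what the network tab actually shows, since they are not given in this thread. The guest entry URL is the one from the question.

```r
library(httr)
library(rvest)

# Guest entry URL quoted in the question.
entry_url <- paste0(
  "https://www.convenios.gov.br/siconv/ForwardAction.do",
  "?modulo=programa&path=/ConsultarPrograma/ConsultarPrograma.do",
  "&Usr=guest&Pwd=guest"
)

# form_url and fields are assumptions: take both from the POST request
# that the browser's network tab records when the form is submitted.
fetch_programs <- function(form_url, fields) {
  # 1. Open the guest entry page. httr reuses one handle per host, so any
  #    session cookie set here is sent automatically with the next request.
  entry <- GET(entry_url)
  stop_for_status(entry)

  # 2. Submit the form as the browser does: a POST with a form-encoded body.
  resp <- POST(form_url, body = fields, encode = "form")
  stop_for_status(resp)

  # 3. Parse the returned HTML so it can be queried with rvest.
  read_html(content(resp, as = "text"))
}

# Example call (placeholders, left commented out because it hits the network):
# page <- fetch_programs("<form URL from the network tab>",
#                        list(`<field name>` = "2018"))
# html_table(html_elements(page, "table"))
```

If the response is still the login screen, comparing the request headers and cookies in the network tab with `cookies(entry)` should show what the browser sends that R does not.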
I managed to render the form page just as it appears in my browser, as well as at the link. The next step is to submit it with the data I need and extract the information, in this case the programs for 2018.
– Pablo Dias Vieira