Scraping the MTE Mediador system


Asked 8 years, 10 months ago


I’m trying to scrape the Ministry of Labor's Mediador system. Basically, I want the list of collective agreements and conventions:

url1<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"

This page shows the search form. I selected only the validity (vigência) "All" and the registration state (UF) "SE".

After clicking search, I can see the XHR request:

url2<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"

And the body:

str(body)
List of 27
 $ nrCnpj                             : chr ""
 $ nrCei                              : chr ""
 $ noRazaoSocial                      : chr ""
 $ dsCategoria                        : chr ""
 $ tpRequerimento                     : chr "acordo"
 $ tpRequerimento                     : chr "acordoColetivoEspecificoPPE"
 $ tpRequerimento                     : chr "acordoColetivoEspecificoDomingosFeriados"
 $ tpRequerimento                     : chr "convencao"
 $ tpRequerimento                     : chr "termoAditivoAcordo"
 $ tpRequerimento                     : chr "termoAditivoConvecao"
 $ tpRequerimento                     : chr "termoAditivoAcordoEspecificoPPE"
 $ tpRequerimento                     : chr "termoAditivoAcordoEspecificoDomingoFeriado"
 $ tpVigencia                         : chr "2"
 $ sgUfDeRegistro                     : chr "SE"
 $ dtInicioRegistro                   : chr ""
 $ dtFimRegistro                      : chr ""
 $ dtInicioVigenciaInstrumentoColetivo: chr ""
 $ dtFimVigenciaInstrumentoColetivo   : chr ""
 $ tpAbrangencia                      : chr "Todos os tipos"
 $ ufsAbrangidasTotalmente            : chr "SE"
 $ cdMunicipiosAbrangidos             : chr ""
 $ cdGrupo                            : chr ""
 $ cdSubGrupo                         : chr ""
 $ noTituloClausula                   : chr ""
 $ utilizarSiracc                     : chr ""
 $ pagina                             : chr "2"
 $ qtdTotalRegistro                   : chr "1740"

Then I did the following to access the results:

library(httr)
a<-GET(url1)
b<-POST(url2,body=body,set_cookies(unlist(a$cookies)))

Unfortunately, the response does not contain the expected results.

Note that url2 does not work on its own. For it to work properly, you need to go through the filters on url1 first.

– TheBiro

2017/04/26 at 14:31

I tried with url1. I thought that including the cookie from the url1 request in the url2 request would solve the problem, but it did not.

– José

2017/04/26 at 14:43

1 answer


by Guilherme Duarte • **918** points · Answer 1 · 2017-04-26T16:24:29+00:00

The question is how to perform this specific scrape in R. Note that the form field tpRequerimento takes a list of values, which we can implement as a vector.

In R, I would do it like this:

body <- list(
  nrCnpj="",
  nrCei="",
  noRazaoSocial="",
  dsCategoria="",
  tpRequerimento=c("acordo",
                   "acordoColetivoEspecificoPPE",
                   "acordoColetivoEspecificoDomingosFeriados",
                   "convencao",
                   "termoAditivoAcordo",
                   "termoAditivoConvecao",
                   "termoAditivoAcordoEspecificoPPE",
                   "termoAditivoAcordoEspecificoDomingoFeriado"),
  tpVigencia="2",
  sgUfDeRegistro="SE",
  dtInicioRegistro="",
  dtFimRegistro="",
  dtInicioVigenciaInstrumentoColetivo="",
  dtFimVigenciaInstrumentoColetivo="",
  tpAbrangencia="Todos os tipos",
  ufsAbrangidasTotalmente="SE",
  cdMunicipiosAbrangidos="",
  cdGrupo="",
  cdSubGrupo="",
  noTituloClausula="",
  utilizarSiracc="",
  pagina="2",
  qtdTotalRegistro="1740")
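
Since the captured request repeats tpRequerimento once per selected type, an alternative to the vector is a list with one entry per value, all sharing the same name. The sketch below is only an illustration under that assumption, not a verified fix: `encode_form` is a hypothetical helper written just to display the resulting query string, and the field set is shortened for readability.

```r
# Instrument types chosen in the form (shortened here for readability).
types <- c("acordo", "convencao", "termoAditivoAcordo")

# One list element per value, every element named "tpRequerimento".
type_fields <- setNames(as.list(types), rep("tpRequerimento", length(types)))

# Append the remaining single-valued fields; c() keeps the duplicated names.
body_rep <- c(type_fields,
              list(tpVigencia = "2", sgUfDeRegistro = "SE", pagina = "2"))

# Hypothetical helper: build the URL-encoded form string by hand so the
# repeated fields can be inspected without sending anything.
encode_form <- function(x) {
  values <- vapply(x, function(v) utils::URLencode(v, reserved = TRUE),
                   character(1))
  paste0(names(x), "=", values, collapse = "&")
}

cat(encode_form(body_rep), "\n")
```

A list shaped like `body_rep` can be passed to `POST(url2, body = body_rep, encode = "form")`, which should send the field once per value as URL-encoded form data. Whether the server then returns the expected results is not something this sketch checks.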


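library(httr)

url1 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"
url2 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"

# Request the search form page first; its cookies are reused below
a <- GET(url1)

# Post the search form, attaching the cookies from the first response
b <- POST(url2, body=body, set_cookies(unlist(a$cookies)))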
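
One detail to check if the cookies are attached by hand: in recent httr versions the cookies of a response come back from `cookies()` as a table with one row per cookie, while `set_cookies()` expects name/value pairs, so `unlist()` on that table may not produce what the server expects. A minimal sketch of the conversion, assuming the table has `name` and `value` columns; `cookie_vector` and the example cookie values are hypothetical.

```r
# Hypothetical helper: turn a cookie table (columns `name` and `value`, the
# layout httr::cookies() returns) into a named character vector.
cookie_vector <- function(ck) {
  stats::setNames(as.character(ck$value), ck$name)
}

# Made-up table standing in for httr::cookies(a); values are illustrative only.
ck <- data.frame(name = c("JSESSIONID", "lang"),
                 value = c("abc123", "pt"),
                 stringsAsFactors = FALSE)

print(cookie_vector(ck))

# With a live response `a`, the request might then look like this (not run):
# b <- httr::POST(url2, body = body, encode = "form",
#                 httr::set_cookies(.cookies = cookie_vector(httr::cookies(a))))
# httr::status_code(b)
```

httr also reuses the same handle for requests to the same host, so cookies set by the GET are normally sent with the POST even without `set_cookies()`. If the results are still missing, the request body or headers are the next things to compare against the browser's XHR.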