Scraping with Node and request: how do I carry cookies from the form POST to a later GET?

I’m trying to develop an application that takes data from a particular site and sends it to a database. It would be simple if the data did not have to be requested through a form first.

I’m a beginner, so I may not be describing this correctly, but I’ll give as much detail as I can.

When you open the site (http://preco.anp.gov.br/include/Resumo_Por_Municipio_Index.asp), you first see an input where you must enter a value. A select then appears whose options depend on that value, and after you choose one, a captcha is shown.

I noticed that even without typing the captcha, just pressing Submit sends the request (POST) to the server. After that, to reach the relevant information, I only need to change the URL to http://preco.anp.gov.br/include/Resumo_Por_Municipio_Postos.asp. I believe this works because the request is stored in cookies.

I started my code in Node:

const querystring = require('querystring');
const request = require('request');

// Fields of the search form.
const form = {
    selSemana: '1087*De 12/04/2020 a 18/04/2020',
    desc_Semana: 'de 12/04/2020 a 18/04/2020',
    cod_Semana: '1087',
    txtMunicipio: '',
    selMunicipio: '1033*MINASGERAIS',
    image1: ''
};

// Encode the form as application/x-www-form-urlencoded.
const formData = querystring.stringify(form);
// Content-Length is counted in bytes, not characters.
const contentLength = Buffer.byteLength(formData);

request({
    headers: {
        'Content-Length': contentLength,
        'Content-Type': 'application/x-www-form-urlencoded'
    },
    uri: 'http://preco.anp.gov.br/include/Resumo_Por_Municipio_Index.asp',
    body: formData,
    method: 'POST'
}, function (err, res, body) {
    // res is undefined when the request itself fails, so check err first.
    if (err) {
        console.error(err);
        return;
    }
    console.log(`statusCode: ${res.statusCode}`);
    console.log(res);
});

So far I can make the POST request. I have tried several ways to carry the cookies over to a GET on the second URL, but none of them worked. Could someone help me, please?

1 answer

I suggest using Puppeteer to drive the browser, enter the value in the input, and fill in the select field. However, since you said the page has a captcha, the scraping will be hard to automate with any library, Puppeteer or not.

    The cookies in which the inputs appeared to be stored were generated by Google Analytics, so the best option is Puppeteer.
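On the cookie part of the question, here is a minimal sketch of how the cookies from the POST could be passed to a later GET. It uses Node's built-in http module instead of request so that the cookie handling is visible. The helper names (cookieHeaderFrom, send, postThenGet) are made up for this illustration. It also assumes that a session cookie set by the POST is what unlocks the second page, which has not been verified against the site, and the comment above suggests that may not be the case.

```javascript
const http = require('http');
const querystring = require('querystring');

// Turn the Set-Cookie response headers into a single Cookie request header.
// Only the leading name=value pair of each cookie is kept; attributes such as
// Path or Expires are ignored, which is enough for a short-lived session.
function cookieHeaderFrom(setCookieHeaders) {
    return (setCookieHeaders || [])
        .map((cookie) => cookie.split(';')[0].trim())
        .join('; ');
}

// Promise wrapper around http.request that resolves with the response
// status, headers and body text (decoded as UTF-8, which is an assumption).
function send(url, options, body) {
    return new Promise((resolve, reject) => {
        const req = http.request(url, options, (res) => {
            const chunks = [];
            res.on('data', (chunk) => chunks.push(chunk));
            res.on('end', () => resolve({
                statusCode: res.statusCode,
                headers: res.headers,
                body: Buffer.concat(chunks).toString()
            }));
        });
        req.on('error', reject);
        if (body) req.write(body);
        req.end();
    });
}

// POST the form, then GET the results page with the cookies from the POST.
async function postThenGet(formUrl, resultsUrl, form) {
    const formData = querystring.stringify(form);
    const posted = await send(formUrl, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Content-Length': Buffer.byteLength(formData)
        }
    }, formData);

    // Node exposes Set-Cookie as an array of strings (or leaves it undefined).
    const cookie = cookieHeaderFrom(posted.headers['set-cookie']);
    return send(resultsUrl, {
        method: 'GET',
        headers: cookie ? { Cookie: cookie } : {}
    });
}
```

Calling postThenGet with the two URLs from the question and the form object, then logging the resolved body, would show whether the second page really depends on those cookies. The request library can do the same bookkeeping itself through its jar option, for example request.defaults({ jar: true }), as long as the POST and the GET both go through that same instance.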
