Crawler - how to access several pages

I wrote some code in Node to fetch the system version and the municipality name from a portal, but I can only get it to fetch the information for one municipality, not for any others.

I would like the request to loop over other addresses, such as http://transparencia.bocaiuvadosul.pr.gov.br:9191/pronimtb/index.Asp, and then write the information to the txt file.

The code is this:

var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

request('http://transparencia.matinhos.pr.gov.br/pronimtb/index.asp', function (err, res, body) {
    // Stop here on a request error; body is undefined in that case
    if (err) return console.log('Err: ' + err);

    var $ = cheerio.load(body);

    $('.Class_Relatorio').each(function () {
        var nmcliente = $(this).find('.Class_Relatorio tr:nth-of-type(4) td:nth-of-type(2)').text().trim();
        var versao = $(this).find('.Class_Relatorio tr:nth-of-type(1) td:nth-of-type(1)').text().trim();

        console.log('Titulo: ' + versao);

        // appendFile is asynchronous and expects a callback
        fs.appendFile('versao.txt', nmcliente + '|' + versao + '|' + '\n', function (err) {
            if (err) console.log('Err: ' + err);
        });
    });
});
  • @Sorack I can put them in the code manually, or put all the other addresses in a txt file.

  • The Node version is v8.12.0.

  • All right, @Sorack. It's just that I've never done anything like this and I'm only experimenting, but it would already help me in my work.

1 answer

Rewriting your code to use promises and take a list of URLs would look something like this:

const cheerio = require('cheerio');
const { get } = require('request');
const { writeFileSync } = require('fs');
const { promisify } = require('util');

// Turn "get" into a function that returns a promise
const promisedGET = promisify(get);

const visitar = async uri => {
  const { statusCode, body } = await promisedGET({ uri, encoding: 'binary' });

  // Throw an error if the status is anything other than 200
  if (statusCode !== 200) throw new Error(body);

  return { body };
}

const ler = async ({ body }) => {
  const $ = cheerio.load(body);

  const cliente = $('table.Class_Relatorio tr:nth-of-type(4) > td:nth-of-type(2)').text().trim();
  const versao = $('table.Class_Relatorio tr:nth-of-type(1) > td:nth-of-type(1)').text().trim();

  return { cliente, versao };
}

const executar = async urls => {
  // Request every site in the list
  const paginas = await Promise.all(urls.map(url => visitar(url)));
  // Parse the returned pages and turn them into objects
  const linhas = await Promise.all(paginas.map(conteudo => ler(conteudo)));
  // Turn the rows into a single content string
  const conteudo = linhas.map(({ cliente, versao }) => `${cliente} | ${versao} |`).join('\n');
  // Write the content to the file
  writeFileSync('versao.txt', conteudo);

  return conteudo;
}

// Example call of the main function
(async () => {
  // Start the timer
  console.time('Execução');

  try {
    await executar([
      'http://transparencia.bocaiuvadosul.pr.gov.br:9191/pronimtb/index.asp',
      'http://transparencia.matinhos.pr.gov.br/pronimtb/index.asp',
      'http://transparencia.rolandia.pr.gov.br/pronimtb/index.asp',
      'http://transparencia.castanhal.pa.gov.br/pronimtb/index.asp',
    ]);
  } catch (err) {
    console.log(err)
  }

  // Report the total execution time
  console.timeEnd('Execução');
})();
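
If you would rather keep the addresses in a txt file than in the code, a minimal sketch could look like the one below. The helper names parseUrlList and loadUrls, the file name urls.txt, and the one-URL-per-line format (with "#" marking a comment line) are all assumptions of mine, not something from the question.

```javascript
const { readFileSync } = require('fs');

// Hypothetical helper: parses text that holds one URL per line.
// Blank lines and lines starting with "#" are skipped.
const parseUrlList = text =>
  text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== '' && !line.startsWith('#'));

// Hypothetical helper: reads the list from disk.
// "urls.txt" is an assumed file name; pass another path if needed.
const loadUrls = (path = 'urls.txt') => parseUrlList(readFileSync(path, 'utf8'));

// Possible usage with the function above:
//   await executar(loadUrls());
```

Keeping the parsing separate from the file read makes it easy to check the list format without touching the disk.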

Note that if the information is not found at the expected location in the page (as is the case for Rolândia), that field comes back empty. For the four URLs listed in the code, the resulting versao.txt file contains:

Prefeitura Municipal de Bocaiúva do Sul | PRONIM TB 518.01.07-013 |
Prefeitura Municipal de Matinhos | PRONIM TB 518.01.04-000 |
 | PRONIM TB 518.01.07-012 |
Prefeitura Municipal de Castanhal | PRONIM TB 518.01.07-012 |
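
One way to make such gaps visible, and to keep a single failing site from rejecting the whole Promise.all, is sketched below. The helpers tryUrl and formatLine and the placeholder texts are hypothetical names I am introducing for illustration; the sketch assumes the task passed in resolves to an object shaped like the { cliente, versao } returned by ler.

```javascript
// Hypothetical wrapper: runs an async task for one URL and never rejects,
// so one failing site does not abort Promise.all for the whole list.
const tryUrl = async (url, task) => {
  try {
    return { url, ok: true, data: await task(url) };
  } catch (err) {
    return { url, ok: false, error: err.message };
  }
};

// Hypothetical formatter: writes a placeholder instead of an empty field
// and an error line for URLs whose task failed.
const formatLine = ({ url, ok, data, error }) => {
  if (!ok) return `${url} | ERROR: ${error} |`;
  const { cliente, versao } = data;
  return `${cliente || '(name not found)'} | ${versao || '(version not found)'} |`;
};

// Possible usage inside executar, with visitar and ler from the code above:
//   const linhas = await Promise.all(
//     urls.map(url => tryUrl(url, async u => ler(await visitar(u))))
//   );
//   writeFileSync('versao.txt', linhas.map(formatLine).join('\n'));
```

This stays within what Node v8 offers (async/await and Promise.all), since Promise.allSettled is not available in that version.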
  • Wonderful. That was the result I was looking for. Thank you.
