Help extracting information from an HTML page with JS/Node

I'm having trouble implementing some code. I was helped with a related problem before, and now I need help again. The goal is a script that extracts the teachers' names and the links to their respective Lattes CVs from an HTML page, and saves them somehow so I can process the data later. I'm a beginner and don't know how to work with jQuery yet, but I'm open to using it if someone guides me. Looking at the page's HTML, I noticed that the <h2> tag is used only for the teachers' names, so I took the contents of all the <h2> elements and managed to save them. I also noticed that the first href after each <h2> is that teacher's Lattes link, and that is exactly where I'm stuck. I've said a lot, but I think I've made myself clear. Thank you all.

const url = 'http://www.ppg-educacao.uff.br/novo/index.php/corpo-docente'
const axios = require('axios')
const cheerio = require('cheerio')

axios.get(url).then(response => {
    const $ = cheerio.load(response.data)
    const professores = $('h2').text()
    console.log(professores)
    //const lattes = $('a href="http://lattes.cnpq.br/"' ).text()
    //console.log(lattes)
    //const informacoes = []
    //informacoes.push({'nome ': professores, 'lattes ': lattes})
    //console.log (informacoes)
})

Current output with the teachers' names.
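The idea described in the question (each <h2> is a name, and the first link after it is that teacher's Lattes link) can be sketched without any HTML library. This is a minimal sketch that assumes the names and links have already been pulled out of the page into one list in document order; the `{ type, text }` / `{ type, href }` node shape and the function name `pairWithNextLink` are hypothetical. The sample names and Lattes URLs are taken from the output in the second answer, and `http://example.com/menu` is a made-up stand-in for an unrelated link.

```javascript
// Hypothetical input shape: the page's <h2> names and <a> links, already
// extracted in document order. Other node types are not expected here.
function pairWithNextLink(nodes) {
  const result = []
  let current = null // most recent name still waiting for its link

  for (const node of nodes) {
    if (node.type === 'name') {
      // A new name arrived before the previous one got a link: keep it with null
      if (current) result.push(current)
      current = { nome: node.text.trim(), lattes: null }
    } else if (node.type === 'link' && current) {
      // The first link after a name is taken as that teacher's Lattes link
      current.lattes = node.href
      result.push(current)
      current = null
    }
    // Links before the first name, or after a name already has one, are skipped
  }
  if (current) result.push(current) // the last name had no link
  return result
}

const nodes = [
  { type: 'link', href: 'http://example.com/menu' },
  { type: 'name', text: ' Adriano Vargas Freitas ' },
  { type: 'link', href: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4276752U6' },
  { type: 'name', text: 'Bruno Alves Dassie' },
  { type: 'link', href: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4707912H5' },
]

console.log(pairWithNextLink(nodes))
```

Keeping a name with `lattes: null` when no link follows it makes missing data visible instead of silently attaching the next teacher's link to the wrong person.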

2 answers

Looking at the page's DOM, you can see that each teacher is in its own <div> block with the classes item and column-1, and that every block has the same standard set of elements (which makes the data easy to read).

The only h2 element in each block is the teacher's name, as you already worked out. The second p element contains a single a element, which is the link to the CV.

To select these divs with jQuery, use $('.item.column-1').

The code would look like this, producing an array of objects with a name and a link:

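const professores = []
// Each teacher sits in its own div.item.column-1 block
$('.item.column-1').each(function () {
  const nome = $(this).find('h2').text().trim()
  // .attr('href') reads the attribute in both jQuery and cheerio;
  // indexing [0].href only works on browser DOM elements
  const link = $(this).find('p a').first().attr('href')
  professores.push({
    nome: nome,
    link: link
  })
})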

I modified the code for you.

const url = 'http://www.ppg-educacao.uff.br/novo/index.php/corpo-docente'
const axios = require('axios')
const cheerio = require('cheerio')

axios.get(url).then(response => {
    const $ = cheerio.load(response.data)
    const nomes = []
    const urls = []

    // Every <h2> on this page is a teacher's name
    $('h2').each((i, e) => {
        nomes.push($(e).text().trim())
    })
    // Links inside <p> elements; assumes they appear in the same order as the names
    $('p a').each((i, e) => {
        urls.push($(e).attr('href'))
    })

    // Pair each name with the link at the same position
    const objs = nomes.map((nome, i) => ({ nome: nome, lattes: urls[i] }))

    console.log(objs)
}).catch(err => console.error(err))
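Pairing the two arrays by position, as this answer does, only works if the page has exactly one matching `p a` link per <h2>, in the same order. A small guard can make that assumption explicit instead of silently producing shifted pairs. This is a sketch only; the helper name `zipNamesAndLinks` is hypothetical, and it takes arrays like the `nomes` and `urls` built above.

```javascript
// Hypothetical helper: pair names and links by position, but refuse to
// continue if the counts differ. Equal counts do not prove the order is
// right, but different counts prove the pairing would be wrong.
function zipNamesAndLinks(nomes, urls) {
  if (nomes.length !== urls.length) {
    throw new Error(
      `Found ${nomes.length} names but ${urls.length} links; pairing by index would be unreliable`
    )
  }
  return nomes.map((nome, i) => ({ nome, lattes: urls[i] }))
}

// Made-up data: the extra link breaks the one-link-per-name assumption
try {
  zipNamesAndLinks(['A', 'B'], ['http://a', 'http://b', 'http://extra'])
} catch (err) {
  console.error(err.message)
}

console.log(zipNamesAndLinks(['A'], ['http://a']))
```

If the check fails on the real page, selecting per teacher block (as in the first answer) avoids depending on global order at all.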

Result:

[ { nome: 'Adriano Vargas Freitas',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4276752U6' },
  { nome: 'Alessandra Frota Schueler',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4799322D6' },
  { nome: 'Bruno Alves Dassie',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4707912H5' },
  { nome: 'Carlos Eduardo Zaleski Rebuá',
    lattes: 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4260783J4' },
  { nome: 'Carmen Lúcia Vidal Pérez',
    lattes: 'http://lattes.cnpq.br/0646181238100482' },
  { nome: 'Cecilia Maria Aldigueri Goulart',
    lattes: 'http://lattes.cnpq.br/7281306371405447' },

  ...
  { nome: 'Valdelúcia Alves da Costa',
    lattes: 'http://lattes.cnpq.br/3766561922402070' },
  { nome: 'Waldeck Carneiro',
    lattes: 'http://lattes.cnpq.br/4129978776761994' },
  { nome: 'Zoia Ribeiro Prestes',
    lattes: 'http://lattes.cnpq.br/1927800358488148' },
  { nome: 'Zuleide Simas da Silveira',
    lattes: 'http://lattes.cnpq.br/8037763146233564' } ]
