How to scrape a site while moving from page to page

I need to create a bot that opens the home page of a particular site, collects the links to all the posts, opens each one, captures the div that contains the video, and then moves on to the next link collected from the index.

I’ve already written the part that collects the links. Now I need to know how to visit each of them and capture the div that holds the video.

var request = require("request");
var cheerio = require("cheerio");
var fs      = require("fs");

request("https://www.site.com", function(error, response, html) 
{
    // Declared up front so it is still a valid array if the request fails
    var resultado = [];

    if(!error)
    {
        var $ = cheerio.load(html);

        // One entry per post thumbnail on the home page
        $("div.mozaique > div.thumb-block").each(function(i)
        {
            var title = $(this).find("div.thumb-under > p > a").eq(0).text();
            var link  = $(this).find("div.thumb-under > p > a").attr("href");
            var img   = $(this).find("div.thumb-inside > div.thumb > a > img").attr("data-src");

            resultado.push({
                id: i,
                title: title,
                link: link,
                img: img
            });
        });
    }

    // Writing the .json file with the array
    fs.writeFile('resultado.json', JSON.stringify(resultado, null, 4), function(err) {
        if(err)
        {
            return console.error(err);
        }
        console.log('JSON written successfully! The file is in the project root.')
    })
});

In case that’s not clear, I basically need to:

  1. Open the site’s home page
  2. Capture the post links and store them
  3. Open the first captured link
  4. Capture the div that contains the .mp4 link
  5. Go to the next captured link and repeat steps 4 and 5.
  • Have you tried creating a function that opens a link and captures the video? That way you can run a for loop over the captured links.

  • That’s exactly the part I can’t get working!

1 answer

From what I understand, you want to make several requests and build up the array with the elements you get from each page. To do that, you need a second function that calls itself in a loop, making one request for each link in the array. It would look something like this:

var request = require("request");
var cheerio = require("cheerio");

// Declared outside the callback so accessURI() can read them
var resultado = [];
var i = 0; // index of the next link to visit

request("https://www.site.com", function(error, response, html) 
{
    if(error)
    {
        return console.error(error);
    }

    var $ = cheerio.load(html);

    $("div.mozaique > div.thumb-block").each(function(index)
    {
        var title = $(this).find("div.thumb-under > p > a").eq(0).text();
        var link  = $(this).find("div.thumb-under > p > a").attr("href");
        var img   = $(this).find("div.thumb-inside > div.thumb > a > img").attr("data-src");

        resultado.push({
            id: index,
            title: title,
            link: link,
            img: img
        });
    });

    // .each() is synchronous, so the array is complete at this point
    accessURI();
});

function accessURI()
{
    if(i == resultado.length)
    {
        // Every link has been visited: the loop's callback goes here
        return;
    }

    // Assumes item.link is an absolute URL; a relative href must be resolved first
    var item = resultado[i++];
    request(item.link, function(error, response, body)
    {
        // Build the array with the divs here (e.g. parse body and store the video div on item)
        accessURI();
    });
}
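
The step the answer leaves open, pulling the video out of each visited page, might look something like the sketch below. It is only an illustration under assumptions: the thread never shows the markup of a post page, so instead of a cheerio selector it scans the HTML for `href`/`src` attributes ending in `.mp4`. The helper name `findMp4Links` and the `cdn.example.com` address are made up for the example. Relative addresses are resolved against the page URL with Node's built-in `URL` class.

```javascript
// Hypothetical helper (not from the thread): returns the absolute URL of every
// .mp4 file referenced by an href or src attribute in the given HTML.
function findMp4Links(html, pageUrl) {
  // Captures the attribute value, allowing an optional query string after ".mp4"
  const pattern = /(?:href|src)\s*=\s*["']([^"']+?\.mp4(?:\?[^"']*)?)["']/gi;
  const found = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Resolve relative paths such as "/media/clip.mp4" against the page address
    found.push(new URL(match[1], pageUrl).href);
  }
  return found;
}

// Made-up markup standing in for the body of one post page
const samplePage =
  '<div id="player"><video src="/media/clip.mp4"></video>' +
  '<a href="https://cdn.example.com/a.mp4?token=1">download</a>' +
  '<a href="/about.html">about</a></div>';

console.log(findMp4Links(samplePage, "https://www.site.com/post/1"));
```

Inside the second request's callback, the `body` argument would take the place of `samplePage`. If the video sits in a known div, a cheerio selector for that div is likely the more robust choice than a regular expression.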
  • Peter, I didn’t quite follow how this function works. Could you give an example of how to use it?
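
As a possible usage illustration for the question in the comment above, here is the same "one request after another" idea written as a small reusable function. This is a sketch, not the answerer's code: `visitSequentially`, `fetchPage`, and the fake pages are hypothetical names introduced for the example, and the fetcher is passed in as a parameter so the flow can be tried without hitting a real site.

```javascript
// Hypothetical helper: visits the links one at a time and collects one result per link.
// fetchPage(url, callback) is an assumed wrapper whose callback receives (error, body).
function visitSequentially(links, fetchPage, onDone) {
  const results = [];

  function next(index) {
    if (index === links.length) {
      onDone(results); // every link has been visited
      return;
    }
    fetchPage(links[index], function (error, body) {
      results.push({ link: links[index], ok: !error, body: error ? null : body });
      next(index + 1); // move on only after the current page has answered
    });
  }

  next(0);
}

// Fake fetcher with made-up pages, so the example runs offline
const fakePages = {
  "/post/1": "<div>one.mp4</div>",
  "/post/2": "<div>two.mp4</div>"
};
function fakeFetch(url, callback) {
  if (url in fakePages) callback(null, fakePages[url]);
  else callback(new Error("not found: " + url));
}

let collected = [];
visitSequentially(["/post/1", "/missing", "/post/2"], fakeFetch, function (results) {
  collected = results;
  console.log(results.map((r) => r.link + " -> " + (r.ok ? "ok" : "failed")));
});
```

With the `request` module used in this thread, the fetcher could be something like `function (url, cb) { request(url, function (err, res, body) { cb(err, body); }); }`, and `links` would be the `link` values stored in `resultado`. A failed page is recorded and skipped rather than stopping the whole run.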