Load pages without images and CSS in PhantomJS

Viewed 80 times

0

Does anyone know a way to load pages faster using PhantomJS?

I have a bot that walks through certain pages via pagination, and I need to cut the loading time of those pages to speed up the whole run.

Note: the site in question uses bot protection (PerimeterX) that blocks file_get_contents and PHP-based scrapers.

  • 1

    I think your problem is not tied to a specific language or extension. Languages in general can store a page's HTML very quickly. The main limitations are how fast the target site's server generates the HTML and that server's bandwidth.

  • For example, in PHP you use file_get_contents(url). This function does not return until it receives the site's response. If the site takes 2 seconds to respond, your application will take at least 2 seconds to run, no matter the language or extension.

1 answer

3


After much research I found the solution here: https://www.scrapehero.com/how-to-increase-web-scraping-speed-using-puppeteer/

Basically, this chunk of Puppeteer code intercepts each request and aborts it if it is for a font, a stylesheet, or an image.

// Enable interception so every request passes through the handler below.
await page.setRequestInterception(true);

page.on('request', (req) => {
    // Abort stylesheets, fonts, and images; let everything else through.
    if (['stylesheet', 'font', 'image'].includes(req.resourceType())) {
        req.abort();
    } else {
        req.continue();
    }
});
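
The snippet above assumes a `page` object already exists and that it runs inside an async function. As a rough sketch of how it might fit into a complete script, the example below wraps the same logic in helpers. The names `shouldBlock`, `blockHeavyResources`, and `scrape` are hypothetical, and the script assumes the `puppeteer` package is installed.

```javascript
// Resource types to skip: the same three the snippet above aborts.
const BLOCKED_TYPES = new Set(['stylesheet', 'font', 'image']);

// Hypothetical helper: decides whether a request of this type is aborted.
function shouldBlock(resourceType) {
    return BLOCKED_TYPES.has(resourceType);
}

// Hypothetical helper: attaches the blocking handler to a Puppeteer page.
async function blockHeavyResources(page) {
    await page.setRequestInterception(true);
    page.on('request', (req) => {
        if (shouldBlock(req.resourceType())) {
            req.abort();
        } else {
            req.continue();
        }
    });
}

// Hypothetical entry point: visits each URL in turn, reusing one page,
// and returns the HTML of every page visited.
// Assumption: `puppeteer` is installed (npm install puppeteer).
async function scrape(urls) {
    const puppeteer = require('puppeteer'); // loaded only when scraping starts
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await blockHeavyResources(page);
        const results = [];
        for (const url of urls) {
            // Resolve once the DOM is parsed instead of waiting for the load event.
            await page.goto(url, { waitUntil: 'domcontentloaded' });
            results.push(await page.content());
        }
        return results;
    } finally {
        await browser.close();
    }
}
```

Calling `scrape` with the list of pagination URLs would return the HTML of each page. How much time this saves depends on how many stylesheets, fonts, and images the site normally loads.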

I hope this helps anyone who might need it.
