Emulate navigation


I was wondering whether there is a way to emulate browsing through a site, the way mechanize or HttpURLConnection let you do, so that I can request one page and then follow on to others. Is there any way to request and browse pages that are not on my own domain in order to mine information from them?

1 answer

You can use Puppeteer, which is essentially Google Chrome driven through an API exposed as JavaScript methods.

With a simple script you can collect information from any web page (as long as it is public or you have some form of access to it). The following example retrieves the name of a question's author on Stack Overflow:

const puppeteer = require('puppeteer');

(async () => {
  let browser;
  try {
    browser = await puppeteer.launch();

    // Open a new page:
    const page = await browser.newPage();
    // Note: page.goto expects an absolute URL; prefix this path with the site's origin.
    await page.goto('/q/88472/69296');

    // Get the element that holds the question's author, then read its text.
    const element = await page.$('div.user-details > a');
    const text = await page.evaluate((el) => el.textContent, element);

    console.log(`Question author: ${text}.`);
  } catch (error) {
    console.error(error.message);
  } finally {
    // Close the browser even if one of the steps above failed.
    if (browser) await browser.close();
  }
})();

I have made a sample repository available on GitHub.


The library takes some getting used to, since its methods differ from what you may be used to, but the documentation covers many cases. :)

This video (in English) shows slightly more complex examples using this library.
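Since the question is about going from page to page, here is a rough sketch of how the single-page example could be extended into a small crawl. It is only an illustration under some assumptions: `resolveLinks` and `browse` are hypothetical helper names, not part of Puppeteer, and `visit` stands for whatever function you write to load a URL and return the `href` values found on it. Links are resolved with Node's built-in `URL` class because `page.goto` needs absolute URLs.

```javascript
// Hypothetical helper: turns the raw href values found on a page into
// absolute http(s) URLs, dropping fragments and duplicates.
function resolveLinks(pageUrl, hrefs) {
  const result = [];
  for (const href of hrefs) {
    let next;
    try {
      // Resolve relative links such as '/q/2' against the current page.
      next = new URL(href, pageUrl);
    } catch {
      continue; // skip anything that cannot be parsed as a URL
    }
    if (next.protocol !== 'http:' && next.protocol !== 'https:') continue;
    next.hash = ''; // '#answers' points at the same page
    if (!result.includes(next.href)) result.push(next.href);
  }
  return result;
}

// Hypothetical helper: visits pages breadth-first, at most maxPages of them.
// `visit(url)` must return (or resolve to) an array of href strings.
// With Puppeteer it could look roughly like:
//   const visit = async (url) => {
//     await page.goto(url);
//     return page.$$eval('a[href]', (as) => as.map((a) => a.getAttribute('href')));
//   };
async function browse(startUrl, visit, maxPages = 10) {
  const start = new URL(startUrl).href;
  const seen = new Set([start]);
  const queue = [start];
  const visited = [];

  while (queue.length > 0 && visited.length < maxPages) {
    const url = queue.shift();
    const hrefs = await visit(url);
    visited.push(url);

    for (const next of resolveLinks(url, hrefs)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return visited;
}
```

The `maxPages` limit is there so the crawl cannot run away on a large site. Keeping `visit` as a parameter also means the traversal logic can be tried out with a stub before pointing it at a real browser.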
