Sites with authentication - Web Scraping - Javascript

I am trying to automate a web data acquisition process using JavaScript. I need to pull information from the page https://sistema.justwebtelecom.com.br/adm.php. However, that page is only reachable after logging in at https://sistema.justwebtelecom.com.br/login.php; once logged in, I land on .../adm.php.

const request = require('request-promise')
const cheerio = require('cheerio')

const URL2 = 'https://sistema.justwebtelecom.com.br/adm.php'

async function acesso(){
    const response = await request(URL2)

    let $ = cheerio.load(response)
    let title = $('title').text()
    console.log(title)

}
acesso()

However, console.log(title) prints the title of https://sistema.justwebtelecom.com.br/login.php, that is, the page shown before logging in. What I want is the title of the .../adm.php panel, which I can only reach after logging in.

Using Puppeteer:

const puppeteer = require('puppeteer')

const getInfo = async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('https://sistema.justwebtelecom.com.br/adm.php')

    const info = await page.evaluate(() => {
        return {
            Tag: document.title
        }
    })
    console.log(info)
    await browser.close()
}

getInfo()

Is there a way to get this information from the .../adm.php page with these packages, or do I need others?

1 answer

Some pages use redirects and reCAPTCHA, which makes scraping harder whichever module you use, but I think you are better off doing your web scraping with the Node.js Puppeteer module. An example of scraping with Puppeteer:

const puppeteer = require('puppeteer')

const scrape = async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('http://books.toscrape.com/')

    // Runs inside the page: collect the alt text of each book cover image
    const result = await page.evaluate(() => {
        const books = []
        document.querySelectorAll('section > div > ol > li img')
            .forEach(book => books.push(book.getAttribute('alt')))
        return books
    })

    await browser.close()
    return result
}

scrape().then((value) => {
    console.log(value)
})
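
For the login-protected page in the question, the same approach would need a login step before visiting adm.php. Below is a minimal sketch, assuming the login page is a plain form with a username field, a password field and a submit button. The helper name loginAndGetTitle, the three CSS selectors and the SCRAPE_USER / SCRAPE_PASS environment variables are all hypothetical and have to be adapted to the real form. It will not get past a reCAPTCHA.

```javascript
// Hypothetical helper: log in through the form, then read the protected page's title.
// `page` is a Puppeteer Page; the selectors are assumptions about the login form.
async function loginAndGetTitle(page, { loginUrl, targetUrl, user, password, selectors }) {
  await page.goto(loginUrl, { waitUntil: 'networkidle2' })
  await page.type(selectors.user, user)
  await page.type(selectors.password, password)

  // Start waiting for the post-login navigation before clicking, to avoid a race.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selectors.submit),
  ])

  // The session cookie now lives in the browser, so the protected page can be requested.
  await page.goto(targetUrl, { waitUntil: 'networkidle2' })
  return page.title()
}

async function main() {
  const puppeteer = require('puppeteer') // loaded lazily, only when actually scraping
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    const title = await loginAndGetTitle(page, {
      loginUrl: 'https://sistema.justwebtelecom.com.br/login.php',
      targetUrl: 'https://sistema.justwebtelecom.com.br/adm.php',
      user: process.env.SCRAPE_USER,
      password: process.env.SCRAPE_PASS,
      selectors: { // assumed selectors; inspect the real form to find the right ones
        user: 'input[name="login"]',
        password: 'input[type="password"]',
        submit: 'button[type="submit"]',
      },
    })
    console.log(title)
  } finally {
    await browser.close()
  }
}

// Run only when credentials are provided through the environment.
if (process.env.SCRAPE_USER && process.env.SCRAPE_PASS) {
  main().catch(console.error)
}
```

The key point is that the login and the visit to adm.php happen in the same browser page, so the session cookie set at login is sent automatically with the second request. If the title printed is still the login page's, the selectors or credentials are probably wrong.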
  • Hello DKNS, thanks for the help. I tried Puppeteer too, but without success. Could you tell me how I could get the data with it?

  • Is the data text on a page, or ...?

  • Yes. But at first I just want to get the page title, because if I can get the title, then I can get the rest of the page info.

  • I suggest you take a look at the documentation; you will understand it better, and it includes code examples.
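
On why the request + cheerio attempt in the question prints the login page's title: each plain HTTP request starts without a session, so the server treats it as a visitor who has not logged in. Without a browser, the usual fix is to POST the login form, keep the session cookie from the response, and send it back with the request for adm.php. Below is a minimal sketch of that idea. It uses Node.js 18+ built-in fetch instead of request-promise and a regular expression instead of cheerio, only to stay dependency-free. The helper name fetchTitleAfterLogin and the form field names passed in `fields` are hypothetical, and the sketch assumes a plain form login with no CAPTCHA or CSRF token.

```javascript
// Hypothetical helper: POST the login form, keep the session cookie, fetch the protected page.
// Assumes Node.js 18+ (built-in fetch) and a plain form login without CAPTCHA or CSRF token.
async function fetchTitleAfterLogin({ loginUrl, targetUrl, fields }) {
  const loginResponse = await fetch(loginUrl, {
    method: 'POST',
    body: new URLSearchParams(fields), // sent as application/x-www-form-urlencoded
    redirect: 'manual', // keep the login response itself so its Set-Cookie headers are visible
  })

  // Collect "name=value" from each Set-Cookie header, dropping attributes such as Path.
  // The fallback for older Node versions is only reliable when a single cookie is set.
  const headers = loginResponse.headers
  const setCookies = typeof headers.getSetCookie === 'function'
    ? headers.getSetCookie()
    : [headers.get('set-cookie')].filter(Boolean)
  const cookie = setCookies.map((c) => c.split(';')[0]).join('; ')

  // Send the session cookie back when requesting the protected page.
  const page = await fetch(targetUrl, { headers: cookie ? { cookie } : {} })
  const html = await page.text()

  // Extract the <title> text; cheerio could be used here instead, as in the question.
  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i)
  return match ? match[1].trim() : null
}
```

A call would look like fetchTitleAfterLogin({ loginUrl: '.../login.php', targetUrl: '.../adm.php', fields: { login: '...', senha: '...' } }), where the keys of `fields` must match the name attributes of the real form inputs (the ones shown here are guesses). If the site relies on JavaScript or reCAPTCHA at login, this approach will not work and a browser-based tool such as Puppeteer is the better fit.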
