Sites with authentication - Web Scraping - Javascript

Question

Asked 5 years, 2 months ago

I am trying to automate a web data acquisition process using JS. In my case, I need to pull information from the page https://sistema.justwebtelecom.com.br/adm.php. However, to reach this page you first need to log in at https://sistema.justwebtelecom.com.br/login.php. Once logged in, I land on the page .../Adm.php.

const request = require('request-promise')
const cheerio = require('cheerio')

const URL2 = 'https://sistema.justwebtelecom.com.br/adm.php'

async function acesso(){
    const response = await request(URL2)

    let $ = cheerio.load(response)
    let title = $('title').text()
    console.log(title)

}
acesso()

However, console.log(title) prints the title of https://sistema.justwebtelecom.com.br/login.php, that is, the page shown before logging in. What I want is the content of the page shown after logging in, when I have access to the .../Adm.php panel.
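The login page comes back because the request carries no session cookie, so the server treats it as a visitor who has not logged in. A minimal sketch of logging in first and reusing the cookie, assuming Node 20 or newer (built-in fetch), a plain form POST to login.php, and the hypothetical form field names `login` and `senha` (check the real names in the login form's HTML):

```javascript
// Sketch only: the field names and the login flow are assumptions, not taken from the site.
const BASE = 'https://sistema.justwebtelecom.com.br'

// Turn the Set-Cookie values of a response into one Cookie request header.
function toCookieHeader(setCookieValues) {
  return setCookieValues
    .map(value => value.split(';')[0].trim()) // keep only "name=value"
    .filter(Boolean)
    .join('; ')
}

async function fetchAdmHtml(user, password) {
  // Step 1: submit the login form (hypothetical field names).
  const loginResponse = await fetch(`${BASE}/login.php`, {
    method: 'POST',
    body: new URLSearchParams({ login: user, senha: password }),
    redirect: 'manual', // read the cookies from the login response itself
  })
  const cookie = toCookieHeader(loginResponse.headers.getSetCookie())

  // Step 2: request the protected page with the session cookie attached.
  const admResponse = await fetch(`${BASE}/adm.php`, { headers: { Cookie: cookie } })
  return admResponse.text()
}
```

The returned HTML can be passed to cheerio.load(...) exactly as in the code above. If the site adds a CSRF token or a reCAPTCHA to the form, this simple POST will not be enough.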

Using Puppeteer:

const puppeteer = require('puppeteer')

const getInfo = async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('https://sistema.justwebtelecom.com.br/adm.php')

    const info = await page.evaluate(() => {
        return {
            Tag: document.title
        }
    })
    console.log(info)
    await browser.close()
}
getInfo()

Is there a way to get this information from the .../Adm.php page with these packages, or do I need others?

1 answer

by user187547 · Answer 1 · 2020-05-05T04:56:15+00:00

Some pages use redirects and reCAPTCHA, which cause trouble for any module you use, but I think you are better off doing your web scraping with the Puppeteer module for Node.js. An example of scraping with Puppeteer:

const puppeteer = require('puppeteer')

// Collect the alt text of every book cover image on the page.
const scrape = async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('http://books.toscrape.com/')

  // This callback runs inside the page, so `document` is the page's DOM.
  const result = await page.evaluate(() => {
    const books = []
    document.querySelectorAll('section > div > ol > li img')
      .forEach(book => books.push(book.getAttribute('alt')))
    return books
  })

  await browser.close()
  return result
}

scrape().then((value) => {
  console.log(value)
})