The URL only returns data when the site is already open

Viewed 52 times

I need to scrape the information from a page on this site.

In the browser's developer tools, I found the URL of the request that returns the data I need.

The problem is that this URL only returns data if the site has already been opened in the same browser session.

If you open the URL in an incognito tab, the page is completely blank.

code:

import scrapy


class AaidSpider(scrapy.Spider):
    name = 'agm'
    # Must be spelled start_urls; under any other name Scrapy makes no requests.
    start_urls = [
        'https://www.agmgranite.com/paginate.php?page=1&lid=3&f=reset&invp='
    ]

    def parse(self, response):
        print(response.body)

output:

[]

How can I scrape this data if the site has to be opened first?

1 answer

João, when you visit the main website, it gives you a session cookie. To make this request, you must send that cookie and the other request headers along with it, as in the code below.

import requests

headers = {
    'authority': 'www.agmgranite.com',
    'accept': '*/*',
    'x-requested-with': 'XMLHttpRequest',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'referer': 'https://www.agmgranite.com/inventory/hill-country-spicewood/?f=reset',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7',
    'cookie': 'PHPSESSID=cfd3f3811f8d4bed78f146cd3ed8f3e1; _ga=GA1.2.1369799401.1580910358; _gid=GA1.2.1574686430.1580910358',
}

response = requests.get('https://www.agmgranite.com/paginate.php?page=1&lid=3&f=reset&invp=', headers=headers)
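Instead of copying a cookie by hand, the same idea can be sketched with `requests.Session`, which stores cookies from one response and resends them on later requests. This is only a sketch under assumptions: that visiting the inventory page (the `referer` URL above) is what makes the server set `PHPSESSID`, and that the two headers kept here are enough. The names `SITE_URL`, `PAGINATE_URL`, and `fetch_page` are made up for illustration.

```python
import requests

# URLs taken from the question and the answer above.
SITE_URL = 'https://www.agmgranite.com/inventory/hill-country-spicewood/?f=reset'
PAGINATE_URL = 'https://www.agmgranite.com/paginate.php'


def fetch_page(session, page=1):
    """Open the site first so the session holds its cookie, then fetch the data."""
    # First request: the server is assumed to set the PHPSESSID cookie here.
    session.get(SITE_URL, timeout=30)

    # Second request: the session resends the stored cookies automatically.
    response = session.get(
        PAGINATE_URL,
        params={'page': page, 'lid': 3, 'f': 'reset', 'invp': ''},
        # Assumption: these two headers are sufficient; add more if the server requires them.
        headers={'x-requested-with': 'XMLHttpRequest', 'referer': SITE_URL},
        timeout=30,
    )
    response.raise_for_status()
    return response.text
```

Calling `fetch_page(requests.Session())` should then return the same content the browser receives, provided the assumptions above hold for this site.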

If you have trouble capturing the headers, or the cookie expires after some time (that didn’t happen to me), you can use Selenium instead: open the main site with it first, and then go to the desired link.
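The Selenium route might look roughly like the sketch below: a real browser opens the site, its cookies are copied into a `requests` session, and that session fetches the data URL. It assumes Selenium 4 and Chrome are installed; `copy_cookies` and `fetch_with_browser_cookies` are hypothetical helper names, and the choice of the inventory page as the page to open first is taken from the `referer` header above.

```python
import requests

SITE_URL = 'https://www.agmgranite.com/inventory/hill-country-spicewood/?f=reset'
DATA_URL = 'https://www.agmgranite.com/paginate.php?page=1&lid=3&f=reset&invp='


def copy_cookies(browser_cookies, session):
    """Copy cookies (dicts as returned by driver.get_cookies()) into a requests session."""
    for cookie in browser_cookies:
        session.cookies.set(
            cookie['name'],
            cookie['value'],
            domain=cookie.get('domain', ''),
            path=cookie.get('path', '/'),
        )
    return session


def fetch_with_browser_cookies():
    """Open the site in Chrome, reuse its cookies, and fetch the data URL."""
    # Imported here so copy_cookies can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(SITE_URL)  # the site is assumed to set its session cookie here
        browser_cookies = driver.get_cookies()
    finally:
        driver.quit()

    session = copy_cookies(browser_cookies, requests.Session())
    response = session.get(
        DATA_URL,
        headers={'x-requested-with': 'XMLHttpRequest', 'referer': SITE_URL},
        timeout=30,
    )
    response.raise_for_status()
    return response.text
```

Calling `driver.get(DATA_URL)` after opening the site and reading `driver.page_source` would also work and is closer to what the answer describes; copying the cookies simply lets the remaining requests run without the browser.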

    That’s what I needed. I got the cookies via Selenium and it worked! Thank you :)
