Scrapy for login


I took this code from the internet and modified it slightly to log in to the CPFL site, but when I run `scrapy crawl myproject` nothing happens, and `scrapy runspider items.py` gives the error:

No element found in <200 https://servicosonline.cpfl.com.br/agencia-webapp/>

Can you tell me what's wrong?

import scrapy

BASE_URL = 'https://servicosonline.cpfl.com.br/agencia-webapp/#/login'
USER_NAME = 'username'
PASSWORD = 'password'

class ShareSpider(scrapy.Spider):
    name = "sharespider"
    start_urls = ['https://servicosonline.cpfl.com.br/agencia-webapp/#/login']

    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='//form[@id="panelMobile"]',
            formdata={
                'documentoEmail': USER_NAME,
                'Password': PASSWORD,
                'Action': '1',
            },
            callback=self.after_login)

    def after_login(self, response):
        pass

1 answer


The problem is that the username and password form is not in the page you are loading: that page contains only JavaScript code, and the form is mounted dynamically by that code.

Since Scrapy does not execute JavaScript, it cannot be used this way on this site. That leaves you with two alternatives:

  • Analyze the page's JavaScript code, figure out what it does, and "simulate" it with hand-written Python code. This solution is usually more efficient, but much more complex to implement.

    In the specific case of the CPFL site, it seems that when the login is submitted, the JavaScript makes an AJAX HTTP POST to https://servicosonline.cpfl.com.br/agencia-webapi/api/token with the following parameters:

    {
        'client_id': 'agencia-virtual-cpfl-web',
        'grant_type': 'password',
        'username': USER_NAME,
        'password': PASSWORD,
    }
    

    To discover this I used Firefox's inspector mode (press F12) and tried to log in; on the Network tab you can see every request the page makes.

    yield scrapy.FormRequest(
        url='https://servicosonline.cpfl.com.br/agencia-webapi/api/token',
        formdata={
            'client_id': 'agencia-virtual-cpfl-web',
            'grant_type': 'password',
            'username': USER_NAME,
            'password': PASSWORD,
        },
        callback=self.after_login,
    )
    

    The code above should log you in, but the response will not be a page; it will be something like an 'OK' with a token. You will have to keep inspecting the site with the browser to figure out what to do next to get what you want. Logging in is just the beginning of the problem.
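    Given the `grant_type: 'password'` parameter above, the token endpoint presumably returns OAuth2-style JSON. This is a minimal sketch, assuming a response shape like `{"access_token": "..."}` (confirm the real field names in the browser's Network tab); the helper builds the `Authorization` header you would attach to subsequent API requests:

    ```python
    import json

    def build_auth_headers(token_response_text):
        """Parse the (assumed) OAuth2-style JSON returned by the token
        endpoint and build the Authorization header for later requests.

        NOTE: the exact response shape is an assumption based on the
        'grant_type=password' flow seen in the POST parameters; inspect
        the real response in the browser to confirm the field names.
        """
        payload = json.loads(token_response_text)
        token = payload["access_token"]
        return {"Authorization": f"Bearer {token}"}

    # In the spider's after_login callback, something like:
    #
    #     def after_login(self, response):
    #         headers = build_auth_headers(response.text)
    #         yield scrapy.Request(some_api_url, headers=headers,
    #                              callback=self.parse_data)
    #
    # where some_api_url is whichever agencia-webapi endpoint the app
    # calls next (again, found via the Network tab).
    ```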

  • The other, much simpler alternative is to use Selenium, a library that lets you control a real browser (Chrome or Firefox) from Python, so JavaScript runs normally. It is much less efficient, though, because you are running an entire browser.
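    A sketch of the Selenium route, assuming the JS-mounted form uses the `documentoEmail` field name seen in the question (the password field name and submit-button selector are guesses you would verify with F12):

    ```python
    def login_with_selenium(username, password, headless=True):
        """Sketch: drive a real browser so the JS app can render the
        login form. The element selectors are assumptions -- inspect
        the real page (F12) to find the actual ids/names.
        """
        # imports kept inside the function so the module loads even
        # where selenium is not installed
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.support.ui import WebDriverWait
        from selenium.webdriver.support import expected_conditions as EC

        options = webdriver.FirefoxOptions()
        if headless:
            options.add_argument("-headless")
        driver = webdriver.Firefox(options=options)
        try:
            driver.get("https://servicosonline.cpfl.com.br/agencia-webapp/#/login")
            # wait until the dynamically mounted form actually exists
            wait = WebDriverWait(driver, 30)
            user_input = wait.until(
                EC.presence_of_element_located((By.NAME, "documentoEmail"))
            )
            user_input.send_keys(username)
            # field name and button selector below are hypothetical
            driver.find_element(By.NAME, "password").send_keys(password)
            driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
            return driver  # caller continues scraping the logged-in session
        except Exception:
            driver.quit()
            raise
    ```

    The explicit `WebDriverWait` matters here: because the form is mounted by JavaScript after page load, looking for the element immediately would fail just as Scrapy does.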

I hope this points you in the right direction.

  • Thanks for the help, very useful. What I want is to download the invoices and store them in a folder on my PC. I'm not sure I can log in with the method you described, but it shed a lot of light on the problem for me. I'll see if I can still do it with Scrapy; I probably can't, since I'm still learning it, so I'll look into Selenium. Thanks.
