How to simulate a form action=POST with urllib2

I want to write a program that reads my college grades and displays them on the screen. I am using urllib2 to fetch the web page where my grades are listed, but I need to log in before I can retrieve it.

This is the code in which I try to simulate the form submission and get back the page it redirects me to:

from urllib2 import *
import urllib
proxy = ProxyHandler({'http': r'http://xxxx:xxxx@xxxxx@xxxxx:xxxxxx'})
auth = HTTPBasicAuthHandler()
opener = build_opener(proxy, auth, HTTPHandler)
install_opener(opener)

data = urllib.urlencode({'__EVENTTARGET':'','__EVENTARGUMENT':'','__VIEWSTATE':'/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA=='
                  ,'__VIEWSTATEGENERATOR':'7C9DFC57'
                  ,'ctl00$FormularioContentPlaceHolder$UsuarioTextBox':"12345"
                  ,"ctl00$FormularioContentPlaceHolder$SenhaTextBox":"12345"
                  ,'submit':'Entrar'})

url = 'http://www4.uva.br/UniversusNet/Seguro/Login.aspx?ReturnUrl=%2fUniversusNet%2fNotasFaltasTotais.aspx'
response = urlopen(url, data).read()
print response

This is the form on the web page that I use to log in and that should redirect me to the grades:

<form name="aspnetForm" method="post" action="Login.aspx?ReturnUrl=%2fUniversusNet%2fNotasFaltasTotais.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="aspnetForm">
    <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="">
    <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="">
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA==">
    <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="7C9DFC57">
    <input name="ctl00$FormularioContentPlaceHolder$UsuarioTextBox" type="text" id="ctl00_FormularioContentPlaceHolder_UsuarioTextBox" class="caixaTexto">
    <input name="ctl00$FormularioContentPlaceHolder$SenhaTextBox" type="password" id="ctl00_FormularioContentPlaceHolder_SenhaTextBox" class="caixaTexto"></td>
    <input type="submit" name="ctl00$FormularioContentPlaceHolder$EntrarButton" value="Entrar" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$FormularioContentPlaceHolder$EntrarButton&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" id="ctl00_FormularioContentPlaceHolder_EntrarButton" class="botao">&nbsp; <a href="EsqueceuSenha.aspx" id="ctl00_FormularioContentPlaceHolder_LinkExibeEsqueceusenha" class="link">Esqueceu sua senha?</a></td>

However, the response just sends me back to the login page. I am trying to log in without success; could someone help me?

  • Do you know which form fields need to be filled in?

  • @qmechanik http://stackoverflow.com/questions/29632067/how-to-simulate-form-action-post-in-urllib2?noredirect=1#comment47407254_29632067

  • Yes, I do. I am also the one who asked the question linked above; this one is the more up-to-date version.

  • Maybe the proxy is what is getting in the way. Do you really need to use it?

  • I don't need it. I only used it for testing because I was at work, on a closed network whose proxy requires a username and password. I will test from home; I already have two versions ready, one with urllib2 and one with the requests module, and I will try both at home.

1 answer

2


You are probably unable to log in because the POST data identifies the button by its input type (submit) rather than by its name attribute.

data = urllib.urlencode({'__EVENTTARGET':'','__EVENTARGUMENT':'','__VIEWSTATE':'/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA=='
     ,'__VIEWSTATEGENERATOR':'7C9DFC57'
     ,'ctl00$FormularioContentPlaceHolder$UsuarioTextBox':"12345"
     ,"ctl00$FormularioContentPlaceHolder$SenhaTextBox":"12345"
     ,'submit':'Entrar'}) # <------ Wrong

The correct version is:

data = urllib.urlencode({'__EVENTTARGET':'','__EVENTARGUMENT':'','__VIEWSTATE':'/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA=='
     ,'__VIEWSTATEGENERATOR':'7C9DFC57'
     ,'ctl00$FormularioContentPlaceHolder$UsuarioTextBox':"12345"
     ,"ctl00$FormularioContentPlaceHolder$SenhaTextBox":"12345"
     ,'ctl00$FormularioContentPlaceHolder$EntrarButton':'Entrar'}) # <------ Right
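
Rather than copying field names and values by hand, one option is to read them from the login page itself, which also picks up the button's real name and whatever __VIEWSTATE the server currently issues. The following is a minimal sketch of that idea, assuming Python 3 (where the parser lives in html.parser) and a form with a single submit button; FormFieldCollector, collect_form_fields and the shortened sample_form are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser  # Python 3; the module is named HTMLParser in Python 2


class FormFieldCollector(HTMLParser):
    """Collect the name/value pair of every <input> that has a name attribute."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        name = attributes.get('name')
        if name:
            # Inputs without a value attribute (the text boxes) default to ''.
            self.fields[name] = attributes.get('value') or ''


def collect_form_fields(html):
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.fields


# Shortened, hypothetical copy of the login form from the question.
sample_form = '''
<form method="post" action="Login.aspx">
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="">
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="7C9DFC57">
  <input name="ctl00$FormularioContentPlaceHolder$UsuarioTextBox" type="text">
  <input type="submit" name="ctl00$FormularioContentPlaceHolder$EntrarButton" value="Entrar">
</form>
'''

fields = collect_form_fields(sample_form)
# Fill in the credentials before encoding the dictionary for the POST.
fields['ctl00$FormularioContentPlaceHolder$UsuarioTextBox'] = '12345'
```

The resulting dictionary can be passed to urlencode in place of the hand-written one. A browser only sends the submit button that was actually clicked, so a form with several buttons would need extra filtering.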

The code should look like this:

# -*- coding: utf-8 -*-

from urllib2 import *
import urllib, cookielib

def obterNotas(url, usuario, senha):
    proxy = ProxyHandler({'http': "xxxx.xxxx:zzzz"})
    auth = HTTPBasicAuthHandler()
    cookie = cookielib.CookieJar()
    opener = build_opener(proxy, auth, HTTPHandler, HTTPCookieProcessor(cookie))
    install_opener(opener)

    dados = urllib.urlencode({'__EVENTTARGET': '',
                         '__EVENTARGUMENT': '',
                         '__VIEWSTATE': '/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA==',
                         '__VIEWSTATEGENERATOR':'7C9DFC57',
                         'ctl00$FormularioContentPlaceHolder$UsuarioTextBox': usuario,
                         "ctl00$FormularioContentPlaceHolder$SenhaTextBox": senha,
                         'ctl00$FormularioContentPlaceHolder$EntrarButton':'Entrar'})
    request = Request(url, dados)
    paginaLogin = urlopen(request).read()
    paginaNotas = None

    # Check here whether the login succeeded
    if 'Something that indicates a successful login' in paginaLogin:
        paginaNotas = urlopen('http://www4.uva.br/UniversusNet/NotasFaltasTotais.aspx').read()
    return paginaNotas

And to use it:

def main():
    urlLogin = 'http://www4.uva.br/UniversusNet/Seguro/Login.aspx'
    notas = obterNotas(urlLogin, 'usuario', 'senha')
    # Here you process the 'notas' variable and extract the information you want

Note: I have not tested this; it will probably need some adjustments to work as expected.
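
The code above targets Python 2. As a rough sketch of the same request under Python 3, where urllib2 was merged into urllib.request and cookielib became http.cookiejar, the setup might look like the following. build_login_request is a hypothetical helper, only the button field is included to keep the example short, and the network call is deliberately left out.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(url, fields):
    # In Python 3 the POST body must be bytes, so the encoded form is
    # converted explicitly.
    body = urlencode(fields).encode('ascii')
    return Request(url, data=body)


# The cookie jar keeps the session cookie set by the login response, so
# later requests made through the same opener stay authenticated.
cookies = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookies))

request = build_login_request(
    'http://www4.uva.br/UniversusNet/Seguro/Login.aspx',
    {'ctl00$FormularioContentPlaceHolder$EntrarButton': 'Entrar'},
)
# opener.open(request) would send the POST; it is left out here so the
# sketch runs without network access.
```

Calling opener.open directly, instead of install_opener followed by urlopen, keeps the cookie handling local to this opener rather than changing global state.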


Another alternative is the requests library, which supports sessions and so makes it easier to log in and then retrieve data from a page that requires authentication.

# -*- coding: utf-8 -*-

import requests

def obterNotas(url, usuario, senha):
    dados = {'__EVENTTARGET': '',
             '__EVENTARGUMENT': '',
             '__VIEWSTATE': '/wEPDwULLTE4NzU1ODgxNTkPZBYCZg9kFgICAw9kFgICCQ9kFgICAQ9kFgICAQ9kFgICAQ8QZGQWAGQYAQU2Y3RsMDAkRm9ybXVsYXJpb0NvbnRlbnRQbGFjZUhvbGRlciRFc3RhZG9UZWxhTXVsdGlWaWV3Dw9kZmT14eU493cBliuPCSv6TJQbGDKjrA==',
             '__VIEWSTATEGENERATOR': '7C9DFC57',
             'ctl00$FormularioContentPlaceHolder$UsuarioTextBox': usuario,
             'ctl00$FormularioContentPlaceHolder$SenhaTextBox': senha,
             'ctl00$FormularioContentPlaceHolder$EntrarButton': 'Entrar'}
    urlNotas = 'http://www4.uva.br/UniversusNet/NotasFaltasTotais.aspx'

    with requests.Session() as sessao:
        paginaLogin = sessao.post(url, data=dados).text
        paginaNotas = None

        # Check here whether the login succeeded
        if 'Something that indicates a successful login' in paginaLogin:
            paginaNotas = sessao.get(urlNotas).text
        return paginaNotas

def main():
    url = 'http://www4.uva.br/UniversusNet/Seguro/Login.aspx'
    notas = obterNotas(url, 'usuario','senha')
    # Here you process the 'notas' variable and extract the information you want

main()

If you need to use a proxy, just do the following:

import requests

proxy = { "http": "xxxx.xxxx:zzzz", }

requests.get("http://foo.bar.baz", proxies=proxy)
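
If the proxy requires a username and password, as described in the comments on the question, the credentials usually go into the proxy URL itself. The proxy URL in the question appears to contain two @ signs; if one of them belongs to the username or password, percent-encoding the credentials removes the ambiguity. This is a minimal sketch assuming Python 3; build_proxy_url is a hypothetical helper and all four values passed to it are placeholders.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user, password):
    # safe='' makes quote() escape every reserved character,
    # including '@' and ':'.
    return 'http://{0}:{1}@{2}:{3}'.format(
        quote(user, safe=''), quote(password, safe=''), host, port)


# All four values below are placeholders.
proxy_url = build_proxy_url('proxy.example.com', 8080, 'user@domain', 'p@ss:word')
proxies = {'http': proxy_url}
```

The resulting dictionary has the same shape as the proxy dictionary above, so it should be usable as the proxies argument of requests or as the argument to ProxyHandler.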

  • I managed to get the request working earlier; I had to edit the request headers.

  • @user3896400 Did it work? I checked what is sent in the POST: the values of __VIEWSTATE and __VIEWSTATEGENERATOR are the same as the ones in the question.

  • @user3896400 If you need to extract a lot of information from the HTML, a parser may be useful for that job; Beautiful Soup, for example, serves that purpose well. If the answer was useful, please mark it as accepted if possible, and feel free to edit and improve it.

  • It worked. Now I am trying to search for a substring inside the returned HTML. I need to match a string of the form ">xxx</a>" against ">__</a>" so that it returns true regardless of the value of x.

  • @user3896400 Thanks! I'm not sure what you want to do. Do you want to transform ">xxx</a>" into ">__</a>" and then send it in the POST?
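
One possible reading of the substring question above is a test for any link text between ">" and "</a>", whatever that text is. Under that assumption, a regular expression is one way to sketch it; ANCHOR_TEXT, anchor_texts and the sample markup are hypothetical and do not come from the real grades page.

```python
import re

# Matches '>' followed by any text that contains no '<', then '</a>'.
ANCHOR_TEXT = re.compile(r'>([^<]*)</a>')


def anchor_texts(html):
    """Return the text found inside every <a>...</a> pair in html."""
    return ANCHOR_TEXT.findall(html)


# Hypothetical fragment standing in for the returned page.
sample = '<td><a href="x.aspx">8.5</a></td><td><a href="y.aspx">abc</a></td>'
texts = anchor_texts(sample)
```

For anything beyond a simple check like this, an HTML parser such as the Beautiful Soup library mentioned above is likely to be more robust than a regular expression.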
