Web scraping on page with login and password

I am trying to extract the source code from an HTML page with the following structure:

 <div class="both"></div>
    <div class="st-box" id="source-code">
        <h3>SOURCE CODE</h3>

        <div class="wrap code-answer-1">

            <pre id="code" class="code-2">#include &lt;bits/stdc++.h&gt;

        using namespace std;

        int main ()
        {
            double A, B, C;

            scanf (&quot;%lf %lf %lf&quot;, &amp;A, &amp;B, &amp;C);
            printf (&quot;TRIANGULO: %.3lf\n&quot;, (A * C) / 2.0);
            printf (&quot;CIRCULO: %.3lf\n&quot;, C * C * 3.14159);
            printf (&quot;TRAPEZIO: %.3lf\n&quot;, ((A + B) * C) / 2.0);
            printf (&quot;QUADRADO: %.3lf\n&quot;, B * B);
            printf (&quot;RETANGULO: %.3lf\n&quot;, A * B);
            system (&quot;pause&quot;);
            return 0;
        }</pre>

                </div>
            </div>

                </ul>
        </div>

Using the following code:

 def getCode(self, id):
    return self.getPage('https://www.urionlinejudge.com.br/judge/'+self.lang+'/runs/code/'+id).find("pre ", {"id": "code"}).text

However, I get the following error:

AttributeError: 'NoneType' object has no attribute 'text'

How can I solve this?

  • You forgot to include the complete error message. If the error really is on this line, it means the element <pre id="code"> was not found in the HTML.

  • Did you manage to solve your problem? Do you still have any questions?

1 answer

First, you inserted a space after pre:

[...].find("pre ", {"id": "code"}).text

Removing this space, as below, should fix that error:

def getCode(self, id):
    return self.getPage('https://www.urionlinejudge.com.br/judge/'+self.lang+'/runs/code/'+id).find("pre", {"id": "code"}).text
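
To check the effect of the space in isolation, here is a minimal sketch. It assumes a trimmed copy of the HTML from the question held in a string (the variable names are only illustrative) and compares the two lookups:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (illustrative only).
html = '<div class="wrap code-answer-1"><pre id="code" class="code-2">int main () {}</pre></div>'
soup = BeautifulSoup(html, "html.parser")

with_space = soup.find("pre ", {"id": "code"})    # tag name has a trailing space
without_space = soup.find("pre", {"id": "code"})  # exact tag name

print(with_space)          # None, so calling .text on it raises AttributeError
print(without_space.text)  # the text inside <pre id="code">
```

The tag name is compared as an exact string, so "pre " matches nothing and find returns None.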

However, this site requires a login, and you can only access the ids that belong to your own user. Below is some code that you can adapt into your function to get what you want:

import mechanize
from bs4 import BeautifulSoup as bs
import http.cookiejar as cookielib

cookies = cookielib.CookieJar()  # create a cookie jar
browser = mechanize.Browser()    # start a browser
browser.set_cookiejar(cookies)   # point it at your cookie jar

# replace 'seu_id' with a valid id that you have access to
browser.open('https://www.urionlinejudge.com.br/judge/pt/runs/code/seu_id')

browser.select_form(nr=0)      # the password form is the first one
browser.form['email'] = 'seu_email'     # replace 'seu_email' with your e-mail
browser.form['password'] = 'senha'  # replace 'senha' with your password
browser.submit()               # submit the data

pagina = browser.response().read()  # this is the page you wanted

# Beautiful Soup here
soup = bs(pagina, 'html.parser')
codigo = soup.find("pre", {"id": "code"}).text

print(codigo) # the data you were looking for

  • I tried removing the space and the error was the same

  • I edited the answer because I saw that the page you are trying to access requires a login and password. Try it now and, if it fails, please be clearer by showing your implementation and the error itself, so that we can reproduce what you built.
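
If the error persists, it can help to check whether the response is the code page at all. The following is only a sketch: it uses the standard library's html.parser instead of Beautiful Soup, and the names PreCodeExtractor and extract_code and both sample pages are hypothetical (the login form markup is invented, not taken from the real site). It returns None when <pre id="code"> is missing, which is what you would expect if the site sent back a login page:

```python
from html.parser import HTMLParser


class PreCodeExtractor(HTMLParser):
    """Collects the text inside <pre id="code">, if the page has one."""

    def __init__(self):
        super().__init__()  # the default convert_charrefs=True decodes &lt; etc.
        self.inside = False
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and dict(attrs).get("id") == "code":
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def extract_code(html):
    """Return the text of <pre id="code">, or None if the element is absent."""
    parser = PreCodeExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.found else None


# Hypothetical sample pages, for illustration only.
code_page = '<div><pre id="code" class="code-2">#include &lt;bits/stdc++.h&gt;\nint main () {}</pre></div>'
login_page = '<form><input name="email"><input name="password"></form>'

print(extract_code(code_page))   # the decoded source code
print(extract_code(login_page))  # None: no <pre id="code"> on this page
```

A None result here points at the page content (for example, a login form) rather than at the parsing call.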
