Web scraping on page with login and password

Question

Asked 6 years, 4 months ago

Viewed 1,875 times

I am trying to extract the source code from an HTML page with the following structure:

 <div class="both"></div>
    <div class="st-box" id="source-code">
        <h3>SOURCE CODE</h3>

        <div class="wrap code-answer-1">

            <pre id="code" class="code-2">#include &lt;bits/stdc++.h&gt;

        using namespace std;

        int main ()
        {
            double A, B, C;

            scanf (&quot;%lf %lf %lf&quot;, &amp;A, &amp;B, &amp;C);
            printf (&quot;TRIANGULO: %.3lf\n&quot;, (A * C) / 2.0);
            printf (&quot;CIRCULO: %.3lf\n&quot;, C * C * 3.14159);
            printf (&quot;TRAPEZIO: %.3lf\n&quot;, ((A + B) * C) / 2.0);
            printf (&quot;QUADRADO: %.3lf\n&quot;, B * B);
            printf (&quot;RETANGULO: %.3lf\n&quot;, A * B);
            system (&quot;pause&quot;);
            return 0;
        }</pre>

                </div>
            </div>

                </ul>
        </div>

Using the following code:

 def getCode(self, id):
    return self.getPage('https://www.urionlinejudge.com.br/judge/'+self.lang+'/runs/code/'+id).find("pre ", {"id": "code"}).text

However, I get the following error:

AttributeError: 'NoneType' object has no attribute 'text'

How can I solve this?

You forgot to include the complete error message. If the error really is on this line, it means that the element <pre id="code"> was not found in the HTML.

– nosklo

2019/02/20 at 19:13

Did you manage to solve your problem? Do you still have any questions?

– Rafael Barros

2019/02/22 at 01:55

1 answer

by Rafael Barros • **840** points · Answer 1 · 2019-02-20T18:59:42+00:00

First, you inserted a space after pre:

[...].find("pre ", {"id": "code"}).text

Removing that space, as shown below, should fix the error:

def getCode(self, id):
    return self.getPage('https://www.urionlinejudge.com.br/judge/'+self.lang+'/runs/code/'+id).find("pre", {"id": "code"}).text
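
To see why the trailing space matters, here is a minimal sketch that runs against an inline HTML fragment modeled on the one in the question. The fragment and the helper name `extract_code` are illustrative assumptions, not part of the site or of your class. `find` compares the tag name as an exact string, so "pre " matches nothing and returns None, and accessing `.text` on None is what raises the AttributeError.

```python
from bs4 import BeautifulSoup

# Illustrative fragment modeled on the HTML in the question (assumption).
HTML = """
<div class="st-box" id="source-code">
    <pre id="code" class="code-2">#include &lt;bits/stdc++.h&gt;
int main () { return 0; }</pre>
</div>
"""


def extract_code(html):
    """Return the text of <pre id="code">, or None if the element is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("pre", {"id": "code"})
    # find() returns None when nothing matches, so check before using .text
    return tag.text if tag is not None else None


soup = BeautifulSoup(HTML, "html.parser")
with_space = soup.find("pre ", {"id": "code"})    # tag name "pre " never matches
without_space = soup.find("pre", {"id": "code"})  # matches the <pre> element

print(with_space)          # None
print(extract_code(HTML))  # the C++ source, with HTML entities decoded
```

Returning None from the helper (instead of letting `.text` fail) makes it easier to tell "element not found" apart from other problems, such as receiving a login page instead of the run page.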

However, this site requires a login, and you can only view the code of runs that belong to your own user. Below is some code that you can adapt into your function to log in and get the page you want:

import mechanize
from bs4 import BeautifulSoup as bs
import http.cookiejar as cookielib

cookies = cookielib.CookieJar()  # create a cookie jar
browser = mechanize.Browser()    # start a browser
browser.set_cookiejar(cookies)   # point the browser at your cookie jar

# replace 'seu_id' with a valid run id that you have access to
browser.open('https://www.urionlinejudge.com.br/judge/pt/runs/code/seu_id')

browser.select_form(nr=0)      # the login form is the first form on the page
browser.form['email'] = 'seu_email'     # replace 'seu_email' with your e-mail
browser.form['password'] = 'senha'  # replace 'senha' with your password
browser.submit()               # submit the credentials

pagina = browser.response().read()  # this is the page you wanted

# Beautiful Soup here
soup = bs(pagina,'html.parser')
codigo = soup.find("pre",{"id":"code"}).text

print(codigo) # the data you were looking for
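
The steps above can also be wrapped in a function that checks whether the login actually worked. This is only a sketch under assumptions: `fetch_code` and `LoginError` are hypothetical names, the form index and the `email`/`password` field names are taken from the snippet above, and the browser is passed in as a parameter so that any object exposing the same `open`, `select_form`, `form`, `submit` and `response` members as `mechanize.Browser` can be used.

```python
from bs4 import BeautifulSoup


class LoginError(RuntimeError):
    """Hypothetical error: the page returned after login has no <pre id="code">."""


def fetch_code(browser, url, email, password):
    """Log in through the first form found at `url` and return the code text.

    `browser` is assumed to behave like mechanize.Browser.
    """
    browser.open(url)
    browser.select_form(nr=0)            # the login form is assumed to be first
    browser.form['email'] = email
    browser.form['password'] = password
    browser.submit()

    pagina = browser.response().read()
    soup = BeautifulSoup(pagina, 'html.parser')
    tag = soup.find("pre", {"id": "code"})
    if tag is None:
        # Likely wrong credentials, or a run id that belongs to another user
        raise LoginError("no <pre id='code'> in the response from " + url)
    return tag.text


# Usage with mechanize (not executed here; needs network access and valid credentials):
#
#   import mechanize
#   import http.cookiejar as cookielib
#
#   browser = mechanize.Browser()
#   browser.set_cookiejar(cookielib.CookieJar())
#   url = 'https://www.urionlinejudge.com.br/judge/pt/runs/code/seu_id'
#   print(fetch_code(browser, url, 'seu_email', 'senha'))
```

Raising a dedicated error when the element is missing avoids the same AttributeError from the question and makes a failed login easier to spot.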