Sites with authentication - Web Scraping - Python

Question

Asked 5 years, 3 months ago

Viewed 1,363 times

I'm trying to automate a web data acquisition process using Python. In my case, I need to pull information from the page https://sistema.justwebtelecom.com.br/adm.php. However, before reaching this page, you have to log in at https://sistema.justwebtelecom.com.br/login.php. In theory, the code below should log in to the site:

from selenium import webdriver
from bs4 import BeautifulSoup

import time
import requests

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)
username = browser.find_element_by_id("email")
password = browser.find_element_by_id("senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
login_attempt = browser.find_element_by_id('entrar').click()
time.sleep(5)

url = 'https://sistema.justwebtelecom.com.br/adm.php'

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('html')

print(lista)

However, when I print the variable lista, I get the source code of the page https://sistema.justwebtelecom.com.br/login.php, that is, the page before logging in, even though I request the page after logging in and I do have access to the .../adm.php panel.

I'd like to know whether there is any way to get this information, because in the web browser I can see data that is retrieved from a file using the POST method, but I can't print that information from my script.

1 answer

by Jefferson Matheus Duarte • **168** points · Answer 1 · 2020-04-14T04:51:06+00:00

Hello, and first of all, welcome.

I noticed some errors in your code, and a few other things I would do differently.

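The main error in your code is that you log in through Selenium in an automated way and, right after that, make an isolated request with requests to a page that requires a session. requests does not reuse the session you opened with Selenium: the login cookies live only in the browser that Selenium controls, so the server treats the requests.get call as an unauthenticated visitor and returns the login page.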

Solution:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

import time

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)
# find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed
username = browser.find_element(By.ID, "email")
password = browser.find_element(By.ID, "senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
browser.find_element(By.ID, "entrar").click()
time.sleep(5)

# Read the HTML from the logged-in browser instead of a separate requests call
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
lista = soup.find_all('html')

print(lista)

With this change you stop fetching the source code through a separate request and take it from Selenium itself. Since I don't have the credentials to test this solution, I ask that you test it.
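If you would still rather make the follow-up call with requests (for example, to repeat the POST request you see in the browser), one option is to copy the cookies from the Selenium session into that request. This is only a minimal sketch, assuming the site keeps the login state in cookies, which I could not verify. The helper name selenium_cookies_to_dict and the sample cookie are made up for illustration.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Convert the list of cookie dicts returned by browser.get_cookies()
    into a plain {name: value} dict, the shape accepted by the
    `cookies=` argument of requests.get / requests.post."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Made-up example of what browser.get_cookies() returns: a list of dicts.
sample = [
    {"name": "PHPSESSID", "value": "abc123", "domain": "example.com", "path": "/"},
]
cookies = selenium_cookies_to_dict(sample)

# With a real, logged-in browser the idea would be (not run here):
#   cookies = selenium_cookies_to_dict(browser.get_cookies())
#   r = requests.get("https://sistema.justwebtelecom.com.br/adm.php", cookies=cookies)
```

Whether this is enough depends on the site. Some servers also check headers such as the User-Agent, in which case reading browser.page_source as shown above remains the simpler route.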

Finally, I don't know whether this code is just a test, but if it isn't, I would advise rewriting it using functions and separating the responsibilities. If you later want to grow it into something larger that has to be maintained, code written the way it is now can become difficult to work with.
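As a rough sketch of that kind of split, the script could be organised into small functions, each with one job. The function names are my own, the element ids come from the code in the question, and FakeBrowser exists only so the sketch can be exercised without a real browser or credentials. The waits from the original script are left out for brevity.

```python
BY_ID = "id"  # same string as selenium.webdriver.common.by.By.ID

LOGIN_URL = "https://sistema.justwebtelecom.com.br/login.php"
ADMIN_URL = "https://sistema.justwebtelecom.com.br/adm.php"


def login(browser, user, password):
    """Open the login page, fill in the form and submit it."""
    browser.get(LOGIN_URL)
    browser.find_element(BY_ID, "email").send_keys(user)
    browser.find_element(BY_ID, "senha").send_keys(password)
    browser.find_element(BY_ID, "entrar").click()


def fetch_admin_html(browser):
    """Open the admin page in the same browser session and return its HTML."""
    browser.get(ADMIN_URL)
    return browser.page_source


class FakeElement:
    """Stand-in for a Selenium element that only records what was done to it."""

    def __init__(self, log, element_id):
        self.log = log
        self.element_id = element_id

    def send_keys(self, text):
        self.log.append(("send_keys", self.element_id, text))

    def click(self):
        self.log.append(("click", self.element_id))


class FakeBrowser:
    """Stand-in for webdriver.Firefox() so the functions can run offline."""

    def __init__(self):
        self.log = []
        self.page_source = "<html>admin</html>"

    def get(self, url):
        self.log.append(("get", url))

    def find_element(self, by, value):
        return FakeElement(self.log, value)


browser = FakeBrowser()
login(browser, "MEU-USUARIO", "MINHA-SENHA")
html = fetch_admin_html(browser)
```

In real use, browser would be webdriver.Firefox() and the returned HTML would go to BeautifulSoup as before. Keeping login and page retrieval separate means each part can be changed or tested on its own.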