Sites with authentication - Web Scraping - Python

Viewed 1,363 times

1

I'm trying to automate a web data acquisition process using Python. In my case, I need to pull information from the page https://sistema.justwebtelecom.com.br/adm.php. However, before reaching this page, you need to log in at https://sistema.justwebtelecom.com.br/login.php. The code below should, in theory, log in to the site:

from selenium import webdriver
from bs4 import BeautifulSoup

import time
import requests

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)
username = browser.find_element_by_id("email")
password = browser.find_element_by_id("senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
login_attempt = browser.find_element_by_id('entrar').click()
time.sleep(5)

url = 'https://sistema.justwebtelecom.com.br/adm.php'

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
lista = soup.find_all('html')

print(lista)

However, when I print the lista variable, I get the source code of the page https://sistema.justwebtelecom.com.br/login.php, that is, the page before logging in. What I want is the page source after logging in, once I have access to the .../adm.php panel.

I wonder if there is any way to get this information, because when I open the page in the web browser, I can see some file information being retrieved with the POST method. But I can't print that information.

1 answer

2

Hello, and first of all, welcome.

I noticed some errors in your code and other things I would do differently.

The main error in your code is that you log in through Selenium in an automated way and then, right after, make an isolated request with requests to a page that requires a session. requests will not reuse the session you opened with Selenium.

Solution:

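from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

import time
import requests

browser = webdriver.Firefox()
browser.get("https://sistema.justwebtelecom.com.br/login.php")
time.sleep(3)  # wait for the login page to load

# find_element(By.ID, ...) replaces find_element_by_id,
# which was removed in recent Selenium 4 releases
username = browser.find_element(By.ID, "email")
password = browser.find_element(By.ID, "senha")

username.send_keys("MEU-USUARIO")
password.send_keys("MINHA-SENHA")

time.sleep(2)
browser.find_element(By.ID, "entrar").click()
time.sleep(5)  # wait for the login to finish

# Read the HTML from the logged-in browser instead of making a separate request
html = browser.page_source
soup = BeautifulSoup(html, 'lxml')
lista = soup.find_all('html')

print(lista)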

With this change, you stop getting the source code from a separate external request and get it from Selenium itself. Since I don't have the credentials to test this solution, I ask that you test it.
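If requests is still wanted afterwards, for example to repeat the POST calls mentioned in the question, the cookies held by the Selenium session would have to be sent along with each request. Below is a minimal, untested sketch of that idea, assuming the site tracks the login through cookies only. The helper name cookies_to_dict and the cookie names and values in the example are hypothetical.

```python
def cookies_to_dict(selenium_cookies):
    """Convert the list returned by browser.get_cookies() into a {name: value} dict."""
    # Selenium describes each cookie as a dict that includes "name" and "value" keys.
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Example with made-up cookies (hypothetical names and values, not taken from the site):
example = [
    {"name": "PHPSESSID", "value": "abc123", "domain": "example.invalid"},
    {"name": "lang", "value": "pt-BR", "domain": "example.invalid"},
]
print(cookies_to_dict(example))
```

After the login step, the result could be passed to requests with requests.get(url, cookies=cookies_to_dict(browser.get_cookies())). Whether the site accepts such a request may also depend on other details, such as the User-Agent header, so this may not work everywhere.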

And finally, I don't know if this code is just a test, but if it isn't, I'd advise rewriting it using functions and separating the responsibilities, because if you later want to build more extensive code that has to be maintained, programming it this way can make that difficult.
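As a rough illustration of that advice, the script could be split into small functions, each with one job. This is only a sketch under assumptions: the function names and parameters are hypothetical, the element ids are the ones used in the code above, browser is assumed to be a Selenium driver such as webdriver.Firefox(), and navigating explicitly to adm.php is an assumption (the code above relies on whatever page the site shows after login).

```python
import time

LOGIN_URL = "https://sistema.justwebtelecom.com.br/login.php"
ADMIN_URL = "https://sistema.justwebtelecom.com.br/adm.php"


def login(browser, username, password, wait_seconds=3):
    """Open the login page, fill in the form and submit it."""
    browser.get(LOGIN_URL)
    time.sleep(wait_seconds)  # crude wait for the form to load, as in the original
    # "id" is the locator strategy string that Selenium's By.ID stands for.
    browser.find_element("id", "email").send_keys(username)
    browser.find_element("id", "senha").send_keys(password)
    browser.find_element("id", "entrar").click()
    time.sleep(wait_seconds)  # crude wait for the login to complete


def fetch_admin_html(browser, wait_seconds=3):
    """Return the HTML of the admin page using the already logged-in browser."""
    browser.get(ADMIN_URL)  # assumption: the panel can be opened directly after login
    time.sleep(wait_seconds)
    return browser.page_source
```

With Selenium installed, this could be used as browser = webdriver.Firefox(), then login(browser, "MEU-USUARIO", "MINHA-SENHA"), then html = fetch_admin_html(browser). The BeautifulSoup parsing would then live in a third function, so each step can be changed or tested on its own.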

  • Man, thank you very much. It worked, I managed to get the page data with your code above. You saved my night, I can even sleep now, haha. OK, I'll take your advice. I wrote this code only to try to resolve this question, but I will change it for sure.

  • If you have any other questions, we're here to help. One more thing: the requests import is now useless and can be removed from the code too. I forgot to remove it.

  • OK. Thank you very much. Cheers.
