Code improvement

Hello. I have this code:

import time

from selenium import webdriver


def visita(link):
    driver = None
    try:
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        driver = webdriver.Chrome(options=options)
        driver.get(link)
        # Scroll to the bottom of the page
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(5)
    except Exception as e:
        print(e)
    finally:
        # Close the browser only if it actually started
        if driver is not None:
            driver.close()


links = ['https://link.com/post/6074/', 'https://link.com/post/6288/']
while True:
    for link in links:
        visita(link)

Code summary:
1- Open each link using chromedriver
2- Run JavaScript on the page to scroll to the bottom, wait 5 seconds, and close with driver.close()

Problem: Code execution is quite slow.

Is there any other module/library that can do the same thing more efficiently?

  • Try to improve the question title, something like: how can I make my interface tests run faster?

2 answers

0

Selenium is slow by nature because it has to start its own application (in some cases with a database), open Chrome, and load the page. You also added a 5-second wait, which adds 5 seconds to every visit.

To optimize, you can use PhantomJS instead of Selenium (PhantomJS emulates a browser), and reconsider whether you need to wait 5 seconds before closing the driver; less than a second is probably enough.

0


If the sites you are scraping do not require user authentication, that is, you do not need to be logged in to access these links, you can simply use Beautiful Soup, which saves a lot of running time compared with Selenium.
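As a rough sketch of the Beautiful Soup route, assuming the `bs4` package is installed and that extracting the page's links is the goal (the function names `extract_links` and `fetch_links` are made up for illustration). Note that this only downloads and parses the HTML; it does not run JavaScript, so it cannot scroll the page.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    soup = BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True)]


def fetch_links(url, timeout=10):
    """Download a page without a browser and return the links found in it."""
    with urlopen(url, timeout=timeout) as response:
        return extract_links(response.read())
```

Something like `fetch_links('https://link.com/post/6074/')` would then replace the whole browser start-up for pages that do not need JavaScript.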

As Roberto mentioned, with Selenium the real browser that has to open on the machine already takes a lot of time. What you can do, as he said, is emulate a browser without opening a webdriver. You can do this with Mechanize for Python 2 or MechanicalSoup for Python 3.

Another tip: since there are multiple links, you can use threads and scrape several at once, speeding up the whole process. Of course, the page loading time stays the same, or may even get worse depending on your internet connection.
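A minimal sketch of the thread idea, using the standard library's `ThreadPoolExecutor` around the `visita` function from the question. The helper `visit_all` and the parameters `workers` and `wait_seconds` are names made up for this example, and 3 workers is just a starting point, not a tuned value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def visita(link, wait_seconds=5):
    """Open one link in headless Chrome, scroll to the bottom, wait, then quit."""
    # Imported here so visit_all() below can be tried without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(link)
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(wait_seconds)
    finally:
        driver.quit()  # always release the browser, even if the visit failed


def visit_all(links, visit=visita, workers=3):
    """Call visit(link) for every link on a pool of threads; return the links that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(visit, link): link for link in links}
        for future, link in futures.items():
            try:
                future.result()  # re-raises any exception from the worker thread
            except Exception as exc:
                print(f'{link}: {exc}')
                failed.append(link)
    return failed
```

Calling `visit_all(links)` inside the original `while True:` loop would visit up to 3 links at the same time. Each worker starts its own Chrome, so memory use grows with `workers`.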

  • What about scrolling to the bottom of the page? Would that be possible with any of the ones you mentioned?

  • Does the page have so-called infinite scroll (when you reach the end of the page, it loads new content)? Or are all page items loaded at once? If it is infinite scroll, inspect the page and go to the Network tab. Usually infinite scroll is a GET request that returns JSON, like this one from Tecmundo ( https://api.tecmundo.com.br/api/v2/news/latest-news/?&Older=True&page=3 ). If you change the value of page, it shows new content.

  • Here is the situation: I'm making a bot that visits these links. The links are different for each user, i.e., if I visit one of my links and scroll to the end of the page, that counts as a visit; after the scroll, the page loads content from other users. So I have to visit my links, scroll to the bottom of the page, wait 5 seconds, and then close the page to repeat the process.

  • Then I recommend you stick with Selenium. As I understand it, you have a list of links and just want to open each one, scroll to the end of the page, and close it after 5 seconds, all automatically. What you can do is split the task across 2-4 threads to speed up the process.

  • If possible, could you post an example of how to run my code in 3 threads? Thank you.
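For the infinite-scroll case discussed in the comments above, here is a small sketch of requesting the JSON pages directly instead of scrolling, using only the standard library. The base URL and the `Older` and `page` parameters come from the Tecmundo example; the function names are made up, and nothing is assumed about the shape of the JSON that comes back.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = 'https://api.tecmundo.com.br/api/v2/news/latest-news/'


def page_url(page):
    """Build the URL for one page of results."""
    return BASE_URL + '?' + urlencode({'Older': 'True', 'page': page})


def fetch_page(page, timeout=10):
    """Request one page and return the decoded JSON."""
    with urlopen(page_url(page), timeout=timeout) as response:
        return json.load(response)
```

Increasing `page` in a loop would walk through the same content the browser loads while scrolling. This only helps when the goal is to read the content; it would not register a visit the way a real page load does.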
