How do I scrape every page of a website with Selenium?

I’d like to scrape every page of the Mercado Livre offers listing with Selenium, but so far I’ve only managed to scrape the first page. I use Pandas to store the data in a dataframe. How do I do it for all pages (or at least a few more, if all of them would be too heavy)? So far, I’ve done this:

from selenium import webdriver 
import pandas as pd 

driver = webdriver.Chrome(executable_path=r"C:/Users/Usuario/.spyder-py3/chromedriver.exe")

driver.get("https://www.mercadolivre.com.br/ofertas")
driver.implicitly_wait(3)

tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
precoProduto = driver.find_elements_by_class_name('promotion-item__price')
df = pd.DataFrame()

produtos = []

for x in tituloProduto:
    produtos.append(x.text)
    
preco = []

for x in precoProduto:
    preco.append(x.text)
    
df['produto'] = produtos
df['preco'] = preco

df.head()
produto                                            preco
Furadeira Parafusadeira Com Impacto 20v 2 Bate...  R$ 34232
Sony Playstation 4 Slim 1tb Mega Pack: Ghost O...  R$ 2.549
Tablet Galaxy A7 Lite T225 4g Ram 64gb Grafite...  R$ 1.199
Smart Tv Philco Ptv55q20snbl Dled 4k 55 110v/220v  R$ 2.799
Nintendo Switch 32gb Standard Cor Vermelho-néo...  R$ 2.349

  • Please edit the question to limit it to a specific problem with sufficient detail to identify an appropriate answer.

  • Two things upset me: seeing people scrape data with Selenium, and seeing them scrape websites whose administrator provides an API for querying the data.

  • @Augustovasques Please show more respect for the participants. I’ve never been treated like this in international forums or in the countries I’ve worked in. This is work for financial mathematics and programming practice at the public school where I work. If you can’t help, please don’t get in the way. Many students will certainly be discouraged by reading comments like yours.

  • @Augustovasques And it’s not your site. It’s mutual help for the Python community and others. I already hold a doctorate and teach free classes in a hard-to-reach community, helping people with the Enem, Portuguese, and English, and helping exact-sciences teachers with programming, and I am appalled to learn that there are people like you, trying to discourage others, on Stack Overflow.

  • Too bad you were never treated like this, because if you had been, you would know that Selenium is the worst option for data scraping. As for the API link, if I had been given it I would have seen it as a gift, but if you’re offended, I won’t try to change your mind. As for not telling the truth because you didn’t understand the message, I’m sorry, I won’t stop. I know how to greatly optimize what you’re trying to do and I was willing to teach you, because it really saddens me to see people do something simple in the most complicated way.

  • And it’s not just you who does it; it’s a widespread practice that is taken for granted on the Internet. It costs me nothing to chat a little, gather information about your goals, offer guidance, and leave a link to a sandbox with very efficient code.

  • I don’t teach, so I may be biased or mistaken, but generally speaking: from what I see here on the site, many people are not really learning to program, because many courses focus more on fads than on fundamentals. For example, they still need to teach how to choose the most appropriate tool for each task, and instead prefer to teach something that "works", even if it is not the most appropriate solution (I say this in general, without referring to your specific case, since I do not know your context).

  • Speaking of the specific case: if the goal is to collect data, then using the API (as already indicated above) seems the most appropriate to me. If the goal is to teach how to manipulate HTML (or if you can’t use the API for some reason), I would go first to Beautiful Soup, a much more suitable library for this case (for a number of technical reasons that do not fit in this space). I would only use Selenium as a last resort.

  • Finally, some relevant links that may help: that and that. And while I’m at it, telling you that you’re not doing it the right way is a way of helping, even if you don’t think so :-)

  • @hkotsubo, a late addendum. Selenium should not be used even in that last case, for two reasons: 1. According to the project itself, Selenium is a project that covers a variety of tools and libraries that enable and support the automation of web browsers. It provides extensions to emulate user interaction with browsers. 2. Python has the scrapy module, which specializes in crawling websites and extracting structured data from their pages, and its performance gain over Selenium is huge.
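
Whichever tool ends up fetching the pages, the part the question is missing is the loop over pages. A minimal sketch of that loop follows, under stated assumptions: the `?page=N` URL scheme is a guess that has not been checked against the site, and `page_url`, `scrape_all` and `scrape_page` are hypothetical names introduced only for illustration. The per-page extraction is passed in as a function, so the same loop would work with Selenium, Beautiful Soup or an API client.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: the listing is paginated with a "page" query parameter.
# This is not confirmed anywhere in the thread; inspect the site's real
# "next page" link before relying on it.
BASE_URL = "https://www.mercadolivre.com.br/ofertas"


def page_url(page: int) -> str:
    """Build the URL of one listing page (hypothetical URL scheme)."""
    return f"{BASE_URL}?page={page}"


def scrape_all(
    scrape_page: Callable[[str], List[Tuple[str, str]]],
    max_pages: int = 5,
) -> Dict[str, List[str]]:
    """Call scrape_page on pages 1..max_pages and merge the results.

    scrape_page receives a URL and returns (title, price) pairs.
    The loop stops early at the first page that yields no items.
    """
    produtos: List[str] = []
    precos: List[str] = []
    for page in range(1, max_pages + 1):
        items = scrape_page(page_url(page))
        if not items:
            break
        for titulo, preco in items:
            produtos.append(titulo)
            precos.append(preco)
    return {"produto": produtos, "preco": precos}


if __name__ == "__main__":
    # Stand-in for the real fetch step so the sketch runs without a browser.
    fake_site = {
        page_url(1): [("Item A", "R$ 10"), ("Item B", "R$ 20")],
        page_url(2): [("Item C", "R$ 30")],
    }
    data = scrape_all(lambda url: fake_site.get(url, []))
    print(data)
```

With the code from the question, `scrape_page` would call `driver.get(url)`, repeat the two element lookups, and return `list(zip(titles, prices))` built from each element's `.text`. The returned dict can be passed to `pd.DataFrame(...)`, and `max_pages` keeps the run bounded if scraping everything is too heavy.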

No answers
