I'd like to scrape every Mercado Livre offers page with Selenium, but so far I've only managed to scrape the first one. I use Pandas to store the data in a dataframe. How can I do this for all pages (or at least several of them, if scraping all of them gets too heavy)? This is what I have so far:
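from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

# Selenium 4: the driver path goes in a Service object and elements are found
# with find_elements(By..., ...); executable_path= and
# find_elements_by_class_name() no longer exist in recent releases
driver = webdriver.Chrome(service=Service(r"C:/Users/Usuario/.spyder-py3/chromedriver.exe"))
driver.get("https://www.mercadolivre.com.br/ofertas")
driver.implicitly_wait(3)
tituloProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__title')
precoProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__price')
df = pd.DataFrame()
produtos = []
for x in tituloProduto:
    produtos.append(x.text)
preco = []
for x in precoProduto:  # the loop previously used `price`, an undefined name
    preco.append(x.text)
df['produto'] = produtos
df['preco'] = preco
df.head()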
Output of df.head():
produto preco
Furadeira Parafusadeira Com Impacto 20v 2 Bate... R$ 34232
Sony Playstation 4 Slim 1tb Mega Pack: Ghost O... R$ 2.549
Tablet Galaxy A7 Lite T225 4g Ram 64gb Grafite... R$ 1.199
Smart Tv Philco Ptv55q20snbl Dled 4k 55 110v/220v R$ 2.799
Nintendo Switch 32gb Standard Cor Vermelho-néo... R$ 2.349
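One way to go past the first page is to loop over page URLs and stop when a page comes back empty. The sketch below assumes the offers listing accepts a page query parameter (https://www.mercadolivre.com.br/ofertas?page=N). That URL pattern, the helper names page_url and scrape_offers, and the "stop at the first empty page" rule are illustrative assumptions, not something confirmed in the question. The driver is passed in as a parameter, so the loop logic can be tried without opening a browser.

```python
from urllib.parse import urlencode

# Assumption: listing pages are reachable as ?page=N on the offers URL.
BASE_URL = "https://www.mercadolivre.com.br/ofertas"
TITLE_CLASS = "promotion-item__title"
PRICE_CLASS = "promotion-item__price"
# Same value as selenium.webdriver.common.by.By.CLASS_NAME; spelled out so this
# module can be imported (and tested) without Selenium installed.
BY_CLASS_NAME = "class name"


def page_url(page: int, base: str = BASE_URL) -> str:
    """URL of listing page `page` (1-based); page 1 is the bare offers URL."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if page == 1:
        return base
    return f"{base}?{urlencode({'page': page})}"


def scrape_offers(driver, max_pages: int) -> list[dict]:
    """Visit pages 1..max_pages with a Selenium-style driver and collect offers.

    Stops early at the first page with no titles (assumed to mean the
    listing has run out). Returns one dict per offer.
    """
    rows = []
    for page in range(1, max_pages + 1):
        driver.get(page_url(page))
        titles = driver.find_elements(BY_CLASS_NAME, TITLE_CLASS)
        if not titles:
            break
        prices = driver.find_elements(BY_CLASS_NAME, PRICE_CLASS)
        # zip() pairs the i-th title with the i-th price; unmatched extras are dropped
        for title, price in zip(titles, prices):
            rows.append({"pagina": page, "produto": title.text, "preco": price.text})
    return rows
```

With a real browser, this would be called as rows = scrape_offers(driver, max_pages=5) on a driver created as in the question, followed by df = pd.DataFrame(rows). That gives the same produto and preco columns plus the page number. The max_pages argument caps the load if scraping every page is too heavy.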
Two things upset me: seeing people scrape data with Selenium, and seeing people scrape websites whose administrators provide an API for queries.
– Augusto Vasques
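For the API route mentioned in that comment, here is a minimal sketch using only the standard library. It assumes Mercado Livre's public search endpoint for the Brazilian site (https://api.mercadolibre.com/sites/MLB/search), with q, offset and limit parameters and a JSON response containing results and paging.total. Those details, the page size of 50, and the function name search_all are assumptions to check against the current API documentation (the endpoint may require an access token). Here, pagination works by offset rather than by page number.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: endpoint, parameter names, response shape and page size below
# are taken to match Mercado Livre's public search API; verify before use.
SEARCH_URL = "https://api.mercadolibre.com/sites/MLB/search"
PAGE_SIZE = 50


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def search_all(query: str, max_items: int = 200, fetch=fetch_json) -> list[dict]:
    """Page through search results by offset until max_items or the end."""
    items = []
    offset = 0
    while offset < max_items:
        limit = min(PAGE_SIZE, max_items - offset)
        params = urlencode({"q": query, "offset": offset, "limit": limit})
        data = fetch(f"{SEARCH_URL}?{params}")
        results = data.get("results", [])
        items.extend({"produto": r.get("title"), "preco": r.get("price")} for r in results)
        offset += len(results)
        total = data.get("paging", {}).get("total", 0)
        # Stop when a page is empty or every available result has been read
        if not results or offset >= total:
            break
    return items
```

pd.DataFrame(search_all("playstation 4")) would produce a dataframe with the same two columns. Note that this is a keyword search, not the same listing as the /ofertas page. The fetch parameter exists only so the paging logic can be exercised without network access.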
@Augustovasques It would be good to show more respect for the participants, boy. I have never been treated like this on international forums or in the countries where I've worked. This is for financial mathematics and programming practice at the public school where I work. If you can't help, please don't get in the way. Many students will certainly be discouraged by reading comments like yours.
– Junior Costa
@Augustovasques And it's not your site. It's mutual help for the Python community and others. I already hold a doctorate, and I teach free classes in a hard-to-reach community, helping people prepare for the Enem and with Portuguese and English, and helping exact-sciences teachers with programming. I am appalled to learn that there are people like you on Stack Overflow, trying to discourage others.
– Junior Costa
Too bad you were never treated like this, because if you had been, you would know that Selenium is the worst option for scraping data. As for the API link: if I had been given that link, I would have taken it as a gift, but if you're offended, I won't try to change your mind. As for stopping telling the truth because you didn't understand the message: I'm sorry, but I won't. I know how to greatly optimize what you're trying to do, and I was willing to teach you, because it really saddens me to see people do something simple in the most complicated way.
– Augusto Vasques
And it's not just you who does this. It's a widespread practice, taken for granted on the Internet. It would cost me nothing to chat a little, learn about your goals, give you some guidance, and leave a link to a sandbox with very efficient code.
– Augusto Vasques
I don't teach, so I may be biased or mistaken, but generally speaking: from what I see here on the site, many people are not really learning to program, because many courses focus more on "fads" than on fundamentals. For example, they should be teaching how to choose the most appropriate tool for each task, but instead they prefer to teach something that "works", even if it is not the most appropriate solution (I mean this in general, not about your specific case, since I don't know your context).
– hkotsubo
As for your specific case: if the goal is to collect data, using the API (as already suggested above) seems to me the most appropriate option. If the goal is to teach HTML manipulation (or if you can't use the API for some reason), I would first turn to Beautiful Soup, a library much better suited to this case (for a number of technical reasons that don't fit in this space), and I would use Selenium only as a last resort.
– hkotsubo
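To illustrate the "parse the HTML instead of driving a browser" option, here is a sketch that fetches the page with a plain HTTP request and extracts the same two classes. It uses the standard library's html.parser as a stand-in for Beautiful Soup so that it has no dependencies. With Beautiful Soup, the extraction would be a call such as soup.select('.promotion-item__title'). The sketch assumes the server-rendered HTML already contains the promotion-item__title and promotion-item__price elements. That may not hold if the page builds them with JavaScript, which is the situation where a browser tool becomes necessary. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements without a closing tag; they must not be pushed on the tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OfferParser(HTMLParser):
    """Collect the text of elements carrying the title/price classes.

    Standard-library stand-in for Beautiful Soup's CSS-class selection.
    """

    TARGETS = {"promotion-item__title": "produto", "promotion-item__price": "preco"}

    def __init__(self):
        super().__init__()
        self.found = {"produto": [], "preco": []}
        self._stack = []  # one entry per open tag: the field it feeds, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((self.TARGETS[c] for c in classes if c in self.TARGETS), None)
        if field:
            self.found[field].append("")
        self._stack.append(field)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> hold no text

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing target element, if any
        for field in reversed(self._stack):
            if field:
                self.found[field][-1] += data
                break


def parse_offers(html: str) -> list[dict]:
    parser = OfferParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace left over from nested tags and indentation
    clean = {k: [" ".join(s.split()) for s in v] for k, v in parser.found.items()}
    return [{"produto": t, "preco": p} for t, p in zip(clean["produto"], clean["preco"])]


def fetch_html(url: str) -> str:
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode(resp.headers.get_content_charset() or "utf-8")
```

A real run would look like parse_offers(fetch_html("https://www.mercadolivre.com.br/ofertas")). The page loop from the question's goal stays the same; only the browser is gone.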
Finally, some relevant links that may help: that and that. And while I'm at it: telling you that you're not doing it the right way is a way of helping, even if it doesn't seem like it :-)
– hkotsubo
@hkotsubo, a late addendum: even in that last case, Selenium should not be used, for two reasons. 1. According to the project itself, Selenium is an umbrella project for a range of tools and libraries that enable and support the automation of web browsers, and it provides extensions to emulate user interaction with browsers; that is, it is a browser-automation tool, not a scraping tool. 2. Python has Scrapy, a framework specialized in crawling websites and extracting structured data from their pages, and its performance gain over Selenium is enormous.
– Augusto Vasques