I am a beginner in web scraping. I want to learn how to build a database from data on used (semi-new) car listings on a few websites. One of the sites is this:
url = https://www.seminovosunidas.com.br/veiculos/page:1?utm_source=afilio&utm_medium=display&utm_campaign=maio&utm_content=ron_ambos&utm_term=120x600_promocaomaio_performance_-_-%22
I can extract the data I need from a single page without problems. To iterate over the pages I use url.format, passing an index that increments the page number.
The complete code:
import requests as req
from bs4 import BeautifulSoup as bs

def get_unidas():
    url = "https://www.seminovosunidas.com.br/veiculos/page:{}?utm_source=afilio&utm_medium=display&utm_campaign=maio&utm_content=ron_ambos&utm_term=120x600_promocaomaio_performance_-_-"
    indice_pagina = 1
    dados = {}
    while True:
        #headers = {'User-Agent': random.choice(user_agent_list)}
        r = req.get(url.format(indice_pagina))
        if r.status_code != req.codes.ok:
            raise Exception("Página inexistente")
        soup = bs(r.text, "lxml")
        carros = soup.find_all(class_="vehicleDescription")
        valores = soup.find_all(class_="valor")
        for carro, valor in zip(carros, valores):
            texto = list(carro.stripped_strings)
            dados["Empresa"] = "Unidas"
            dados["Modelo"] = texto[2]
            dados["Preco"] = valor.text.replace(".", "").replace(",", ".")
            dados["Kilometragem"] = texto[4].split(",")[1][5:]
            dados["Ano"] = texto[3][-5:-1]
            #print(dados)
            #print("#######################################")
        indice_pagina += 1

get_unidas()
The problem is that I don't know how to make the while loop end. When I request a page with a nonexistent index, index 200 for example, the site just serves page 1 again. Normally a nonexistent page would have different HTML, so I could tell it apart from a page that exists. Even checking status_code doesn't help: a request for a nonexistent page returns 200, the code that indicates an existing page.
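Since the status code can't be trusted here, one way to end the loop is to fingerprint the listings on each page and stop as soon as a page repeats content that was already scraped, which is exactly what happens when an out-of-range index falls back to page 1. Below is a minimal sketch along those lines; fingerprinting by listing text and the empty-page check are assumptions about the site's behavior, not something the page guarantees, and the accumulation into a list is just one possible way to collect the results.

import requests as req
from bs4 import BeautifulSoup as bs

def get_unidas():
    url = "https://www.seminovosunidas.com.br/veiculos/page:{}?utm_source=afilio&utm_medium=display&utm_campaign=maio&utm_content=ron_ambos&utm_term=120x600_promocaomaio_performance_-_-"
    indice_pagina = 1
    assinaturas_vistas = set()   # fingerprints of pages already scraped
    todos_dados = []             # one dict per car, across all pages

    while True:
        r = req.get(url.format(indice_pagina))
        if r.status_code != req.codes.ok:
            break

        soup = bs(r.text, "lxml")
        carros = soup.find_all(class_="vehicleDescription")
        valores = soup.find_all(class_="valor")

        # Fingerprint the page by its listing texts. If the site answered an
        # out-of-range index by serving a page we have already seen (page 1),
        # or by serving no listings at all, stop paginating.
        assinatura = tuple(c.get_text(strip=True) for c in carros)
        if not carros or assinatura in assinaturas_vistas:
            break
        assinaturas_vistas.add(assinatura)

        for carro, valor in zip(carros, valores):
            texto = list(carro.stripped_strings)
            todos_dados.append({
                "Empresa": "Unidas",
                "Modelo": texto[2],
                "Preco": valor.text.replace(".", "").replace(",", "."),
                "Kilometragem": texto[4].split(",")[1][5:],
                "Ano": texto[3][-5:-1],
            })

        indice_pagina += 1

    return todos_dados

If the fallback to page 1 is an actual HTTP redirect, comparing r.url against the requested URL would also work, but comparing page content is safer because it works whether or not a redirect is involved.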
Thanks a lot, man. It worked great
– Rafael Ribeiro