When executing the code below, I get:
HTTP Error 429: Too Many Requests. The server must have a time limit between requests.
# Required imports
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Target site
url = 'https://hipsters.jobs/jobs/?l=Brasilia%20-%20Federal%20District%2C%20Brazil&p=1'
soup = BeautifulSoup(urlopen(url), "html.parser")
time.sleep(3)
print("\n****************************\n")

# Number of available job listings reported by the site
quantidade = soup.find("h1", class_='search-results__title col-sm-offset-3 col-xs-offset-0').get_text().strip()
print(quantidade)
quantidade = int(quantidade.split()[0])

# Number of listings captured so far
vagas = 0
links = []
for item in soup.select(".listing-item__title"):
    link = item.a.get('href')
    links.append(link)
    vagas += 1
print(vagas)

# Check whether there are listings on further pages and fetch them
if quantidade != vagas:
    for i in range(2, 50):
        url = 'https://hipsters.jobs/jobs/?l=Brasilia%20-%20Federal%20District%2C%20Brazil&p={}'.format(i)
        print(url)
        time.sleep(10)
        soup = BeautifulSoup(urlopen(url), "html.parser")
        for item in soup.select(".listing-item__title"):
            link = item.a.get('href')
            if link not in links:
                links.append(link)
                vagas += 1
        if vagas == quantidade:
            break

# Fetch each listing's page and extract its details
titulos = []
tags = []
salarios = []
datas = []
empresas = []
locais = []
descricoes = []
for i in links:
    time.sleep(10)
    url = 'https://' + i
    soup = BeautifulSoup(urlopen(url), "html.parser")
    titulos.append(soup.select_one(".details-header__title").get_text().strip())
    tags.append(soup.select_one(".job-type").get_text().strip())
Explain the problem and the difficulties you ran into clearly, and try to trim the code down to just enough to reproduce the problem: https://answall.com/help/minimal-reproducible-example. Welcome.
– Miguel
It returned HTTP Error 429: Too Many Requests; the server must enforce a time limit between requests. Reference: 429 Too Many Requests. – Augusto Vasques
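For reference, a minimal sketch (not the asker's code) of how one could catch that 429 with urllib and inspect the Retry-After header, assuming the server sends one:

import time
from urllib.request import urlopen
from urllib.error import HTTPError

url = 'https://hipsters.jobs/jobs/?l=Brasilia%20-%20Federal%20District%2C%20Brazil&p=1'
try:
    html = urlopen(url).read()
except HTTPError as e:
    if e.code == 429:
        # Some servers state how long to wait in the Retry-After header
        # (in seconds or as an HTTP date); it may also be absent.
        print("Rate limited. Retry-After:", e.headers.get("Retry-After"))
    else:
        raise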
Gustavo, the server you are scraping limits the number of requests a client can make, to avoid server abuse or overload. To get around this you just have to respect those rules and make requests at an interval the server accepts.
– fernandosavio
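A minimal sketch of that idea, assuming a fixed minimum interval between requests is enough (polite_get and MIN_INTERVAL are hypothetical names, and 10 seconds is a guess you would tune to what the server tolerates):

import time
from urllib.request import urlopen
from bs4 import BeautifulSoup

MIN_INTERVAL = 10  # seconds between requests (assumed; tune as needed)
_last_request = 0.0

def polite_get(url):
    """Fetch a URL, waiting so requests are at least MIN_INTERVAL apart."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    return BeautifulSoup(urlopen(url), "html.parser")

Replacing each direct urlopen call in the question's code with polite_get would enforce the interval everywhere, instead of relying on scattered time.sleep calls.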
Keep increasing the time in time.sleep(...): try time.sleep(10), and if that is not enough, time.sleep(20), and so on. But overall this is probably a practice to prevent attacks, data harvesting, and/or slowness on their side. I saw that they have no API, so I'm not sure whether your code does something that could be considered illegal; I have no idea, which could be a problem (I think). – Guilherme Nascimento
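A sketch of that incremental approach as code, retrying with a doubling delay on each 429 (fetch_with_backoff and its parameters are hypothetical, not from the thread):

import time
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_with_backoff(url, max_tries=5, base_delay=10):
    """Retry on 429, doubling the wait each attempt (10s, 20s, 40s, ...)."""
    delay = base_delay
    for attempt in range(max_tries):
        try:
            return urlopen(url)
        except HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Still rate limited after {} tries".format(max_tries))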