HTTP Error 429: Too Many Requests in Web scraping in repl


When executing the code below, I get:

HTTP Error 429: Too Many Requests

It seems the server enforces a time limit between requests.

# Required imports
from urllib.request import urlopen
from bs4 import BeautifulSoup
import time

# Select the site
url = 'https://hipsters.jobs/jobs/?l=Brasilia%20-%20Federal%20District%2C%20Brazil&p=1'
soup = BeautifulSoup(urlopen(url), "html.parser")
time.sleep(3)
print("\n****************************\n")


# Number of job listings available
quantidade = soup.find("h1", class_='search-results__title col-sm-offset-3 col-xs-offset-0').get_text().strip()
print(quantidade)
quantidade = int(quantidade.split()[0])

# Number of job listings captured
vagas = 0
links = []
for item in soup.select(".listing-item__title"):
  link = item.a.get('href')
  links.append(link)
  vagas += 1
print(vagas)

# Check whether there are listings on further pages and collect them too
if quantidade != vagas:
  for i in range(2, 50):
    url = 'https://hipsters.jobs/jobs/?l=Brasilia%20-%20Federal%20District%2C%20Brazil&p={}'.format(i)
    print(url)
    time.sleep(10)
    soup = BeautifulSoup(urlopen(url), "html.parser")
    for item in soup.select(".listing-item__title"):
      link = item.a.get('href')
      if link not in links:
        links.append(link)
        vagas += 1
    if vagas == quantidade:
      break

titulos = []  
tags = []  
salarios = []  
datas = []  
empresas = []  
locais = []  
descricoes = []  

for i in links:
  time.sleep(10)
  url = 'https://' + i
  soup = BeautifulSoup(urlopen(url), "html.parser")
  titulos.append(soup.select_one(".details-header__title").get_text().strip())
  tags.append(soup.select_one(".job-type").get_text().strip())
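As written, only some of the requests are separated by a pause, and any of the urlopen calls can still trip the limit. Below is a minimal sketch of a helper that spaces out every request, assuming a fixed interval is enough for this server; the fetch name and the 10-second value are illustrative, not anything the site documents.

import time
from urllib.request import urlopen
from bs4 import BeautifulSoup

MIN_INTERVAL = 10      # seconds between requests; an assumption, not a documented limit
_last_request = 0.0

def fetch(url):
  # Open url, but never more often than once every MIN_INTERVAL seconds.
  global _last_request
  wait = MIN_INTERVAL - (time.time() - _last_request)
  if wait > 0:
    time.sleep(wait)
  _last_request = time.time()
  return BeautifulSoup(urlopen(url), "html.parser")

# Usage: replace every BeautifulSoup(urlopen(url), "html.parser") call with fetch(url)
soup = fetch(url)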
• Explain the problem and the difficulties you ran into, and trim the code down so it is just enough to reproduce the problem: https://answall.com/help/minimal-reproducible-example. Welcome.

• It gave HTTP Error 429: Too Many Requests; the server must enforce a time limit between requests. Reference: 429 Too Many Requests.

• Gustavo, the server you are scraping limits the number of requests a client can make, to avoid abuse or overload. To get around this you just have to respect that rule and make your requests at an interval the server accepts.

• Keep increasing the delay in time.sleep(...): try time.sleep(10); if that is not enough, try time.sleep(20), and so on (see the sketch after these comments). Overall this is probably a measure to avoid attacks, bulk data collection and/or slowness on their side. I noticed they have no API, so I am not sure whether this kind of scraping could be considered against their rules, which may be a problem (I think).
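Following the comments above, here is a minimal sketch of a fetch that keeps increasing the pause whenever the server answers 429, and respects a Retry-After header if one is sent; the function name, starting delay, and retry count are assumptions, not taken from the original code.

import time
import urllib.error
from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_with_backoff(url, max_tries=5):
  # Retry on HTTP 429, sleeping longer on each attempt.
  delay = 10                                 # starting pause in seconds; illustrative
  for attempt in range(max_tries):
    try:
      return BeautifulSoup(urlopen(url), "html.parser")
    except urllib.error.HTTPError as e:
      if e.code != 429:
        raise                                # only retry when rate limited
      retry_after = e.headers.get("Retry-After")
      wait = int(retry_after) if retry_after and retry_after.isdigit() else delay
      time.sleep(wait)
      delay *= 2                             # keep increasing the pause
  raise RuntimeError("still rate limited after {} tries: {}".format(max_tries, url))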

No answers
