Deploy queues to manage competition between Spiders in Scrapyd

Question

Asked 10 years, 4 months ago

Is there any way to make Scrapyd create queues of spiders so that, when I schedule many spiders (with different functions), I can prioritize some of them or limit the competition between them? Currently, all the spiders I send run in the order determined by the Scrapyd server.

1 answer

by elias • **3,132** points · Answer 1 · 2015-01-11T04:28:20+00:00

Well, if you only need simple priorities, one option is to use Scrapyd's priority parameter. It is not documented, but it is implemented in Scrapyd's code, and it is basically a simple priority queue on top of SQLite.
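To make the idea concrete, here is a minimal sketch of a priority queue backed by SQLite. It is an illustration only, not Scrapyd's actual code: the class name `SqlitePriorityQueue`, the table layout, and the first-in-first-out tie-breaking between equal priorities are all assumptions of this sketch.

```python
import sqlite3


class SqlitePriorityQueue:
    """Illustrative priority queue on top of SQLite (hypothetical, not Scrapyd's class)."""

    def __init__(self, path=":memory:"):
        # An in-memory database keeps the sketch self-contained.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY, priority REAL, message TEXT)"
        )

    def put(self, message, priority=0.0):
        # Store the message together with its priority.
        self.conn.execute(
            "INSERT INTO queue (priority, message) VALUES (?, ?)",
            (priority, message),
        )
        self.conn.commit()

    def pop(self):
        # Highest priority first; among equal priorities, oldest first
        # (the tie-breaking rule is an assumption of this sketch).
        row = self.conn.execute(
            "SELECT id, message FROM queue "
            "ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]
```

With a queue like this, a job added with priority 5 is popped before jobs added earlier with the default priority 0.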

To use it, just pass the argument priority=NUMBER when calling the /schedule.json API endpoint. The default value is 0; use a higher value for a higher priority.
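As a rough sketch of such a call, the snippet below builds and sends a POST request to /schedule.json using only the Python standard library. The address `http://localhost:6800` assumes a Scrapyd server running locally on its default port, and the project and spider names used in the note afterwards are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Scrapyd is running locally on its default port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, priority=0):
    """Build the POST request for /schedule.json, including the priority argument."""
    data = urlencode(
        {"project": project, "spider": spider, "priority": priority}
    ).encode("ascii")
    return Request(f"{SCRAPYD_URL}/schedule.json", data=data, method="POST")


def schedule(project, spider, priority=0):
    """Send the request to Scrapyd and return its decoded JSON response."""
    with urlopen(build_schedule_request(project, spider, priority)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, calling `schedule("myproject", "urgent_spider", priority=10)` and then `schedule("myproject", "background_spider")` should leave the first job ahead of the second in the pending queue, since the second one falls back to the default priority of 0.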

If you need a more complex queuing scheme, you may have to implement a solution of your own. Alternatively, you can use Scrapinghub's Scrapy Cloud [*] and structure the crawl using the queues of the Hub Crawl Frontier.

[*] Full disclosure: I work at Scrapinghub.