Well, if you need simple priorities, one option is to use scrapyd's priority parameter (it is not documented, but it is implemented here; it is basically a simple priority queue on top of SQLite).
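To illustrate what "a priority queue on top of SQLite" means, here is a minimal sketch using Python's standard sqlite3 module. It is not scrapyd's actual code: the class name SqlitePriorityQueue, the table layout, and the tie-breaking rule (oldest entry first among equal priorities) are assumptions chosen for the example. The key idea is that the entry with the highest priority value is popped first.

```python
import json
import sqlite3


class SqlitePriorityQueue:
    """Hypothetical illustration: a priority queue stored in a SQLite table.

    Higher priority values are popped first; ties are broken by insertion
    order (an assumption of this sketch, not necessarily scrapyd's behavior).
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " priority REAL NOT NULL,"
            " message TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, message, priority=0.0):
        # Store the message as JSON next to its priority (default 0).
        self.conn.execute(
            "INSERT INTO queue (priority, message) VALUES (?, ?)",
            (priority, json.dumps(message)),
        )
        self.conn.commit()

    def pop(self):
        # Pick the highest priority; among equal priorities, the lowest id (oldest).
        row = self.conn.execute(
            "SELECT id, message FROM queue ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is empty
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return json.loads(row[1])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

For example, after putting a job with the default priority and then another with priority=10, the first pop() returns the priority-10 job even though it was added later.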
To use it, just pass the argument priority=NUMBER when calling the /schedule.json API. The default value is 0; use a higher value for a higher priority.
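As a sketch of the call described above, the following builds and sends a POST request to /schedule.json with the priority parameter, using only the Python standard library. The project and spider names are hypothetical placeholders, and http://localhost:6800 assumes scrapyd is running locally on its default port; adjust both for your deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, priority=0,
                           base_url="http://localhost:6800"):
    # Form-encode the parameters; a higher priority should run sooner.
    data = urlencode({"project": project, "spider": spider,
                      "priority": priority}).encode("ascii")
    return Request(base_url + "/schedule.json", data=data, method="POST")


def schedule(project, spider, priority=0, base_url="http://localhost:6800"):
    # Send the request and decode scrapyd's JSON response.
    req = build_schedule_request(project, spider, priority, base_url)
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, against a running scrapyd instance: schedule("myproject", "myspider", priority=10). Separating request construction from sending makes it easy to inspect the request without a server.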
If you need a more complex queuing scheme, you may have to implement your own solution, or use Scrapinghub's Scrapy Cloud [*] and structure the crawl using the queues of the Hub Crawl Frontier.
[*] For full transparency: I work at Scrapinghub.