I’ve grouped 2 questions because I think they’re related.
I wrote a test script that saves the scraped links, along with their data, in the database. Is this bad practice? (High priority)
Do I need to do anything else so I don't import duplicates? My pipeline has a simple check that queries for link=%s; would it be better (a faster query) to look up md5(link) instead?
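A minimal sketch of how that pipeline could look, assuming a pymysql connection and a hypothetical `pages` table with `link`, `link_hash`, and `title` columns: the URL is hashed with MD5 so the duplicate lookup runs against a short, fixed-length, indexable column instead of the full link string.

```python
# Sketch of an item pipeline that stores each link with its data and skips
# duplicates via an MD5 hash of the URL. Table and column names (pages,
# link_hash, title) and the connection settings are assumptions.
import hashlib

import pymysql
from scrapy.exceptions import DropItem


class MysqlStorePipeline:
    def open_spider(self, spider):
        self.conn = pymysql.connect(host='localhost', user='scrapy',
                                    password='secret', db='crawler')
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        link = item['link']
        # Hashing the URL gives a fixed-length value that is cheap to index.
        link_hash = hashlib.md5(link.encode('utf-8')).hexdigest()

        self.cursor.execute(
            "SELECT 1 FROM pages WHERE link_hash = %s", (link_hash,))
        if self.cursor.fetchone():
            raise DropItem(f"Duplicate link: {link}")

        self.cursor.execute(
            "INSERT INTO pages (link, link_hash, title) VALUES (%s, %s, %s)",
            (link, link_hash, item.get('title')))
        self.conn.commit()
        return item
```

The bigger win is usually indexing rather than hashing itself: with a UNIQUE index on the hash column, the SELECT could be dropped entirely in favour of an INSERT IGNORE.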
I can use -s JOBDIR=crawls/somespider-1 to pause and resume the crawler, but I would like to know how to do this using a list of links to be processed stored in MySQL. (Low priority)
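One way this could work, sketched under the assumption of a `pending_links` table with `url` and `processed` columns: read whatever is still pending in `start_requests` and yield a `Request` per row, so a restarted crawl simply picks up where the table says it left off.

```python
# Rough sketch of resuming from MySQL instead of JOBDIR. The table and
# columns (pending_links, url, processed) and the connection settings
# are assumptions.
import pymysql
import scrapy


class ResumableSpider(scrapy.Spider):
    name = 'resumable'

    def start_requests(self):
        conn = pymysql.connect(host='localhost', user='scrapy',
                               password='secret', db='crawler')
        with conn.cursor() as cursor:
            cursor.execute("SELECT url FROM pending_links WHERE processed = 0")
            rows = cursor.fetchall()
        conn.close()

        # Each pending row becomes a request; anything already processed
        # is skipped on restart.
        for (url,) in rows:
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Marking the row as processed would typically happen in an item
        # pipeline once the page has been handled.
        self.logger.info("Processed %s", response.url)
```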
I need to add new items to my start_urls list dynamically. Do I have to create a Request with callback parse_category? Is there any way to append to self.queue or self.start_urls so that new URLs get processed? (High priority)
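For this last point, note that Scrapy only reads `start_urls` once, when the crawl starts, so appending to it later has no effect; the idiomatic way to grow the queue is to yield new `Request` objects from a callback. A minimal sketch, where `parse_category`, the CSS selector, and the start URL are assumptions:

```python
# Sketch: new URLs are added to the crawl by yielding Requests from parse,
# not by mutating start_urls at runtime.
import scrapy


class CategorySpider(scrapy.Spider):
    name = 'categories'
    start_urls = ['https://example.com/']

    def parse(self, response):
        # Every category link found becomes a new request that the scheduler
        # queues and processes like any other.
        for href in response.css('a.category::attr(href)').getall():
            yield response.follow(href, callback=self.parse_category)

    def parse_category(self, response):
        yield {'link': response.url,
               'title': response.css('title::text').get()}
```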
Luiz, welcome to SOpt. Check the Help and take the Tour to get a better idea of how to use the resources here.
– Leo