I have a Scrapy project with two spiders that runs perfectly, without errors, from the command line.
However, when I trigger it through the application, the following occurs:
2021-02-09 12:31:39 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 434, in fireEvent
DeferredList(beforeResults).addCallback(self._continueFiring)
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 321, in addCallback
return self.addCallbacks(callback, callbackArgs=args,
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 311, in addCallbacks
self._runCallbacks()
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 447, in _continueFiring
callable(*args, **kwargs)
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 1278, in _reallyStartRunning
self._handleSignals()
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/posixbase.py", line 295, in _handleSignals
_SignalReactorMixin._handleSignals(self)
File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 1243, in _handleSignals
signal.signal(signal.SIGINT, self.sigInt)
File "/usr/lib/python3.8/signal.py", line 47, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
builtins.ValueError: signal only works in main thread
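The root cause is that Twisted's reactor tries to install signal handlers when it starts, and CPython only allows signal.signal() to be called from the main thread. The same restriction can be demonstrated with the standard library alone, no Twisted or Scrapy involved:

```python
import signal
import threading

def try_install_handler(results):
    # Attempt to register a SIGINT handler from this thread.
    # CPython raises ValueError when this is not the main thread,
    # which is exactly what happens inside reactor.run() above.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        results.append(str(exc))

results = []
t = threading.Thread(target=try_install_handler, args=(results,))
t.start()
t.join()
print(results[0])  # e.g. "signal only works in main thread"
```

Django serves each request on a worker thread, so a management command or view handler that calls reactor.run() hits this restriction even though the identical code succeeds from the command line, where it runs on the main thread.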
Excerpt from the relevant code:
from django.core.management.base import BaseCommand
from nor.crawling.crawling.spiders.myspider1 import MySpider1
from nor.crawling.crawling.spiders.myspider2 import MySpider2
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner


class Command(BaseCommand):
    help = "Release the spiders"

    def handle(self, *args, **options):
        configure_logging()
        runner = CrawlerRunner(get_project_settings())

        @defer.inlineCallbacks
        def crawl():
            yield runner.crawl(MySpider1)
            yield runner.crawl(MySpider2)
            reactor.stop()

        crawl()
        reactor.run()
This functionality needs to be triggered by a button click in the system.
I could work around it by scheduling the run through crontab, but I don't believe that is the best approach.
I understand that this execution should start from the main thread, but I haven't found a way to make that happen.
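One workaround that keeps the button-click trigger is to launch the crawl in a separate process: a fresh process runs its target function on that process's own main thread, so the reactor can install its signal handlers there. A minimal stdlib sketch, where run_spiders is a hypothetical stand-in for the crawl()/reactor.run() logic in the command above:

```python
import multiprocessing
import signal

def run_spiders(queue):
    # Stand-in for the real work: in the actual command this would
    # build the CrawlerRunner, schedule both spiders, and call
    # reactor.run(). Here we only prove that a signal handler can be
    # installed, because this function runs on the child process's
    # main thread.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    queue.put("handlers installed")

# On Windows/macOS (spawn start method) this launch code must sit
# under an `if __name__ == "__main__":` guard.
queue = multiprocessing.Queue()
proc = multiprocessing.Process(target=run_spiders, args=(queue,))
proc.start()
result = queue.get()
proc.join()
print(result)
```

Another option often suggested is to skip handler installation entirely with reactor.run(installSignalHandlers=False), which lets the reactor start on a non-main thread at the cost of losing Ctrl-C handling for that reactor; whether that trade-off is acceptable depends on how the Django process is managed.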