How to run Scrapy + Django at runtime without getting an error?

I have a Scrapy project with two spiders that runs perfectly, without errors, when launched from the command line.

However, when I trigger it from the application, the following error occurs:

2021-02-09 12:31:39 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 434, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 321, in addCallback
    return self.addCallbacks(callback, callbackArgs=args,
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 311, in addCallbacks
    self._runCallbacks()
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 447, in _continueFiring
    callable(*args, **kwargs)
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 1278, in _reallyStartRunning
    self._handleSignals()
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/posixbase.py", line 295, in _handleSignals
    _SignalReactorMixin._handleSignals(self)
  File "/home/rpinheiro/.virtualenvs/app_env/lib/python3.8/site-packages/twisted/internet/base.py", line 1243, in _handleSignals
    signal.signal(signal.SIGINT, self.sigInt)
  File "/usr/lib/python3.8/signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
builtins.ValueError: signal only works in main thread

An excerpt of the relevant code:

from django.core.management.base import BaseCommand
from nor.crawling.crawling.spiders.myspider1 import MySpider1
from nor.crawling.crawling.spiders.myspider2 import MySpider2
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner


class Command(BaseCommand):
    help = "Release the spiders"

    def handle(self, *args, **options):
        configure_logging()
        runner = CrawlerRunner(get_project_settings())

        # Run the two spiders sequentially, then stop the reactor
        # once both crawls have finished.
        @defer.inlineCallbacks
        def crawl():
            yield runner.crawl(MySpider1)
            yield runner.crawl(MySpider2)
            reactor.stop()

        crawl()

        reactor.run()

The functionality I am developing needs to be triggered by a button click in the system.
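For context, the button click reaches a view roughly like the sketch below; the view, URL and command names are placeholders, not the real ones from my project:

from django.core.management import call_command
from django.http import JsonResponse


def start_crawl(request):
    # Placeholder view: the button posts to a URL wired to something like
    # this, which runs the management command shown above. The call happens
    # on a request-handling thread, not the main thread, which is where the
    # "signal only works in main thread" error above is raised.
    call_command("release_spiders")  # placeholder command name
    return JsonResponse({"status": "started"})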

I could work around this by scheduling the run through crontab, but I don't believe that is the best approach.
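For reference, the crontab workaround would be a line along these lines (paths, schedule and command name are placeholders):

*/30 * * * * cd /path/to/project && /home/rpinheiro/.virtualenvs/app_env/bin/python manage.py release_spiders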

I understand that this execution needs to start from the main thread, but I can't find a way to make that happen.
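One idea would be to skip the reactor's signal handler installation, since the traceback fails exactly at the signal.signal() call, but I don't know whether that is safe here. A minimal sketch of that variation of handle() (same imports as in the excerpt above):

    def handle(self, *args, **options):
        configure_logging()
        runner = CrawlerRunner(get_project_settings())

        @defer.inlineCallbacks
        def crawl():
            yield runner.crawl(MySpider1)
            yield runner.crawl(MySpider2)
            reactor.stop()

        crawl()

        # installSignalHandlers=False makes the reactor skip signal.signal(),
        # which is the call that raises "signal only works in main thread"
        # in the traceback above. Unclear if this is the right fix.
        reactor.run(installSignalHandlers=False)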
