Multiple Pipelines to Treat Different Files in Scrapy

How should pipelines.py be handled when we have different Spiders?

Example: I have one Spider that gets posts from a particular blog, and another that saves the JPEG banner images found on each page. Both Spiders work, but they use the same pipeline to persist their objects.

1 answer

It is a common pattern in pipelines (and Spider middlewares as well) to use Spider attributes to decide what to do:

```python
class MyPipeline:
    def process_item(self, item, spider):
        if getattr(spider, 'my_pipeline_enabled', False):
            # do the work here
            ...
        return item  # always return the item so later pipelines receive it
```

This way, even though the pipeline is enabled for the entire project, you can use the my_pipeline_enabled attribute to run it only for the Spiders you want.
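A minimal sketch of how the dispatch works, assuming two spiders named posts and banners (the classes below are plain stand-ins; in a real project they would subclass scrapy.Spider):

```python
class PostsSpider:
    name = 'posts'
    my_pipeline_enabled = True   # opt this spider in

class BannersSpider:
    name = 'banners'             # no attribute, so the pipeline skips it

class MyPipeline:
    def process_item(self, item, spider):
        if getattr(spider, 'my_pipeline_enabled', False):
            # do the work here; tagging the item stands in for real processing
            item['processed_by'] = self.__class__.__name__
        return item  # always return the item so later pipelines receive it

pipeline = MyPipeline()
print(pipeline.process_item({}, PostsSpider()))    # {'processed_by': 'MyPipeline'}
print(pipeline.process_item({}, BannersSpider()))  # {}
```

Because getattr falls back to False, spiders that never heard of the attribute are simply left alone.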

You can also extend this code to take a configuration setting into account if needed.

In Scrapy 0.25+ (not yet released at the time of writing; so far only available in the development version on Git), you also have the alternative of using per-Spider settings that take precedence over the project's settings.
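In released Scrapy versions this became the Spider.custom_settings class attribute. A sketch of the precedence with plain dicts, so it runs without Scrapy installed (the myproject.pipelines paths are hypothetical; a real spider would subclass scrapy.Spider):

```python
PROJECT_SETTINGS = {
    'ITEM_PIPELINES': {'myproject.pipelines.CommonPipeline': 300},
}

class PostsSpider:  # a real spider would subclass scrapy.Spider
    name = 'posts'
    custom_settings = {
        'ITEM_PIPELINES': {'myproject.pipelines.PostsPipeline': 300},
    }

def effective_settings(spider_cls):
    # per-spider settings take precedence over the project's
    merged = dict(PROJECT_SETTINGS)
    merged.update(getattr(spider_cls, 'custom_settings', {}))
    return merged

print(effective_settings(PostsSpider)['ITEM_PIPELINES'])
```

With this approach each spider can enable a different ITEM_PIPELINES dict, instead of one shared pipeline deciding per item.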
