How to protect my Scrapyd server from unauthorized calls?

Say I have the following configuration in scrapy.cfg for deploying to Scrapyd:

[deploy]
url = http://example.com/api/scrapyd/
username = user
password = secret
project = projectX

The Scrapyd documentation mentions the username and password options, but my spiders still run even when I call the server without authentication.

So how can I protect my Scrapyd server from unwanted, unauthenticated calls?

2 answers

The username/password setting is a client-side setting: the deploy client uses it for HTTP Basic authentication, which Scrapyd itself currently does not check on the server side.
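To make "client-side" concrete: the credentials only end up in a Basic `Authorization` header on each request, and a server that does not validate that header simply ignores it. A minimal sketch of how the header value is built (per RFC 7617); the helper name is hypothetical, and the credentials are the ones from the scrapy.cfg above.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication.

    Basic auth is "username:password" encoded as Base64. That is an encoding,
    not encryption: anyone who can read the request can recover the password,
    so it should only be sent over HTTPS.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Header value a client would send for user/secret
print(basic_auth_header("user", "secret"))  # → Basic dXNlcjpzZWNyZXQ=
```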

To add authentication on your server, make Scrapyd listen only for local connections (127.0.0.1) and put Nginx (or another HTTP reverse proxy) with HTTP authentication in front of it, forwarding authenticated requests to Scrapyd.

Here is a ready-to-use Docker container configuration: https://github.com/mattes/scrapyd

If Docker is not an option, you can use the Dockerfile and the Nginx configuration provided in that repository as a guide to set this up manually.

You can edit the Scrapyd settings and put the following in the ~/.scrapyd.conf file (the option belongs in the [scrapyd] section):

[scrapyd]
bind_address = 127.0.0.1

This makes the server accept connections only from processes running on the same machine.
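To double-check the setting, the sketch below parses a config file with Python's configparser and reports whether bind_address is a loopback address. The function name is hypothetical. A missing bind_address is treated as "not verified", because the default has differed between Scrapyd releases, and hostnames such as localhost are not resolved here. Note that configparser raises MissingSectionHeaderError on a file that lacks the [scrapyd] header.

```python
import configparser
import ipaddress


def scrapyd_is_local_only(conf_text: str) -> bool:
    """Return True if the [scrapyd] section binds to a loopback IP address."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)  # raises if the section header is missing
    address = parser.get("scrapyd", "bind_address", fallback=None)
    if address is None:
        # Not set in this file: the effective default depends on the version.
        return False
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        # Not a literal IP (e.g. a hostname); we do not resolve names here.
        return False
```

For example, `scrapyd_is_local_only(pathlib.Path("~/.scrapyd.conf").expanduser().read_text())`. Scrapyd also reads other config files (such as /etc/scrapyd/scrapyd.conf), so checking one file is only a partial check.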

If you want password protection, you can also use an Apache server as a reverse proxy and add Basic authentication. Below is an example virtual host:

<VirtualHost *:80>
    ServerName yourserver
    DocumentRoot /var/www/service-status
    <Directory /var/www/service-status/>
        Require valid-user
        Order allow,deny
        Allow from all
        AuthType Basic
        AuthName "Protected"
        AuthUserFile /var/www/service-status/.htpasswd
    </Directory>
    <Location /api/>
        ProxyPass  http://127.0.0.1:40500/
        ProxyPassReverse  http://127.0.0.1:40500/        
    </Location>
    <Proxy *>
        Require valid-user
        AuthType Basic
        AuthName "Protected"
        AuthUserFile /var/www/service-status/.htpasswd
    </Proxy>    
    RewriteEngine  on
    RewriteRule ^/?api$         /api/ [QSA,L,R]
    RewriteRule ^/?jobs(.*)     /api/jobs$1 [QSA,L,R]
    RewriteRule ^/?logs(.*)     /api/logs$1 [QSA,L,R]
    RewriteRule ^/?items(.*)    /api/items$1 [QSA,L,R]
</VirtualHost>

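In the above example, the API commands would be available at http://yourserver/api/command.json (for example, http://yourserver/api/daemonstatus.json). The example assumes Scrapyd is configured with http_port = 40500, the port the ProxyPass lines point to; Scrapyd's default port is 6800.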

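The .htpasswd file must be created for the virtual host to work, for example with Apache's htpasswd utility: htpasswd -c /var/www/service-status/.htpasswd user (the -c flag creates the file; omit it when adding further users). The configuration also needs mod_proxy, mod_proxy_http and mod_rewrite to be enabled.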
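The four RewriteRule lines in the virtual host are not explained in the answer. Judging from the patterns, they redirect short paths such as /jobs, /logs/... and /items/... (the kind of root-relative links Scrapyd's own web pages appear to use) to the same paths under /api/, where the proxy lives. To make the matching concrete, here is a small Python emulation of just these four rules; the names are hypothetical and this is an illustration, not a general mod_rewrite implementation.

```python
import re
from typing import Optional

# The four RewriteRule lines from the virtual host, in order.
# Apache's $1 in the substitution corresponds to \1 in Python's match.expand().
RULES = [
    (r"^/?api$", "/api/"),
    (r"^/?jobs(.*)", r"/api/jobs\1"),
    (r"^/?logs(.*)", r"/api/logs\1"),
    (r"^/?items(.*)", r"/api/items\1"),
]


def redirect_target(path: str, query: str = "") -> Optional[str]:
    """Return where Apache would redirect `path`, or None if no rule matches.

    Mimics [R] (send an external redirect), [L] (stop at the first matching
    rule) and [QSA] (carry over the original query string) for these rules only.
    """
    for pattern, substitution in RULES:
        match = re.match(pattern, path)
        if match:
            target = match.expand(substitution)
            return f"{target}?{query}" if query else target
    return None
```

Paths already under /api/ match none of the rules, so the redirects cannot loop; they are then handled by the ProxyPass in the <Location /api/> block.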
