Extract PDF documents from sites with Scrapy

Question

Asked 6 years, 11 months ago

Viewed 80 times

Is it possible to use Scrapy to crawl an entire site, following every link in search of PDF files? It would be something like Apache Nutch. I searched, but everyone only uses XPath, and XPath does not work for me, because I have to search across many sites and writing a crawler for each site is humanly impossible.

Notes:

I have to download the PDF(s);

I have to pass several URLs to the crawler.
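
To make the requirement concrete, here is a minimal sketch of the site-independent logic, written with the Python standard library only rather than Scrapy itself. It follows every same-site link from several start URLs and collects anything whose path ends in `.pdf`, with no per-site XPath. The names `LinkCollector`, `classify_links`, `crawl_for_pdfs`, the `fetch` callable and the `max_pages` limit are all hypothetical, chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Return (pdf_urls, page_urls) found in html, resolved against page_url."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    pdfs, pages = [], []
    for href in collector.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        url, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.path.lower().endswith(".pdf"):
            # Assumption: PDFs are recognised by file extension only,
            # and are kept even when hosted on another domain.
            if url not in pdfs:
                pdfs.append(url)
        elif parts.netloc == site and url not in pages:
            pages.append(url)  # only follow links that stay on the same host
    return pdfs, pages


def crawl_for_pdfs(start_urls, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) must return the page HTML as a string."""
    queue = deque(start_urls)
    seen = set(start_urls)
    pdf_urls = []
    visited = 0
    while queue and visited < max_pages:
        page_url = queue.popleft()
        visited += 1
        pdfs, pages = classify_links(page_url, fetch(page_url))
        for url in pdfs:
            if url not in pdf_urls:
                pdf_urls.append(url)
        for url in pages:
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return pdf_urls
```

For real use, `fetch` could wrap `urllib.request.urlopen`, and each collected URL could be saved with `urllib.request.urlretrieve`. In Scrapy the same decision (PDF link versus page to follow) would presumably sit in a spider callback, with the start URLs listed in `start_urls`.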

What language do you work with?

– André Lins

2019/04/24 at 16:12
Good afternoon, André. I work with PHP, but I think Scrapy would be faster, since it is a tool built for crawling.

– mell system

2019/04/24 at 16:18

No answers
