Extract PDF documents from sites with Scrapy

Viewed 80 times

Is it possible to use Scrapy to scan an entire site, following every link, in search of PDF files? It would be something like Apache Nutch. I searched around, but everyone only uses XPath, and XPath does not work for me because I have to search across several sites, and writing a crawler for each site is humanly impossible.

Notes:

I have to download the PDF(s);

I have to pass several URL(s) to the crawler.
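To make the requirement concrete, here is a minimal sketch of a site-agnostic crawl: it reads every `<a href>` on a page instead of relying on per-site XPath, keeps links ending in `.pdf`, and follows the remaining links that stay on the same host. It uses only the Python standard library rather than Scrapy so that it runs without extra dependencies; in Scrapy, `CrawlSpider` with a `LinkExtractor` plays the same link-following role. The names `LinkCollector`, `classify_links`, `crawl`, and the `fetch` callback are hypothetical, chosen for illustration only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, with no site-specific selectors."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(html, page_url):
    """Return (pdf_urls, pages_to_follow) for one HTML page."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    pdf_urls, pages = [], []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        url, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if parts.path.lower().endswith(".pdf"):
            pdf_urls.append(url)  # PDFs are kept even if hosted elsewhere
        elif parts.netloc == site:
            pages.append(url)  # only follow links on the same host
    return pdf_urls, pages


def crawl(start_urls, fetch, max_pages=100):
    """Breadth-first crawl from several start URLs; return PDF URLs found.

    `fetch` is a caller-supplied function that maps a URL to its HTML text.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    found = []
    visited = 0
    while queue and visited < max_pages:
        page_url = queue.popleft()
        visited += 1
        pdf_urls, pages = classify_links(fetch(page_url), page_url)
        for url in pdf_urls:
            if url not in found:
                found.append(url)
        for url in pages:
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return found
```

In this sketch `fetch` could wrap `urllib.request.urlopen`, and each returned URL could then be saved with `urllib.request.urlretrieve`. Detecting PDFs by file extension is an assumption; links that serve a PDF without a `.pdf` suffix would need a check of the `Content-Type` response header instead.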

  • What language do you work with?

  • Good afternoon, André. I work with PHP, but I think Scrapy would be faster since it is an application built for crawling.

No answers
