But this is basically what the crawler should do. It will be up to you to keep a database with the list of sites you want to scan and use cron to schedule the scans, preferably one cron entry per site. In that script you would pass the site you want to scan as an argument, for example: $crawler->setURL($argv[1]).
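A minimal sketch of such a script, assuming the PHPCrawl library (which provides setURL(), go() and the handleDocumentInfo() callback); the include path and the output logic are placeholders you would adapt to your setup:

```php
<?php
// crawl.php - minimal sketch; assumes the PHPCrawl library is installed.
// The include path below is an assumption; adjust it to your installation.
require_once 'libs/PHPCrawler.class.php';

if (!isset($argv[1])) {
    fwrite(STDERR, "Usage: php crawl.php <site-url>\n");
    exit(1);
}

class MyCrawler extends PHPCrawler
{
    // Called by PHPCrawl for every document it receives.
    public function handleDocumentInfo($DocInfo)
    {
        // Store or index the page here (URL, status code, content, links, ...).
        echo $DocInfo->url . ' (' . $DocInfo->http_status_code . ")\n";
    }
}

$crawler = new MyCrawler();
$crawler->setURL($argv[1]);                         // the site passed by cron
$crawler->addContentTypeReceiveRule('#text/html#'); // only fetch HTML documents
$crawler->go();
```

Each site in your database would then get its own cron entry (the paths and times below are just examples):

```
0 1 * * * php /path/to/crawl.php https://example.com
0 3 * * * php /path/to/crawl.php https://example.org
```

That way every scan runs as its own short-lived PHP process instead of one giant request.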
Don’t expect a single PHP request to process numerous websites; that will be bad for your server. Google, Yahoo and Bing scan different sites periodically, in separate routines, and they probably limit themselves to something like one site per hour, continuing only afterwards.
If a single request and a single PHP script tried to access multiple URLs, the application would become a long-running process that could take hours, and depending on how PHP’s garbage collection (GC) copes, it might not be able to free the resources in use, so CPU or memory consumption would keep growing until your server starts crashing.
The most appropriate way (not necessarily the only right way) is to scan one site at a time, set a page limit, and pick up where you left off on the next run if you use that limit. Remember that some sites have more than 50,000 pages.
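One way to implement that limit-and-resume idea, assuming PHPCrawl’s page limit and resumption features (setPageLimit(), enableResumption(), getCrawlerId(), resume()) are available in your version; the file used to remember the crawler id is a hypothetical choice:

```php
<?php
// resume_crawl.php - sketch of a limited, resumable crawl (assumes PHPCrawl).
require_once 'libs/PHPCrawler.class.php'; // path is an assumption

class ResumableCrawler extends PHPCrawler
{
    public function handleDocumentInfo($DocInfo)
    {
        echo $DocInfo->url . "\n"; // record/index the page here
    }
}

$site   = $argv[1];
$idFile = '/tmp/crawler_id_' . md5($site) . '.txt'; // hypothetical place to keep the crawler id

$crawler = new ResumableCrawler();
$crawler->setURL($site);
$crawler->setPageLimit(5000);  // stop this run after 5000 pages
$crawler->enableResumption();  // keep internal state so the crawl can continue later

if (file_exists($idFile)) {
    // A previous run hit the limit (or was interrupted): continue where it stopped.
    $crawler->resume(file_get_contents($idFile));
} else {
    // First run for this site: remember the crawler id so later runs can resume.
    file_put_contents($idFile, $crawler->getCrawlerId());
}

$crawler->go();
// Once the site is fully crawled, delete $idFile so the next scan starts fresh.
```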
Look, thank you very much, that cleared up quite a bit of the idea of what I should do. Thank you.
– João Pacheco