Use glob:
$array = glob('caminho/ate/a/pasta/*.xml');
You can specify a path pattern and use the asterisk as a wildcard, which matches any sequence of characters of any length.
That is, the code above returns every file in that path that has the .xml extension.
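A minimal sketch of the glob call above, assuming a throwaway folder under the system temp directory (the folder and file names are made up for illustration). It also handles the case where glob returns false on error:

```php
<?php
// Build a throwaway folder with sample files so the sketch runs anywhere.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['a.xml', 'b.xml', 'notes.txt'] as $name) {
    touch($dir . '/' . $name);
}

// glob() returns an array of matching paths, or false on error.
$matches = glob($dir . '/*.xml');
if ($matches === false) {
    $matches = [];
}

// Keep only the file names; notes.txt is not matched by *.xml.
$names = array_map('basename', $matches);
sort($names);
print_r($names);

// Clean up the sample folder.
array_map('unlink', glob($dir . '/*') ?: []);
rmdir($dir);
```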
Another option is the SPL's GlobIterator:
$iterator = new GlobIterator('caminho/ate/a/pasta/*.xml');
foreach ($iterator as $item) {
    // each $item is an SplFileInfo; echoing it prints the file path
    echo $item, PHP_EOL;
}
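Since GlobIterator yields SplFileInfo objects by default, each item carries more than the path. A small sketch, assuming two sample files created in a temporary folder just for the demonstration:

```php
<?php
// Sample folder and files, created only for this sketch.
$dir = sys_get_temp_dir() . '/globiter_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.xml', '<root/>');
file_put_contents($dir . '/b.xml', '<root><item/></root>');

// Collect file name => size in bytes for every .xml file.
$sizes = [];
foreach (new GlobIterator($dir . '/*.xml') as $item) {
    // $item is an SplFileInfo instance.
    $sizes[$item->getFilename()] = $item->getSize();
}
ksort($sizes);
print_r($sizes);

// Clean up the sample folder.
unlink($dir . '/a.xml');
unlink($dir . '/b.xml');
rmdir($dir);
```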
Other iterators would work too. All of them get you the result you want, but glob and GlobIterator are the ones made specifically for finding files.
Parallel Processing
For parallel processing, I know of three ways:
PHP Thread
In very simple terms, you extend the Thread class (provided by the pthreads extension) and define the file processing in run():
class XmlProcessThread extends Thread {
    protected $filename;

    public function __construct($filename) {
        $this->filename = $filename;
    }

    public function run() {
        /** use $this->filename and do the processing **/
    }
}
To run the processing, instantiate one Thread per file and start it:
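$threadList = [];
foreach ($fileList as $filename) {
    $thread = new XmlProcessThread($filename);
    $thread->start();
    $threadList[] = $thread;
}
// wait for every thread to finish
foreach ($threadList as $thread) {
    $thread->join();
}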
Script via Exec
Basically, you can run PHP files with the exec function. In the command line, add & as the last parameter so the script runs in the background. You also need to redirect its output (for example to /dev/null); otherwise exec still waits for the command to finish. With both in place, the PHP script that launched the execution does not wait for it to complete.
exec('php diretorio/thread.php filename.xml > /dev/null 2>&1 &');
In the file thread.php, use the $argv variable. It holds the script name at index 0, followed by all the parameters sent to the script (in this case, filename.xml at $argv[1]).
More information: /a/56598/5007
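As a rough sketch of what thread.php could look like when reading $argv (the helper filenameFromArgv is a hypothetical name, and the echo only stands in for the real XML processing):

```php
<?php
// thread.php (sketch): picks up the file name passed on the command line.

/**
 * Hypothetical helper: returns the first parameter after the script name,
 * or null when the script was called without one.
 */
function filenameFromArgv(array $argv): ?string
{
    // $argv[0] is the script itself; the parameters start at index 1.
    return $argv[1] ?? null;
}

$filename = filenameFromArgv($argv ?? []);
if ($filename !== null) {
    // Placeholder for the real XML processing.
    echo "Processing {$filename}", PHP_EOL;
}
```

Called as php diretorio/thread.php filename.xml, $filename would hold filename.xml.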
Distributed threads
In this method, you create sockets, each of which processes one file, and the sockets are run via Thread (as in the first example).
Readings:
Which is better?
Well, it depends. Each approach tends to give different results in different scenarios.
For example, distributed parallel processing tends to be faster for workloads that need a lot of processing (hours of it), since it spreads the work across other servers. However, it may require sending the file to the other server (if the file is not within its reach), and that transfer can make the overall processing costly.
On the other hand, Thread and exec run on the same server, so the processes compete with each other for resources, which will likely hurt performance.
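One way to soften that competition on a single server is to cap how many worker processes run at once. The sketch below is only an illustration of that idea, not something from the approaches above: runWithLimit is a hypothetical helper built on proc_open, the array command form assumes PHP 7.4 or later, and the usleep worker stands in for a real call such as php diretorio/thread.php filename.xml.

```php
<?php
/**
 * Hypothetical helper: starts one process per command, never more than
 * $limit at the same time, and returns the exit code of each one.
 */
function runWithLimit(array $commands, int $limit): array
{
    $queue = $commands;   // commands still waiting to start
    $running = [];        // key => process resource
    $exitCodes = [];      // key => exit code

    while ($queue || $running) {
        // Start new workers while there is room.
        while ($queue && count($running) < $limit) {
            $key = array_key_first($queue);
            $command = $queue[$key];
            unset($queue[$key]);

            $process = proc_open($command, [], $pipes);
            if ($process === false) {
                $exitCodes[$key] = -1;   // could not start
                continue;
            }
            $running[$key] = $process;
        }

        // Collect workers that have finished.
        foreach ($running as $key => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                $exitCodes[$key] = $status['exitcode'];
                proc_close($process);
                unset($running[$key]);
            }
        }

        usleep(10000);   // avoid a busy loop while waiting
    }

    return $exitCodes;
}

// Usage sketch: one worker per file, at most 5 at a time.
// A real worker might be [PHP_BINARY, 'diretorio/thread.php', $file].
$commands = [];
foreach (['a.xml', 'b.xml', 'c.xml'] as $file) {
    $commands[$file] = [PHP_BINARY, '-r', 'usleep(100000);'];
}
$exitCodes = runWithLimit($commands, 5);
```

Polling with proc_get_status keeps the sketch short; the right limit depends on the server and on how heavy each file is.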
These are just a few examples of the advantages and disadvantages. The answer to the question below gives a good overview:
Is a multi-threaded application always guaranteed to run faster than a single-threaded one?
Very good, Gabriel, I didn't know either of them. But what if it were asynchronous? For example: each file would be processed separately, with a limit of 5 simultaneous processes.
– rbz
@RBZ I just saw that you updated your response. There is a very good article on asynchronous and distributed processing, with a specific XML example. I'm looking for it now.
– Gabriel Heming