Read contents of a folder and process files asynchronously

I have a folder with several XML files.

I would like a script to read these files, and process each of them.


How do I read the contents of the folder and pick up the files, given that they all have different names?

Could this be done asynchronously, processing more than one file at the same time, with a limit on the number of simultaneous processes so the processing does not lock up?

1 answer

Use glob.

$array = glob('caminho/ate/a/pasta/*.xml');

You can specify a path pattern and use the asterisk as a wildcard, which matches any character or sequence of characters.

That is, the code above returns every file in that path that has the .xml extension.
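As a minimal sketch of how the result of glob can be consumed: glob returns false on error, so it is worth normalizing the result before looping. The helper name listXmlFiles is hypothetical and only used for illustration here.

```php
<?php
/**
 * Returns the .xml files found directly inside $dir.
 * (Hypothetical helper, not part of PHP itself.)
 */
function listXmlFiles(string $dir): array
{
    $files = glob(rtrim($dir, '/') . '/*.xml');

    // glob() returns false on error; normalize that to an empty array.
    return $files === false ? [] : $files;
}

foreach (listXmlFiles('caminho/ate/a/pasta') as $file) {
    // basename() strips the directory part, leaving only the file name.
    echo basename($file), PHP_EOL;
}
```

Each entry is the path exactly as matched by the pattern, so it can be passed directly to whatever function processes the file.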

Another idea is to use the GlobIterator of the SPL:

$iterator = new GlobIterator('caminho/ate/a/pasta/*.xml');

foreach ($iterator as $item) {
    echo $item;
}

Other SPL iterators, such as DirectoryIterator or FilesystemIterator, also work.

All of them will get you the result you want, but glob and GlobIterator are the ones made specifically for searching for files.

Parallel Processing

For parallel processing, I know of three approaches:

PHP Thread

In a very simplified form, you extend the Thread class (provided by the pthreads extension) and define the file processing in its run method:

class XmlProcessThread extends Thread {

    protected $filename;

    public function __construct($filename) { 
        $this->filename = $filename;
    }

    public function run() {
        /** use $this->filename and perform the processing **/
    }
}

To run the processing, instantiate one Thread per file and start its execution:

// Keep a reference to every started thread.
$threadList = [];

foreach ($fileList as $filename) {
    $thread = new XmlProcessThread($filename);
    $thread->start();
    $threadList[] = $thread;
}

Script via Exec

Basically, you can run PHP files with the exec function. In the command line, add the & symbol as the last parameter so that the script runs in the background. The output must also be redirected (here to /dev/null); otherwise PHP still waits until the command finishes. This way, the PHP script that started the execution does not wait for it to complete.

exec('php diretorio/thread.php filename.xml > /dev/null 2>&1 &');

In the file thread.php, use the $argv variable. It contains all the arguments passed to the script: $argv[0] is the script name and, in this case, $argv[1] is filename.xml.
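A possible shape for thread.php, as a sketch only: the function processXmlFile and its body are placeholders for the real processing, and parsing with SimpleXML is an assumption, since the question does not say how the files are handled.

```php
<?php
// thread.php: a sketch of the worker script.

/**
 * Hypothetical processing step: parses the file and prints its root element.
 * Returns false when the file cannot be parsed.
 */
function processXmlFile(string $filename): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $xml = simplexml_load_file($filename);
    if ($xml === false) {
        return false;
    }

    echo $filename, ': <', $xml->getName(), '>', PHP_EOL;
    return true;
}

// $argv[0] is the script name, so the first argument is $argv[1].
if (isset($argv[1])) {
    exit(processXmlFile($argv[1]) ? 0 : 1);
}
```

Exiting with a non-zero status on failure lets the calling script tell which files could not be processed.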

More information: /a/56598/5007
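To limit how many scripts run at the same time, the parent needs to know which ones are still running, which exec with & does not tell you. The sketch below swaps in proc_open and proc_get_status for that purpose. The function runWithLimit and its callback are hypothetical names, and the limit of 5 is only an example value.

```php
<?php
/**
 * Runs one command per file, keeping at most $limit processes alive at once.
 * $commandFor is a hypothetical callback that builds the command for a file.
 * Returns the number of processes that were started.
 */
function runWithLimit(array $files, int $limit, callable $commandFor): int
{
    $running = [];
    $started = 0;

    while ($files || $running) {
        // Fill the free slots with new processes.
        while ($files && count($running) < $limit) {
            $file = array_shift($files);
            // No pipes are requested, so the child inherits the parent's output.
            $proc = proc_open($commandFor($file), [], $pipes);
            if ($proc !== false) {
                $running[] = $proc;
                $started++;
            }
        }

        // Release the slots of the processes that have finished.
        foreach ($running as $key => $proc) {
            if (!proc_get_status($proc)['running']) {
                proc_close($proc);
                unset($running[$key]);
            }
        }

        usleep(50000); // poll every 50 ms instead of busy-waiting
    }

    return $started;
}

$files = glob('caminho/ate/a/pasta/*.xml') ?: [];
runWithLimit($files, 5, function (string $file): string {
    // escapeshellarg() protects file names with spaces or special characters.
    return 'php diretorio/thread.php ' . escapeshellarg($file);
});
```

The parent script stays alive until every file has been handled, which is the trade-off for being able to cap the number of simultaneous processes.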

Distributed threads

In this method, you create sockets, where each socket processes one file, and these sockets are run via Thread (as in the first example).

Which is better?

Well, it depends. The approaches tend to give different results in different scenarios.

For example, distributed parallel processing tends to be faster for jobs that require a lot of processing (hours of it), since it distributes the work to other servers. However, it may require sending the file to the other server (if the file is not within its reach), and that transfer can add a significant cost to the overall processing.

On the other hand, Thread and exec use the same server, so the processes compete with each other for its resources, which will likely hurt performance.

These are just a few of the advantages and disadvantages. The answer to the question below gives a good overview:

Is a multi-threaded application always guaranteed to run faster than a single-threaded one?

  • Very good, Gabriel, I did not know either of them. But what about doing it asynchronously? For example, each file would be processed separately, with a limit of 5 simultaneous processes.

  • @RBZ I just saw that you updated your answer. There is a very good article on asynchronous and distributed processing, with a specific XML example. I am looking for it here.
