How do I synchronize script execution with an asynchronous batch operation?

To illustrate the problem, here is a real scenario: I have to access roughly 450 directories shared over the network to get the metadata of a given file. That data is later compared with the metadata of the original file.

In this scenario I need the size and the date/time (timestamp) of the last modification. Obtaining this data is simple; basically I call this module:

const fs = require('fs')

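// Resolves with the fs.Stats of `path` (size, mtime, ...) or rejects with the error.
const metadata = (path) => {
  return new Promise((resolve, reject) => {
    fs.stat(path, (err, stats) => {
      // Return here so resolve() is not also called after a failure.
      if (err) return reject(err)

      resolve(stats)
    })
  })
}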

module.exports = metadata

The point is that I need these operations to run in parallel (by which I mean background execution via the thread pool); that is, I can't block the flow with an await on each call. I'm even using set UV_THREADPOOL_SIZE=10 to increase the number of threads...

I have tried numerous solutions. In one of them I call the module shown above and store each returned Promise in an array. Right after that, a loop traverses the array and calls then on each Promise. Inside that callback the code compares the result with the source, stores whether they are equal in another array, and uses an if to determine whether this was the last Promise of the array; if so, it prints the final result to the console.

The problem is that I have not found a way to run these queries in parallel and obtain only the consolidated result. Roughly speaking, I was unable to combine the asynchronous operations with the synchronous sections so as to take advantage of both. How can I query the data in parallel and then process the results synchronously?

  • Have you tried Promise.all? By itself, JavaScript does not run your code in parallel. In Node.js, you can use worker_threads to create additional threads.

  • Yes, I’m aware of that. I will edit the question to clarify this point. In fact, I am working with a larger thread pool than the default to handle the number of queries, especially since some of them may hang until the call times out. Promise.all does not fully meet my needs here, since there is no guarantee that all the Promises will be fulfilled.

  • If the environment supports it, you can use Promise.allSettled.

  • I understand that your goal is a pure JavaScript solution, but if you can’t find one, have you considered a Python server that assists Node.js by scheduling and running these tasks in parallel?

  • I’m restricted to JavaScript more out of necessity; unfortunately I was not given the option of using another technology. Personally, I would definitely consider it. For now I’m looking at the Worker API to see if it sheds some light.
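
For readers following the comments, the difference between Promise.all and Promise.allSettled can be sketched as below. This is only an illustration: makeQueries and its three promises are hypothetical stand-ins for the real network queries, not code from the question.

```javascript
// Hypothetical stand-ins for three network queries: two succeed, one fails.
const makeQueries = () => [
  Promise.resolve('ok-1'),
  Promise.reject(new Error('share unreachable')),
  Promise.resolve('ok-2'),
]

const compare = async () => {
  // Promise.all rejects as soon as any input rejects,
  // so the results of the successful queries are not delivered.
  let allOutcome
  try {
    allOutcome = await Promise.all(makeQueries())
  } catch (err) {
    allOutcome = `rejected: ${err.message}`
  }

  // Promise.allSettled always fulfills, with one entry per input, in input order.
  const settled = await Promise.allSettled(makeQueries())
  const statuses = settled.map((result) => result.status)

  return { allOutcome, statuses }
}

compare().then(console.log)
```

With Promise.all a single unreachable share hides every other result, whereas Promise.allSettled reports each query individually.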

1 answer

I was able to reach a satisfactory solution using Promise.allSettled (suggestion from @Luiz Felipe).

Abstraction of the solution:

// ...
await Promise.allSettled(promises).then((results) => {
  // ... code here
})

This code essentially solves my problem. It lets me run the 450 queries through the thread pool: the promises value passed to Promise.allSettled is an array of Promises, one per file. The await then blocks the flow until allSettled reports that every one of them has settled (fulfilled or rejected).

I normally start the process with set UV_THREADPOOL_SIZE=10 && node index.js, so that more threads are available.
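
A slightly fuller sketch of the same idea, for anyone who wants something they can run. It is not my exact code: compareWithSource, sourcePath and copyPaths are hypothetical names, it uses fs.promises.stat instead of the wrapper module from the question, and it assumes "equal" means same size and same modification time.

```javascript
const fs = require('fs')

// Hypothetical helper: stat the original, then stat every copy concurrently.
// Every fs.promises.stat call is started before any of them is awaited,
// so the requests are handed to the libuv thread pool without blocking each other.
const compareWithSource = async (sourcePath, copyPaths) => {
  const source = await fs.promises.stat(sourcePath)
  const promises = copyPaths.map((copyPath) => fs.promises.stat(copyPath))

  // Resumes only after every promise has either fulfilled or rejected.
  const results = await Promise.allSettled(promises)

  // From here on it is plain synchronous processing of the consolidated results.
  return results.map((result, i) => {
    if (result.status === 'rejected') {
      // Unreachable share, missing file, etc.: record the error code instead of failing.
      return { path: copyPaths[i], ok: false, error: result.reason.code }
    }
    const { size, mtimeMs } = result.value
    return {
      path: copyPaths[i],
      ok: size === source.size && mtimeMs === source.mtimeMs,
    }
  })
}
```

Calling compareWithSource(original, listOfCopies) returns a Promise for one report entry per path, in the same order as the input, so a failed query never hides the results of the others.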

Note: I will also test using the Worker API.
