Generally, it doesn’t pay to parallelize at more than one level. It is possible, but it will not make your code run faster unless the first level of parallelism fails to use all of the computer's idle resources.
Nowadays the easiest way to write parallel code in R is to use the future package in combination with furrr.
Here is a classic example of parallelization:
library(furrr)
#> Loading required package: future
library(purrr)
plan(multisession)
fun <- function(x) {
  Sys.sleep(1)
  x
}
system.time(
  map(1:4, fun)
)
#>    user  system elapsed
#>   0.004   0.001   4.020
system.time(
  future_map(1:4, fun)
)
#>    user  system elapsed
#>   0.077   0.012   1.297
Created on 2019-02-13 by the reprex package (v0.2.1)
In the example, the parallel version takes a little more than 1s while the sequential version takes 4s, as expected.
Now let’s add a second level of parallelization.
library(furrr)
#> Loading required package: future
library(purrr)
plan(multisession)
fun <- function(x) {
  Sys.sleep(1)
  x
}
system.time(
  future_map(1:4, ~map(1:4, fun))
)
#>    user  system elapsed
#>   0.090   0.012   4.391
system.time(
  future_map(1:4, ~future_map(1:4, fun))
)
#>    user  system elapsed
#>   0.065   0.005   4.223
Created on 2019-02-13 by the reprex package (v0.2.1)
Note that the two forms take very similar times. This happens because the first level of parallelization already occupies all of the computer's idle CPU resources, so the second level has nothing left to gain.
The first level might not use all of the computer's resources. For example, if my computer had 8 cores instead of 4, parallelizing only the first level (4 iterations) would leave 4 cores underutilized. In that case it would make sense to also parallelize the second level. However, this is rare: in general we parallelize loops in which the number of iterations is greater than the number of cores.
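For the rare case where a second level is worth it, here is a minimal sketch of how it could be set up. Note that future, by default, runs nested futures sequentially, so the second level has to be requested explicitly by passing a list of strategies to plan(). The worker counts (2 outer, 2 inner) are assumptions for a machine with at least 4 free cores. The I() wrapper is, as far as I know, how recent versions of future let you override the protection against nested parallelism; on older versions it should simply be treated as the number 2.

```r
library(future)
library(furrr)

# Assumption: at least 4 free cores. Adjust both worker counts to your machine.
plan(list(
  tweak(multisession, workers = 2),    # outer level: 2 background R sessions
  tweak(multisession, workers = I(2))  # inner level: 2 sessions per outer worker
))

fun <- function(x) {
  Sys.sleep(1)
  x
}

# Both levels can now run in parallel: 2 outer tasks, each with 2 inner tasks.
res <- future_map(1:2, ~ future_map(1:2, fun))

# Return to sequential processing, which also shuts the workers down.
plan(sequential)
```

Whether this is actually faster depends on how many cores are free; if the outer level already keeps every core busy, the extra sessions only add overhead.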
It seems the answer is in the %:% operator. Take a look. – Tomás Barcellos
I just found out that I need to study library(doParallel) to make library(foreach) work better. – Márcio Mocellin
@Tomassbarcellos the point of the operator is that with %do% the function behaves like for, while with %dopar% it is the parallelized version. First, however, you have to configure the parallelization with doParallel. – Márcio Mocellin
Using %dopar% without first setting it up with doParallel does not work. Before you run code with foreach(...) %dopar% {...}, you need to run doParallel::registerDoParallel(). Afterwards, just close the cluster it created with doParallel::stopImplicitCluster(). – JdeMello
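The setup described in these comments could look roughly like the sketch below. It registers a backend, runs a simple %dopar% loop, nests two loops with %:%, and then stops the implicit cluster. The worker count (cores = 2) and the toy computations are assumptions chosen only for illustration.

```r
library(foreach)
library(doParallel)

# Register a parallel backend; 2 workers is an arbitrary choice for this sketch.
registerDoParallel(cores = 2)

# %dopar% sends iterations to the registered backend;
# replacing it with %do% would run the same loop sequentially.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# %:% merges two nested foreach loops into one stream of tasks,
# so the backend can schedule all (i, j) pairs together.
products <- foreach(i = 1:2, .combine = rbind) %:%
  foreach(j = 1:3, .combine = c) %dopar% {
    i * j
  }

# Close the cluster that registerDoParallel() may have created implicitly.
stopImplicitCluster()
```

Here squares collects one value per iteration, and products is a 2 x 3 matrix with one row per value of i.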